To get the data type of a column in pyspark we will be using the dtypes property and the printSchema() function. dtypes returns the data type of a single column or of multiple columns of the dataframe. We will explain how to get the data type of single and multiple columns in Pyspark with examples.
- Get data type of single column in pyspark using printSchema() function
- Get data type of single column in pyspark using dtypes
- Get data type of multiple columns in pyspark using printSchema() and dtypes
- Get data type of all the columns in pyspark
We will use the dataframe named df_basket1.
Get data type of single column in pyspark using printSchema() – Method 1:
dataframe.select('columnname').printSchema() is used to get the data type of a single column
df_basket1.select('Price').printSchema()
We use the select() function to select a column and the printSchema() function to print the data type of that particular column. So in our case we get the data type of the 'Price' column.
Get data type of single column in pyspark using dtypes – Method 2:
dataframe.select('columnname').dtypes is the syntax used to get the data type of a single column
df_basket1.select('Price').dtypes
We use the select() function to select a column and dtypes to get the data type of that particular column. So in our case we get the data type of the 'Price' column.
Get data type of multiple columns in pyspark – Method 1:
dataframe.select('columnname1','columnname2').printSchema() is used to get the data types of multiple columns
df_basket1.select('Price','Item_name').printSchema()
We use the select() function to select multiple columns and the printSchema() function to print the data types of those columns. So in our case we get the data types of the 'Price' and 'Item_name' columns.
Get data type of multiple columns in pyspark using dtypes – Method 2:
dataframe.select('columnname1','columnname2').dtypes is used to get the data types of multiple columns
df_basket1.select('Price','Item_name').dtypes
We use the select() function to select multiple columns and dtypes to get the data types of those columns. So in our case we get the data types of the 'Price' and 'Item_name' columns.
Get data type of all the columns in pyspark:
Method 1: using printSchema()
dataframe.printSchema() is used to get the data type of each column in pyspark.
df_basket1.printSchema()
The printSchema() function prints the data type of each column in a tree format.
Method 2: using dtypes
dataframe.dtypes is used to get the data type of each column in pyspark
df_basket1.dtypes
dtypes returns the data type of each column as a list of (column name, data type) tuples.
Other Related Topics:
- Get Substring of the column in Pyspark
- Get String length of column in Pyspark
- Typecast string to date and date to string in Pyspark
- Typecast Integer to string and String to integer in Pyspark
- Extract First N and Last N character in pyspark
- Convert to upper case, lower case and title case in pyspark
- Add leading zeros to the column in pyspark
- Concatenate two columns in pyspark
- Simple random sampling and stratified sampling in pyspark – Sample(), SampleBy()
- Join in pyspark (Merge) inner , outer, right , left join in pyspark
- Get duplicate rows in pyspark
- Get data type of column in Pyspark (single & Multiple columns)
- Quantile rank, decile rank & n tile rank in pyspark – Rank by Group
- Populate row number in pyspark – Row number by Group
- Percentile Rank of the column in pyspark
- Mean of two or more columns in pyspark