To get the list of columns and their data types in PySpark, we use the dtypes attribute and the printSchema() function. This article explains, with an example, how to get the column names of a dataframe along with their data types.
- Get the list of column names in a PySpark dataframe.
- Get the list of columns and their data types using the dtypes attribute.
- Get the list of column names and their data types using the printSchema() function.
- Get the data type of a single, specific column.
We cover two methods for getting the column names and their data types in PySpark.
The examples use a dataframe named df_basket1.
Get List of columns in pyspark:
To get the list of columns in PySpark, use the dataframe.columns attribute:
df_basket1.columns
This returns the column names as a Python list of strings, in the order in which they appear in the dataframe.
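Because columns is an ordinary Python list, it can be inspected with normal list operations. The sketch below is one possible way to check that expected columns are present. It runs on a hypothetical stand-in list rather than a real dataframe: only 'Price' is named in this article, so the other column names, the 'Quantity' column, and the helper missing_columns are assumptions made for illustration.

```python
# Hypothetical stand-in for df_basket1.columns.
# Only 'Price' appears in the article; the other names are assumed.
basket_columns = ["Item_group", "Item_name", "Price"]


def missing_columns(columns, required):
    """Return the required column names that are absent, in the order given."""
    present = set(columns)
    return [name for name in required if name not in present]


# 'Quantity' is an assumed name used to show a missing column.
print(missing_columns(basket_columns, ["Price", "Quantity"]))  # ['Quantity']
print("Price" in basket_columns)  # True
```

With a real dataframe, df_basket1.columns would be passed in place of basket_columns.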
Get list of columns and their data types in pyspark
Method 1: using printSchema() function.
df_basket1.printSchema()
printSchema() prints the schema as a tree, listing each column's name, its data type, and whether it is nullable.
Method 2: using dtypes attribute.
df_basket1.dtypes
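dtypes is an attribute, not a function, so it is accessed without parentheses. It returns a list of (column name, data type) tuples, one per column, with each data type given as a string.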
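Since dtypes yields a plain list of (column name, type string) tuples, it can be processed with ordinary Python. As a rough sketch, the snippet below groups column names by their data type. The list basket_dtypes is a hypothetical stand-in for df_basket1.dtypes: its column names and types (including 'int' for Price) are assumed, not taken from the actual dataframe.

```python
from collections import defaultdict

# Hypothetical stand-in for df_basket1.dtypes; names and types are assumed.
basket_dtypes = [("Item_group", "string"), ("Item_name", "string"), ("Price", "int")]

# Group column names under their type string, keeping column order.
columns_by_type = defaultdict(list)
for name, dtype in basket_dtypes:
    columns_by_type[dtype].append(name)

print(columns_by_type["string"])  # ['Item_group', 'Item_name']
print(columns_by_type["int"])     # ['Price']
```

This kind of grouping can be handy when the same operation has to be applied to, say, every string column.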
Get data type of single column in pyspark using printSchema() – Method 1
dataframe.select('columnname').printSchema() is used to get the data type of a single column
df_basket1.select('Price').printSchema()
We use the select function to pick a column, then call printSchema() to print the data type of that column. In our case, this prints the data type of the 'Price' column.
Get data type of single column in pyspark using dtypes – Method 2
dataframe.select('columnname').dtypes is the syntax used to get the data type of a single column
df_basket1.select('Price').dtypes
We use the select function to pick a column, then read dtypes to get the data type of that column. In our case, this returns a one-element list holding the name and data type of the 'Price' column.
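Selecting a single column and reading dtypes still gives a list, here with one (column name, type string) tuple, so the type string has to be unpacked from it. The helper single_column_type below is a hypothetical illustration of that step. The value of price_dtypes is an assumed example of what df_basket1.select('Price').dtypes might return, since the actual type of 'Price' is not stated here.

```python
def single_column_type(dtypes):
    """Return the type string from a dtypes list that describes one column."""
    if len(dtypes) != 1:
        raise ValueError(f"expected exactly one column, got {len(dtypes)}")
    _name, dtype = dtypes[0]
    return dtype


# Assumed example of df_basket1.select('Price').dtypes; 'int' is a guess.
price_dtypes = [("Price", "int")]
print(single_column_type(price_dtypes))  # int
```

The length check guards against accidentally passing the dtypes of a multi-column selection.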