To typecast a string to a date in pyspark, we will be using the to_date() function with the column name and date format as arguments. To typecast a date to a string in pyspark, we will be using the cast() function with StringType() as the argument. Let’s see an example of type conversion (casting) of a string column to a date column, and of a date column to a string column, in pyspark.
- Type cast string column to date column in pyspark using to_date() function
- Type cast date column to string column in pyspark using cast() function
We will be using the dataframe named df_student.
Typecast string column to date column in pyspark:
First, let’s get the datatype of the “birthday” column as shown below:
### Get datatype of birthday column
df_student.select("birthday").dtypes
So the resultant data type of the birthday column is string, i.e. dtypes returns [('birthday', 'string')].
Now let’s convert the birthday column to date using the to_date() function, passing the column name and the date format as arguments. The format ('dd-MM-yyyy' here) must match the way the dates are written in the string column. This converts the string column to a date column, and the result is stored as a dataframe named output_df:
########## Type cast string column to date column in pyspark
from pyspark.sql.functions import to_date
output_df = df_student.withColumn('birthday', to_date(df_student.birthday, 'dd-MM-yyyy'))
Now let’s get the datatype of the birthday column as shown below:
### Get datatype of birthday
output_df.select("birthday").dtypes
So the resultant data type of the birthday column is date.
Type cast date column to string column in pyspark:
First, let’s get the datatype of the birthday column from output_df as shown below:
### Get datatype of birthday column
output_df.select("birthday").dtypes
So the resultant data type of the birthday column is date.
Now let’s convert the birthday column back to string using the cast() function with StringType() passed as an argument. This converts the date column of output_df to a string column, and the result is again stored as a dataframe named output_df:
########## Type cast date column to string column in pyspark
from pyspark.sql.types import StringType
output_df = output_df.withColumn("birthday", output_df["birthday"].cast(StringType()))
Now let’s get the datatype of the birthday column as shown below:
### Get datatype of birthday column
output_df.select("birthday").dtypes
So the resultant data type of the birthday column is string.