The difference between two dates in days, weeks, months, quarters and years in pyspark can be calculated with the datediff() and months_between() functions. datediff() returns the difference between two dates in days. Dividing that result by 7 gives the difference in weeks, and dividing it by 365.25 gives an approximate difference in years. months_between() returns the difference between two dates in months. Dividing that result by 3 gives the difference in quarters. Let’s see an example for each.
- Calculate difference between two dates in days in pyspark
- Calculate difference between two dates in weeks in pyspark
- Calculate difference between two dates in months in pyspark
- Calculate difference between two dates in years in pyspark
- Calculate difference between two dates in quarters in pyspark
We will be using the dataframe named df1, whose birthdaytime and current_time columns hold the two dates being compared.
Calculate difference between two dates in days in pyspark
To calculate the difference between two dates in days we use the datediff() function. datediff() takes two arguments, the end date followed by the start date, and returns the number of days from the start date to the end date.
### Calculate difference between two dates in days in pyspark
from pyspark.sql.functions import datediff, col

df1.withColumn("diff_in_days", datediff(col("current_time"), col("birthdaytime"))).show(truncate=False)
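The resultant dataframe has a new diff_in_days column containing the whole number of days between birthdaytime and current_time for each row.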
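As a cross-check outside Spark, the same day count can be reproduced with Python's standard datetime module. This is a minimal sketch rather than Spark's implementation: the helper names and sample values are hypothetical, and it assumes, as datediff() does, that any time-of-day part is dropped before the two values are subtracted.

```python
from datetime import date, datetime


def to_date(value):
    """Drop the time-of-day part if the value is a datetime."""
    return value.date() if isinstance(value, datetime) else value


def days_between(end, start):
    """Whole days from start to end (hypothetical helper mirroring datediff(end, start))."""
    return (to_date(end) - to_date(start)).days


# Hypothetical sample values, not rows from df1.
print(days_between(date(2024, 3, 1), date(2024, 2, 1)))   # 29 (leap-year February)
print(days_between(date(2024, 2, 1), date(2024, 3, 1)))   # -29 (arguments reversed)
print(days_between(datetime(2024, 1, 2, 0, 5), datetime(2024, 1, 1, 23, 55)))  # 1
```

The sign of the result depends on argument order, so the later date goes first when a positive difference is wanted.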
Calculate difference between two dates in months in pyspark
To calculate the difference between two dates in months we use the months_between() function. months_between() takes two arguments, both dates, and returns the number of months between them. The result is positive when the first date is later than the second, and it can be fractional.
### Calculate difference between two dates in months in pyspark
from pyspark.sql.functions import months_between, col

df1.withColumn("diff_in_months", months_between(col("current_time"), col("birthdaytime"))).show(truncate=False)
The resultant dataframe has a new diff_in_months column containing the difference in months, which may be fractional.
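The fractional part follows a documented rule: months_between() returns a whole number when both dates share the same day of the month or are both month-ends, and otherwise counts the leftover days against a 31-day month, rounding to 8 digits by default. The sketch below imitates that rule for plain dates in standard Python. The helper names and sample dates are hypothetical, time-of-day handling is not modelled, and Python's round() may differ from Spark's rounding in the last digit.

```python
import calendar
from datetime import date


def is_month_end(d):
    """True when d is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between_dates(end, start):
    """Approximate months_between(end, start) for plain dates (hypothetical helper)."""
    whole_months = (end.year - start.year) * 12 + (end.month - start.month)
    # Same day of month, or both month-ends: a whole number of months.
    if end.day == start.day or (is_month_end(end) and is_month_end(start)):
        return float(whole_months)
    # Otherwise the leftover days count as a fraction of a 31-day month.
    return round(whole_months + (end.day - start.day) / 31, 8)


# Hypothetical sample dates, not rows from df1.
print(months_between_dates(date(2024, 3, 15), date(2024, 1, 15)))  # 2.0
print(months_between_dates(date(2024, 2, 29), date(2024, 1, 31)))  # 1.0
print(months_between_dates(date(2024, 3, 31), date(2024, 1, 15)))  # 2.51612903
```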
Calculate difference between two dates in weeks in pyspark
To calculate the difference between two dates in weeks we use the datediff() function. datediff() takes two arguments, both dates, and returns the difference between them in days. We divide the result by 7, the number of days in a week, to get the difference in weeks as shown below
### Calculate difference between two dates in weeks in pyspark
from pyspark.sql.functions import datediff, col

# A week has 7 days, so divide the day count by 7
df1.withColumn("diff", datediff(col("current_time"), col("birthdaytime")) / 7).show()
The resultant dataframe has a new diff column containing the difference in weeks as a fractional value.
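A week is 7 days, so a day count is converted to weeks by dividing by 7 (52 is the approximate number of weeks in a year, not a divisor for days). The short standard-Python sketch below checks that arithmetic on hypothetical dates; the helper names are illustrative and not part of pyspark.

```python
from datetime import date

DAYS_PER_WEEK = 7


def weeks_between(end, start):
    """Fractional weeks from start to end (hypothetical helper)."""
    return (end - start).days / DAYS_PER_WEEK


def whole_weeks_between(end, start):
    """Completed weeks from start to end, for non-negative gaps."""
    return (end - start).days // DAYS_PER_WEEK


# Hypothetical sample dates, not rows from df1.
print(weeks_between(date(2024, 1, 15), date(2024, 1, 1)))        # 2.0 (14 days)
print(whole_weeks_between(date(2024, 1, 11), date(2024, 1, 1)))  # 1 (10 days)
```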
Calculate difference between two dates in quarters in pyspark
To calculate the difference between two dates in quarters we use the months_between() function. months_between() takes two arguments, both dates, and returns the difference between them in months. We divide the result by 3, the number of months in a quarter, to get the difference in quarters as shown below
### Calculate difference between two dates in quarters in pyspark
from pyspark.sql.functions import months_between, col

# A quarter has 3 months, so divide the month count by 3
df1.withColumn("diff_in_quarters", months_between(col("current_time"), col("birthdaytime")) / 3).show(truncate=False)
The resultant dataframe has a new column containing the difference in quarters as a fractional value.
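A quarter spans 3 months, so a month count is converted to quarters by dividing by 3. There are four quarters in a year, which is why 12 months should come out as 4 quarters. The standard-Python sketch below checks this on hypothetical month counts of the kind months_between() returns; the helper names are illustrative.

```python
import math

MONTHS_PER_QUARTER = 3


def quarters_from_months(months):
    """Fractional quarters for a month count (hypothetical helper)."""
    return months / MONTHS_PER_QUARTER


def whole_quarters_from_months(months):
    """Completed quarters for a non-negative month count."""
    return math.floor(months / MONTHS_PER_QUARTER)


# Hypothetical month counts, not values from df1.
print(quarters_from_months(12.0))        # 4.0
print(quarters_from_months(7.5))         # 2.5
print(whole_quarters_from_months(7.5))   # 2
```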
Calculate difference between two dates in years in pyspark
To calculate the difference between two dates in years we use the datediff() function. datediff() takes two arguments, both dates, and returns the difference between them in days. We divide the result by 365.25, the average length of a year allowing for leap years, to get an approximate difference in years as shown below
### Calculate difference between two dates in years in pyspark
from pyspark.sql.functions import datediff, col

df1.withColumn("diff_in_years", datediff(col("current_time"), col("birthdaytime")) / 365.25).show()
The resultant dataframe has a new diff_in_years column containing the approximate difference in years as a fractional value.
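Because 365.25 is an average, the result is an approximation: a single calendar year comes out slightly under or over 1, while a four-year span that includes one leap day comes out exact. The standard-Python sketch below illustrates this on hypothetical dates; the helper name is illustrative and not part of pyspark.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, allowing for leap years


def years_between_approx(end, start):
    """Approximate years from start to end (hypothetical helper)."""
    return (end - start).days / DAYS_PER_YEAR


# Hypothetical sample dates, not rows from df1.
print(years_between_approx(date(2024, 1, 1), date(2023, 1, 1)))  # 365 days, just under 1
print(years_between_approx(date(2025, 1, 1), date(2024, 1, 1)))  # 366 days, just over 1
print(years_between_approx(date(2024, 1, 1), date(2020, 1, 1)))  # 4.0 (1461 days)
```

If exact anniversaries should produce whole numbers, dividing the months_between() result by 12 is one alternative, since months_between() returns a whole number when both dates fall on the same day of the month.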
That covers the difference between two dates in days, weeks, months, quarters and years in pyspark. Let’s look at the difference between two timestamps in the next chapter.
Other related topics :
- Get week number from date in Pyspark
- Get difference between two timestamps in hours, minutes & seconds in Pyspark
- Populate current date and current timestamp in pyspark
- Get day of month, day of year, day of week from date in pyspark
- Add Hours, minutes and seconds to timestamp in Pyspark
- Subtract or add days, months and years to timestamp in Pyspark
- Get Hours, minutes, seconds and milliseconds from timestamp in Pyspark
- Get Month, Year and Quarter from date in Pyspark