To add leading zeros to a column in pyspark, we can use the concat() function. The format_string() and lpad() functions offer two other ways to add preceding zeros to the column. Let’s see an example of each method:
- Add leading or preceding zeros to the column in pyspark using concat() function
- Add leading zeros to the column in pyspark using format_string() function
- Add preceding zeros to the column in pyspark using lpad() function
We will be using the dataframe df_student_detail, which has a grad_score column.
Add leading zeros to the column in pyspark using concat() function – Method 1
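We will use the lit() and concat() functions to add leading zeros to the column in pyspark. lit() turns the literal string ‘00’ into a column, and concat() joins it with the ‘grad_score’ column, thereby prefixing every value with two zeros. Note that this adds a fixed prefix rather than padding to a fixed width, so values of different lengths give results of different lengths.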
### Add leading zeros to the column in pyspark - 1
from pyspark.sql import functions as sf
df_student_detail.withColumn('joined_column', sf.concat(sf.lit('00'), sf.col('grad_score'))).show()
The resulting joined_column contains each grad_score value prefixed with ‘00’.
Add preceding zeros to the column in pyspark using format_string() function – Method 2
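The format_string() function takes the format “%03d” and the column name “grad_score” as arguments. It pads the “grad_score” value with leading zeros until the string is at least 3 characters long; longer values are left unchanged. Since “%d” is an integer format, the column must hold integers.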
### Add leading zeros to the column in pyspark - 2
from pyspark.sql import functions as sf
df_student_detail.withColumn("joined_column", sf.format_string("%03d", "grad_score")).show()
The resulting joined_column contains each grad_score value zero-padded to a width of 3.
Add preceding zeros to the column in pyspark using lpad() function – Method 3
The lpad() function takes the “grad_score” column as its first argument, followed by 3, the total string length, and then “0”, the character to pad on the left. This adds leading zeros to the “grad_score” column until the string length becomes 3. Values longer than 3 characters are shortened to 3 characters.
### Add leading zeros to the column in pyspark - 3
from pyspark.sql.functions import lpad
df_student_detail = df_student_detail.withColumn('grad_score_new', lpad(df_student_detail.grad_score, 3, '0'))
df_student_detail.show()
The resulting grad_score_new column contains each grad_score value left-padded with zeros to a length of 3.