In pyspark, padding is added to the left side of a column with the lpad() function and to the right side of a column with the rpad() function. Let’s see how to
- Left pad of the column in pyspark – lpad()
- Right pad of the column in pyspark – rpad()
- Add both left and right padding in pyspark
We will be using the dataframe df_states, which has a state_name column.
Add left pad of the column in pyspark
Left padding is accomplished using the lpad() function. lpad() takes the column, the target length and the padding string as arguments. In our case we are using the state_name column and “#” as the padding string, so “#” is added on the left until the value reaches 14 characters. A value that is already longer than 14 characters is shortened to 14 characters.
# Add left pad of the column in pyspark
from pyspark.sql.functions import lpad

# Left-pad state_name with '#' up to 14 characters
df_states = df_states.withColumn('states_Name_new', lpad(df_states.state_name, 14, '#'))
df_states.show(truncate=False)
The resulting dataframe has a new column, states_Name_new, in which each state name is left-padded with “#” to 14 characters.
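To make the padding rule concrete, here is a plain-Python sketch of what lpad() does to a single value. It is only a model for illustration, not Spark's implementation; the function name lpad_model and the sample state names are made up for this example, and a non-negative length is assumed.

```python
def lpad_model(value: str, length: int, pad: str) -> str:
    """Illustrative model of left padding to a fixed length (not Spark code)."""
    if len(value) >= length:
        # A value that is too long is cut to its first `length` characters.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string and trim it to exactly the missing width.
    fill = (pad * missing)[:missing]
    return fill + value


# Hypothetical sample values, padded to 14 characters with '#'
for name in ["Alabama", "New Hampshire", "District of Columbia"]:
    print(repr(lpad_model(name, 14, "#")))
# '#######Alabama'
# '#New Hampshire'
# 'District of Co'
```

The last value shows why the target length matters: anything longer than 14 characters loses its tail, so the length should be at least as large as the longest value in the column if nothing is meant to be cut.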
Add Right pad of the column in pyspark
Right padding is accomplished using the rpad() function. rpad() takes the column, the target length and the padding string as arguments. In our case we are using the state_name column and “#” as the padding string, so “#” is added on the right until the value reaches 14 characters. A value that is already longer than 14 characters is shortened to 14 characters.
# Add right pad of the column in pyspark
from pyspark.sql.functions import rpad

# Right-pad state_name with '#' up to 14 characters
df_states = df_states.withColumn('states_Name_new', rpad(df_states.state_name, 14, '#'))
df_states.show(truncate=False)
The resulting dataframe has a new column, states_Name_new, in which each state name is right-padded with “#” to 14 characters.
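The same rule, mirrored to the right side, can be sketched in plain Python. Again this is an illustrative model rather than Spark's own code; rpad_model and the sample values are hypothetical names chosen for the example. It also shows what happens when the padding string has more than one character: the string is repeated and cut off once the target length is reached.

```python
def rpad_model(value: str, length: int, pad: str) -> str:
    """Illustrative model of right padding to a fixed length (not Spark code)."""
    if len(value) >= length:
        # A value that is too long is cut to its first `length` characters.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string and trim it to exactly the missing width.
    fill = (pad * missing)[:missing]
    return value + fill


# Hypothetical sample values, padded to 14 characters with '#'
for name in ["Alabama", "New Hampshire", "District of Columbia"]:
    print(repr(rpad_model(name, 14, "#")))
# 'Alabama#######'
# 'New Hampshire#'
# 'District of Co'

# A two-character padding string is repeated, then trimmed to fit
print(repr(rpad_model("hi", 5, "ab")))
# 'hiaba'
```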
Add Both Left and Right pad of the column in pyspark
Adding both left and right padding is accomplished by using the lpad() and rpad() functions one after the other. lpad() takes the column, the target length and the padding string as arguments, and rpad() is then applied to its result in the same way. In our case we are using the state_name column and “#” as the padding string, so the left padding is done until the value reaches 20 characters, followed by right padding until it reaches 24 characters.
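# Add both left and right pad of the column in pyspark
from pyspark.sql.functions import lpad, rpad

# Step 1: left-pad state_name with '#' up to 20 characters
df_states = df_states.withColumn('states_Name_new', lpad(df_states.state_name, 20, '#'))
# Step 2: right-pad the left-padded column with '#' up to 24 characters
df_states = df_states.withColumn('states_Name_new', rpad(df_states.states_Name_new, 24, '#'))
df_states.show(truncate=False)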
In the resulting dataframe, the states_Name_new column holds each state name padded with “#” on both sides, 24 characters in total.
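The two-step padding can be traced value by value with a small plain-Python sketch. It is a model written for illustration under the same assumptions as the description above (left pad to 20, then right pad to 24, with “#”); pad_model, pad_both and the sample names are hypothetical and not part of pyspark.

```python
def pad_model(value: str, length: int, pad: str, side: str) -> str:
    """Illustrative model: pad `value` on the 'left' or 'right' up to `length`."""
    if len(value) >= length:
        # A value that is too long is cut to its first `length` characters.
        return value[:length]
    missing = length - len(value)
    fill = (pad * missing)[:missing]
    return fill + value if side == "left" else value + fill


def pad_both(value: str) -> str:
    # Step 1: left pad to 20 characters.
    left_padded = pad_model(value, 20, "#", "left")
    # Step 2: right pad the 20-character result to 24 characters.
    return pad_model(left_padded, 24, "#", "right")


for name in ["Alabama", "New Hampshire"]:
    print(pad_both(name))
# #############Alabama####
# #######New Hampshire####
```

Because the first step always produces exactly 20 characters, the second step always appends 24 - 20 = 4 padding characters on the right, while the amount of left padding depends on the length of each name.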
Other Related Topics:
- Remove leading zero of column in pyspark
- Left and Right pad of column in pyspark –lpad() & rpad()
- Add Leading and Trailing space of column in pyspark – add space
- Remove Leading, Trailing and all space of column in pyspark – strip & trim space
- String split of the columns in pyspark
- Repeat the column in Pyspark
- Get Substring of the column in Pyspark
- Get String length of column in Pyspark
- Typecast string to date and date to string in Pyspark
- Typecast Integer to string and String to integer in Pyspark
- Extract First N and Last N character in pyspark
- Convert to upper case, lower case and title case in pyspark
- Add leading zeros to the column in pyspark
- Concatenate two columns in pyspark