To get the string length of a column in PySpark, we use the length() function. We look at an example of getting the string length of a specific column, and another of filtering a dataframe by the length of a column.
- Get string length of the column in pyspark using length() function.
- Filter the dataframe using length of the column in pyspark
Syntax:
length(colname)
colname – column name
We will be using the dataframe named df_books
Get String length of column in Pyspark:
To get the string length of the column we use the length() function, which takes the column name as its argument and returns the number of characters in each value.
### Get String length of the column in pyspark
import pyspark.sql.functions as F
df = df_books.withColumn("length_of_book_name", F.length("book_name"))
df.show(truncate=False)
The resulting dataframe has the length of each book name appended as a new column, length_of_book_name.
Filter the dataframe using length of the column in pyspark:
Filtering the dataframe based on the length of a column is also done with the length() function. We keep only the rows where the “book_name” column has 20 or more characters.
### Filter using length of the column in pyspark
# col must be imported as well, otherwise col("book_name") raises a NameError
from pyspark.sql.functions import col, length
df_books.where(length(col("book_name")) >= 20).show()
The resulting dataframe contains only the rows whose book_name is at least 20 characters long.