To sort a dataframe in pyspark, we use the orderBy() function. orderBy() sorts the dataframe by a single column or by multiple columns, in either ascending or descending order. Let's see an example of each.
- Sort the dataframe in pyspark by single column – ascending order
- Sort the dataframe in pyspark by single column – descending order
- Sort the dataframe in pyspark by multiple columns – ascending order
- Sort the dataframe in pyspark by multiple columns – descending order
Syntax:
df.orderBy('colname1', 'colname2', ascending=False)
df – dataframe
colname1, colname2 – names of the columns to sort by
ascending=False – sort in descending order
ascending=True – sort in ascending order (this is the default)
We will be using the dataframe df_student_detail, which has the columns science_score and grad_score.
Sort the dataframe in pyspark by single column – descending order
orderBy() takes the column name as an argument and sorts the dataframe by that column. It also accepts the argument ascending=False, which sorts the dataframe in decreasing order of the column.
# Sort dataframe in descending order by a single column
df_student_detail1 = df_student_detail.orderBy('science_score', ascending=False)
df_student_detail1.show()
The resulting dataframe lists the rows from the highest science_score to the lowest.
Sort the dataframe in pyspark by single column – ascending order
orderBy() takes the column name as an argument and sorts the dataframe by that column. When the ascending argument is omitted, orderBy() sorts in ascending order of the column.
# Sort dataframe in ascending order by a single column
df_student_detail1 = df_student_detail.orderBy('science_score')
df_student_detail1.show()
The resulting dataframe lists the rows from the lowest science_score to the highest.
Sort the dataframe in pyspark by multiple columns – descending order
orderBy() takes two column names as arguments and sorts the dataframe by the first column and then by the second column, both in decreasing order.
# Sort dataframe in descending order by multiple columns
df_student_detail1 = df_student_detail.orderBy('grad_score', 'science_score', ascending=False)
df_student_detail1.show()
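The resulting dataframe is sorted by grad_score from highest to lowest, and rows with the same grad_score are sorted by science_score from highest to lowest.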
Sort the dataframe in pyspark by multiple columns – ascending order
orderBy() takes two column names as arguments and sorts the dataframe by the first column and then by the second column, both in ascending order.
# Sort dataframe in ascending order by multiple columns
df_student_detail1 = df_student_detail.orderBy('grad_score', 'science_score')
df_student_detail1.show()
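The resulting dataframe is sorted by grad_score from lowest to highest, and rows with the same grad_score are sorted by science_score from lowest to highest.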