To rearrange or reorder columns in PySpark, we use the select() function. To order the columns by name in ascending order, we pass the column names through Python's built-in sorted() function; for descending order, we call sorted() with the argument reverse=True. Columns can also be rearranged by position. Let's walk through each case with an example.
- Rearrange or Reorder the column in pyspark
- Reorder the column names in pyspark in ascending order
- Reorder the column names in pyspark in descending order
- Reorder the column by position in pyspark
We will use the dataframe named df_basket1.
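The construction of df_basket1 is not shown here, so the following is only a minimal sketch of a stand-in dataframe. The column names are taken from the examples below; the rows, the COLUMNS and ROWS constants, and the helper name build_df_basket1 are hypothetical and exist purely for illustration.

```python
# Hypothetical sample data: the column names match those used in this
# tutorial, but the rows are invented for illustration only.
COLUMNS = ["Item_group", "Item_name", "price"]
ROWS = [
    ("Fruit", "Apple", 100),
    ("Fruit", "Banana", 80),
    ("Vegetable", "Carrot", 60),
]


def build_df_basket1(spark):
    """Create the sample dataframe on an existing SparkSession."""
    # createDataFrame accepts a list of tuples plus a list of column names.
    return spark.createDataFrame(ROWS, schema=COLUMNS)


# Usage (requires a working PySpark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("reorder-columns").getOrCreate()
#   df_basket1 = build_df_basket1(spark)
#   df_basket1.show()
```

Any dataframe with these three columns would work equally well for the examples that follow.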
Rearrange the column in pyspark:
Using the select() function, we list the columns in the order we want; the returned dataframe has its columns arranged in that order, as shown below.
df_basket_reordered = df_basket1.select("price", "Item_group", "Item_name")
df_basket_reordered.show()
The resulting dataframe has its columns in the order price, Item_group, Item_name.
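Listing every column by hand becomes tedious for wide dataframes. One possible approach, sketched here with the hypothetical helpers move_to_front and reorder_front (neither is part of PySpark), is to name only the columns that should lead and keep the rest in their existing order.

```python
def move_to_front(columns, first):
    """Return the column names with `first` leading and the rest unchanged."""
    missing = [c for c in first if c not in columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    # Keep every column not named in `first` in its original relative order.
    rest = [c for c in columns if c not in first]
    return list(first) + rest


def reorder_front(df, first):
    """Apply move_to_front to a PySpark dataframe via select()."""
    # select() accepts a list of column names, as in the sorted() examples.
    return df.select(move_to_front(df.columns, first))


# Usage, assuming df_basket1 exists:
#   reorder_front(df_basket1, ["price"]).show()
```

Keeping the name computation separate from the select() call makes the ordering logic easy to check without a Spark session.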
Reorder the column in pyspark in ascending order
Using select() together with Python's built-in sorted() function, we first sort the column names in ascending order: the list of column names (df_basket1.columns) is passed to sorted(), and the sorted list is then passed to select(), as shown below.
## Reorder column by ascending order
df_basket_reordered = df_basket1.select(sorted(df_basket1.columns))
df_basket_reordered.show()
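The resulting dataframe has its columns sorted by name in ascending order. Note that sorted() compares strings by character code, so names starting with an uppercase letter come before names starting with a lowercase letter.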
Reorder the column in pyspark in descending order
The list of column names is passed to sorted() along with the argument reverse=True, which sorts the names in descending order; the sorted list is then passed to select(), as shown below.
## Reorder column by descending order
df_basket_reordered = df_basket1.select(sorted(df_basket1.columns, reverse=True))
df_basket_reordered.show()
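Because sorted() orders strings by character code, a dataframe with mixed-case column names may not sort the way a reader expects. The sketch below shows one way to sort names without regard to case, in either direction. The helper name sorted_columns and the extra column name amount are hypothetical and used only for illustration.

```python
def sorted_columns(columns, reverse=False, ignore_case=True):
    """Sort column names, optionally ignoring case and optionally descending."""
    # str.lower as the key compares names case-insensitively;
    # key=None falls back to plain character-code ordering.
    key = str.lower if ignore_case else None
    return sorted(columns, key=key, reverse=reverse)


# "amount" is a made-up column name used to show the effect of case.
names = ["price", "Item_name", "amount"]
print(sorted(names))                        # ['Item_name', 'amount', 'price']
print(sorted_columns(names))                # ['amount', 'Item_name', 'price']
print(sorted_columns(names, reverse=True))  # ['price', 'Item_name', 'amount']

# Usage, assuming df_basket1 exists:
#   df_basket1.select(sorted_columns(df_basket1.columns)).show()
```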
Reorder the column by position in pyspark:
We can also use select() to reorder columns by position. In the example below, the columns at index 2, 0 and 1 are moved to positions 0, 1 and 2 respectively.
## Reorder column by position
df_basket1.select(df_basket1.columns[2], df_basket1.columns[0], df_basket1.columns[1]).show()
The resulting dataframe has its columns reordered by position.
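Indexing df_basket1.columns once per column gets repetitive as the column count grows. A minimal sketch of the same idea driven by a list of positions follows. The helper name columns_by_position is hypothetical, and the check that the positions form a complete permutation is a design choice made here (so that no column is dropped or duplicated by accident), not something select() requires.

```python
def columns_by_position(columns, positions):
    """Return the column names found at `positions` (0-based), in that order."""
    # Require each index exactly once so no column is lost or repeated.
    if sorted(positions) != list(range(len(columns))):
        raise ValueError("positions must use every index from 0 to n-1 exactly once")
    return [columns[i] for i in positions]


# Usage, assuming df_basket1 exists:
#   new_order = columns_by_position(df_basket1.columns, [2, 0, 1])
#   df_basket1.select(new_order).show()
```

Dropping the permutation check would turn the helper into a general "pick columns by index" function, which may be preferable when a subset of columns is actually wanted.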