Maximum and minimum value of a column in pyspark can be obtained using the agg() function, passing the column name together with ‘max’ or ‘min’ according to our need. Maximum or minimum value of each group in pyspark can be calculated by using groupby() along with agg(). We will see an example for each:
- Maximum value of the column in pyspark with example
- Minimum value of the column in pyspark with example
- Maximum value of each group of dataframe in pyspark with example
- Minimum value of each group of dataframe in pyspark with example
We will be using the dataframe named df_basket1.
Maximum value of the column in pyspark with example:
Maximum value of the column in pyspark is calculated using the agg() function. The agg() function takes a dictionary mapping the column name to the ‘max’ keyword, which returns the maximum value of that column.

```python
## Maximum value of the column in pyspark
df_basket1.agg({'Price': 'max'}).show()
```

The maximum value of the “Price” column is returned.
Minimum value of the column in pyspark with example:
Minimum value of the column in pyspark is calculated using the agg() function. The agg() function takes a dictionary mapping the column name to the ‘min’ keyword, which returns the minimum value of that column.

```python
## Minimum value of the column in pyspark
df_basket1.agg({'Price': 'min'}).show()
```

The minimum value of the “Price” column is returned.
Maximum value of each group in pyspark with example:
Maximum value of each group in pyspark is calculated using the agg() function along with groupby(). groupby() takes the grouping column name, and agg() takes a dictionary mapping the column name to the ‘max’ keyword, which returns the maximum value of that column for each group.

```python
#Maximum value of each group
df_basket1.groupby('Item_group').agg({'Price': 'max'}).show()
```

The maximum price of each “Item_group” is calculated.
Minimum value of each group in pyspark with example:
Minimum value of each group in pyspark is calculated using the agg() function along with groupby(). groupby() takes the grouping column name, and agg() takes a dictionary mapping the column name to the ‘min’ keyword, which returns the minimum value of that column for each group.

```python
#Minimum value of each group
df_basket1.groupby('Item_group').agg({'Price': 'min'}).show()
```

The minimum price of each “Item_group” is calculated.
Other Related Topics:
- Drop rows in pyspark – drop rows with condition
- Distinct value of a column in pyspark
- Distinct value of dataframe in pyspark – drop duplicates
- Count of Missing (NaN,Na) and null values in Pyspark
- Mean, Variance and standard deviation of column in Pyspark
- Maximum or Minimum value of column in Pyspark
- Raised to power of column in pyspark – square, cube, square root and cube root in pyspark
- Drop column in pyspark – drop single & multiple columns
- Subset or Filter data with multiple conditions in pyspark
- Frequency table or cross table in pyspark – 2 way cross table
- Groupby functions in pyspark (Aggregate functions) – Groupby count, Groupby sum, Groupby mean, Groupby min and Groupby max
- Descriptive statistics or Summary Statistics of dataframe in pyspark
- Rearrange or reorder column in pyspark
- cumulative sum of column and group in pyspark
- Calculate Percentage and cumulative percentage of column in pyspark
- Select column in Pyspark (Select single & Multiple columns)
- Get data type of column in Pyspark (single & Multiple columns)