Groupby mean in R

Groupby mean in R can be computed with the base R aggregate() function or with the group_by() function of the dplyr package (together with summarise()). Both approaches work whether the data is grouped by a single column or by multiple columns. Let's see how to:

Groupby mean by a single column in R
Groupby mean by multiple columns in R
Groupby mean using the aggregate() function
Groupby mean using the group_by() function

The groupby mean operation is illustrated in the figure below:

[Figure: generic groupby mean]

First, let's create a dataframe:

# Example data: a name, a state and a sales figure for each row
df1 <- data.frame(Name=c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu','Richards','George','Ema','Samantha','Catherine'),
    State=c('Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'),
    Sales=c(14,24,31,12,13,7,9,31,18,16,18,14))
df1

df1 will be

[Figure: the df1 dataframe as printed by R]
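
Conceptually, a groupby mean splits the rows into groups, applies mean() to each group and combines the results. A minimal base R sketch of that idea with split() and sapply() follows; it is only an illustration, not one of the two methods covered in this tutorial, and the names pieces and group_means are hypothetical.

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Split: a list holding one vector of Sales per State
pieces <- split(df1$Sales, df1$State)

# Apply and combine: mean() of each piece, simplified to a named numeric vector
group_means <- sapply(pieces, mean)
group_means
# Means: Alaska 13, California 55/3 (about 18.33), North Carolina 15, Texas 20.75
```
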

Syntax of aggregate() for a groupby:

aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

x	an R object, typically a vector or a dataframe (here df1$Sales)
by	a list of grouping elements, each as long as x, by which the subsets are grouped
FUN	a function to compute the summary statistic for each subset (here mean)
...	further arguments passed on to FUN (for example na.rm = TRUE)
simplify	a logical indicating whether results should be simplified to a vector or matrix if possible
drop	a logical indicating whether to drop unused combinations of grouping values
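
The ... argument is how extra options reach FUN. A minimal sketch, assuming a hypothetical copy of Sales (sales_na) whose first value is set to NA purely for illustration: without na.rm, mean() returns NA for the affected group, while passing na.rm = TRUE after FUN forwards it to mean().

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Hypothetical copy of Sales with one missing value (the first row, an Alaska sale)
sales_na <- df1$Sales
sales_na[1] <- NA

# Without na.rm, mean() returns NA for any group that contains a missing value
with_na <- aggregate(sales_na, by = list(State = df1$State), FUN = mean)

# Arguments after FUN (here na.rm = TRUE) are passed on to mean()
without_na <- aggregate(sales_na, by = list(State = df1$State), FUN = mean, na.rm = TRUE)

with_na
without_na
```
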

Groupby mean by a single column in R:

Method 1: groupby using aggregate()

Pass the Sales column as x, the State column (wrapped in a list) as by, and mean as FUN:

# Groupby mean of single column

aggregate(df1$Sales, by=list(df1$State), FUN=mean)

So the grouped dataframe, with the default column names Group.1 and x, will be:

[Figure: mean of Sales for each State, computed with aggregate()]
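
Two small variations, sketched under the assumption that df1 is built exactly as above: naming the element of the by list (State = ...) gives the group column a readable name instead of Group.1, and the formula method of aggregate() (Sales ~ State) produces the same means with columns named State and Sales. The variable names by_state and by_state_f are only illustrative.

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Naming the list element replaces the default Group.1 column name with State
by_state <- aggregate(df1$Sales, by = list(State = df1$State), FUN = mean)
by_state

# Formula method: summarise Sales by State using the columns of df1
by_state_f <- aggregate(Sales ~ State, data = df1, FUN = mean)
by_state_f
```
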

Method 2: groupby using dplyr

The group_by() function takes the State column as its argument, and summarise() then uses mean() to compute the mean of Sales for each group.

library(dplyr)
df1 %>% group_by(State) %>% summarise(Mean_sales = mean(Sales))

So the grouped dataframe will be:

[Figure: mean of Sales for each State, computed with group_by() and summarise()]
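
For comparison, the same per-State means can be obtained in base R without dplyr. The sketch below swaps in tapply(), which is not part of the dplyr approach above; it returns a named array, which is then turned into a two-column dataframe resembling the summarise() result (the column name Mean_sales is reused from the dplyr example, and mean_sales_df is a hypothetical name).

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# tapply() splits Sales by State and applies mean() to each group
mean_sales <- tapply(df1$Sales, df1$State, mean)

# Reshape the named result into a dataframe like the summarise() output
mean_sales_df <- data.frame(State = names(mean_sales),
                            Mean_sales = as.vector(mean_sales))
mean_sales_df
```
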

Groupby mean by multiple columns in R:

Method 1: groupby using aggregate()

This time the list passed to by holds both State and Name, so aggregate() computes the mean of Sales for every State and Name combination:

# Groupby mean of multiple columns

aggregate(df1$Sales, by=list(df1$State,df1$Name), FUN=mean)

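So the grouped dataframe, with the default column names Group.1, Group.2 and x, will be:

[Figure: mean of Sales for each State and Name combination, computed with aggregate()]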
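
As in the single-column case, naming the list elements gives readable column names, and the formula method offers an equivalent call. A minimal sketch, assuming df1 as defined above (rebuilt so the snippet is self-contained); the result holds only the State and Name combinations that actually occur in df1, and the variable names are illustrative.

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Named list elements give State and Name columns instead of Group.1 and Group.2
by_state_name <- aggregate(df1$Sales,
                           by = list(State = df1$State, Name = df1$Name),
                           FUN = mean)

# Equivalent formula form; only State/Name pairs present in df1 are returned
by_state_name_f <- aggregate(Sales ~ State + Name, data = df1, FUN = mean)
nrow(by_state_name_f)  # 10 observed State/Name pairs

# Richards (Texas) and Samantha (California) each appear twice, so these rows are averages
by_state_name_f[by_state_name_f$Name %in% c("Richards", "Samantha"), ]
```
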

Method 2: groupby using dplyr

The group_by() function takes the State and Name columns as arguments and groups by both of them; summarise() then uses mean() to compute the mean of Sales for each group.

library(dplyr)
df1 %>% group_by(State,Name) %>% summarise(Mean_sales = mean(Sales))

So the dataframe grouped by the State and Name columns, with the mean of Sales for each group, will be:

[Figure: mean of Sales for each State and Name combination, computed with group_by() and summarise()]
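
A base R contrast, sketched here with tapply() (not part of dplyr): passing two grouping vectors to tapply() returns a State by Name matrix covering every combination, with NA where a pair never occurs, whereas group_by() with summarise() and aggregate() return only the combinations present in the data. The name mean_matrix is hypothetical.

```r
# Rebuild df1 so this snippet runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Rows are States, columns are Names; pairs that never occur are filled with NA
mean_matrix <- tapply(df1$Sales, list(df1$State, df1$Name), mean)

dim(mean_matrix)                        # 4 states by 10 names
sum(!is.na(mean_matrix))                # 10 State/Name pairs actually occur
mean_matrix["California", "Samantha"]   # (13 + 18) / 2 = 15.5
```
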

For a deeper look at the group_by() function, refer to the dplyr documentation.

Author

Sridhar Venkatachalam

With close to 10 years of experience in data science and machine learning, Sridhar has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.