Groupby mean in R can be computed with the aggregate() function from base R or with the group_by() function from the dplyr package. Both approaches work whether you group by a single column or by multiple columns. This tutorial covers:
- Groupby mean of single column in R
- Groupby mean of multiple columns in R
- Groupby mean using aggregate() function
- Groupby mean using group_by() function
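Groupby mean splits the rows into groups by the values of a grouping column and computes the mean of a numeric column within each group. (Figure: pictorial representation of groupby mean.)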
First let’s create a dataframe
df1 <- data.frame(
  Name = c('James', 'Paul', 'Richards', 'Marico', 'Samantha', 'Ravi',
           'Raghu', 'Richards', 'George', 'Ema', 'Samantha', 'Catherine'),
  State = c('Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
            'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'),
  Sales = c(14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14)
)
df1
df1 is a data frame with 12 rows and three columns: Name, State and Sales.
Groupby using aggregate() syntax:

aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

| Argument | Description |
| --- | --- |
| x | an R object, usually a data frame |
| by | a list of grouping elements by which the data are split into subsets |
| FUN | a function that computes the summary statistic for each subset |
| simplify | a logical indicating whether results should be simplified to a vector or matrix if possible |
| drop | a logical indicating whether to drop unused combinations of grouping values |
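As a minimal sketch of how these arguments fit together, the example below passes a one-column data frame as x and a named list as by, so that the output keeps readable column names instead of the defaults. The data frame `sales_demo` and its values are made up purely for illustration.

```r
# Hypothetical toy data, not the df1 used elsewhere in this tutorial
sales_demo <- data.frame(
  State = c("Alaska", "Texas", "Alaska", "Texas"),
  Sales = c(10, 20, 30, 40)
)

# x:   a one-column data frame, so the result column keeps the name "Sales"
# by:  a named list, so the grouping column is called "State"
# FUN: the summary function applied to each group
result <- aggregate(
  sales_demo["Sales"],
  by = list(State = sales_demo$State),
  FUN = mean
)
print(result)
```

Passing `sales_demo$Sales` and an unnamed list would compute the same means, but the columns would then carry the default names Group.1 and x.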
Groupby mean of single column in R:
Method 1:
Pass the column to be averaged to aggregate(), give the grouping column as a list in the by argument, and set FUN to mean, as shown below
# Groupby mean of single column
aggregate(df1$Sales, by = list(df1$State), FUN = mean)
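The result is a data frame with one row per state. By default aggregate() names the grouping column Group.1 and the mean column x. The mean of Sales is 13 for Alaska, about 18.33 for California, 15 for North Carolina and 20.75 for Texas.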
Method 2: groupby using dplyr
group_by() takes the State column as its argument, and summarise() then applies mean() to Sales within each group.
library(dplyr)
df1 %>%
  group_by(State) %>%
  summarise(Mean_sales = mean(Sales))
The grouped result has one row per state, with the columns State and Mean_sales.
Groupby mean of multiple columns in R:
Method 1:
Here aggregate() groups by both State and Name, passed together in the by list, and again uses mean as FUN, as shown below
# Groupby mean of multiple columns
aggregate(df1$Sales, by = list(df1$State, df1$Name), FUN = mean)
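The result has one row for each combination of State and Name that occurs in df1 (10 rows), with the default column names Group.1, Group.2 and x.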
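Base R's aggregate() also has a formula interface, which some readers may find easier to follow when grouping by more than one column. The sketch below is one possible alternative to the by-list call, not a replacement for it. The data frame `sales_demo` is invented for illustration.

```r
# Hypothetical toy data for illustration only
sales_demo <- data.frame(
  Name  = c("Ann", "Ann", "Bob", "Cy"),
  State = c("Texas", "Texas", "Texas", "Alaska"),
  Sales = c(10, 30, 50, 70)
)

# Formula interface: value column on the left, grouping columns on the right.
# The output columns keep the names State, Name and Sales.
by_state_name <- aggregate(Sales ~ State + Name, data = sales_demo, FUN = mean)
print(by_state_name)
```

Because the formula names the columns directly, the output needs no renaming from Group.1, Group.2 and x.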
Method 2: groupby using dplyr
group_by() takes the State and Name columns as arguments and groups by both; summarise() then applies mean() to Sales within each group.
library(dplyr)
df1 %>%
  group_by(State, Name) %>%
  summarise(Mean_sales = mean(Sales))
The result, grouped by State and Name, has the columns State, Name and Mean_sales, with the mean of Sales for each combination.
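When summarise() follows a group_by() on several columns, the result by default stays grouped by all but the last grouping column (here State). If an ungrouped result is wanted, one option is the .groups argument, available in dplyr 1.0.0 and later. The sketch below assumes dplyr is installed; the data frame `sales_demo` is hypothetical.

```r
library(dplyr)

# Hypothetical toy data for illustration only
sales_demo <- data.frame(
  Name  = c("Ann", "Ann", "Bob", "Cy"),
  State = c("Texas", "Texas", "Texas", "Alaska"),
  Sales = c(10, 30, 50, 70)
)

# .groups = "drop" removes all grouping from the summarised result
mean_sales <- sales_demo %>%
  group_by(State, Name) %>%
  summarise(Mean_sales = mean(Sales), .groups = "drop")

print(mean_sales)

# is_grouped_df() reports whether any grouping remains on the result
print(is_grouped_df(mean_sales))
```

Calling ungroup() after summarise() is another way to reach the same ungrouped result.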
For more on the group_by() function, refer to the dplyr documentation.