Groupby function in R using dplyr - group_by

Groupby function in R: group_by() is used to group a data frame in R. The dplyr package provides the group_by() function, which groups a data frame by one or more columns. The grouped data can then be summarised per group with functions such as mean, sum, count, maximum and minimum.
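To see what grouping on its own does, the sketch below groups iris by Species and inspects the result with dplyr's group_vars() and n_groups(). It is a minimal illustration, assuming a recent dplyr release; the mtcars grouping by cyl and gear is an added example (not part of the original tutorial) showing grouping by more than one column.

```r
library(dplyr)

# group_by() leaves every row and value unchanged; it only records the grouping
by_species <- group_by(iris, Species)
group_vars(by_species)   # "Species"
n_groups(by_species)     # 3 groups (setosa, versicolor, virginica)
nrow(by_species)         # 150, the same rows as iris

# Grouping by two columns: one group per observed (cyl, gear) combination
by_cyl_gear <- group_by(mtcars, cyl, gear)
group_vars(by_cyl_gear)  # "cyl" "gear"
n_groups(by_cyl_gear)    # 8
```

Nothing is aggregated until a summarising verb such as summarise() or summarise_at() is applied to the grouped data.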

A grouped summary can be computed by applying summarise_at() to the output of group_by(), either as a nested call or chained with the pipe operator (%>%), or without dplyr by using base R's aggregate() function. An example of each is shown below.
This section places special emphasis on the dplyr pipe operator (%>%), with an example of each of the following grouped summaries:
Groupby mean in R using dplyr pipe operator.
Groupby count in R using dplyr pipe operator.
Groupby minimum and Groupby maximum in R using dplyr pipe operator.
Groupby sum in R using dplyr pipe operator.
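The grouped summaries listed above do not have to be computed one at a time. As a minimal sketch (assuming dplyr 1.0.0 or later; the output column names mean_sl, count, min_sl, max_sl and sum_sl are illustrative choices, not from the original), a single summarise() call can return all of them per species:

```r
library(dplyr)

# One pipeline computing the mean, count, minimum, maximum and sum per species
summary_tbl <- iris %>%
  group_by(Species) %>%
  summarise(
    mean_sl = mean(Sepal.Length, na.rm = TRUE),
    count   = n(),                               # rows in each group
    min_sl  = min(Sepal.Length, na.rm = TRUE),
    max_sl  = max(Sepal.Length, na.rm = TRUE),
    sum_sl  = sum(Sepal.Length, na.rm = TRUE)
  )

summary_tbl
```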

Figure: pictorial example of a groupby sum in dplyr
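A grouped sum can also be traced by hand. The sketch below uses a small, made-up data frame (toy, with hypothetical columns group and value) so the arithmetic is easy to check:

```r
library(dplyr)

# Hypothetical toy data: two groups, A and B, with a few values each
toy <- data.frame(
  group = c("A", "B", "A", "B", "A"),
  value = c(1, 2, 3, 4, 5)
)

# Rows sharing a group label are collapsed into one row holding their sum
toy_sum <- toy %>%
  group_by(group) %>%
  summarise(total = sum(value))

toy_sum
# A: 1 + 3 + 5 = 9
# B: 2 + 4 = 6
```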

Groupby function in R with dplyr using summarize_at() function:

We will use the built-in iris dataset to illustrate the group_by() function.

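library(dplyr)
mydata2 <- iris

# Mean of Sepal.Length for each Species, using a nested (non-piped) call.
# funs() is deprecated in current dplyr, so a ~ lambda is used instead.
summarise_at(group_by(mydata2, Species), vars(Sepal.Length), ~ mean(., na.rm = TRUE))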

This computes the mean of Sepal.Length for each level of the Species variable.

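In dplyr 1.0.0 and later, summarise_at() is marked as superseded in favour of across() inside summarise(). A minimal sketch of the same grouped mean in that style, assuming a dplyr version that provides across():

```r
library(dplyr)

# Same grouped mean as above, written with across() instead of summarise_at()
mean_tbl <- iris %>%
  group_by(Species) %>%
  summarise(across(Sepal.Length, ~ mean(.x, na.rm = TRUE)))

mean_tbl
```

With a single unnamed function, across() keeps the original column name, so the result column is still called Sepal.Length.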

Groupby function in R with dplyr pipe operator %>%:

library(dplyr)
mydata2 <- iris

# Sum of Sepal.Length for each Species, chained with the pipe operator
mydata2 %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length), ~ sum(., na.rm = TRUE))

This sums Sepal.Length within each level of the Species variable, chaining the steps with the pipe operator (%>%) from the dplyr package. As a result, we get the total of all Sepal.Length values for each species.

So the output will be: setosa 250.3, versicolor 296.8, virginica 329.4.
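vars() in summarise_at() is not limited to one column. As a sketch, listing both Sepal.Length and Petal.Length (Petal.Length is added here for illustration) returns one summed column per variable:

```r
library(dplyr)

# Each column named in vars() gets its own grouped sum
sums <- iris %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length, Petal.Length), ~ sum(.x, na.rm = TRUE))

sums
```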

Groupby in R without dplyr using aggregate function:

In this example, we use base R's aggregate() function to perform the group by operation, as shown below:

mydata2 <- iris

# Group by in R using aggregate(): sum of Sepal.Length for each Species

aggregate(mydata2$Sepal.Length, by = list(Species = mydata2$Species), FUN = sum)

This sums Sepal.Length for each species using aggregate(). Because the values are passed as a bare vector, the summed column in the result is named x.

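aggregate() also has a formula interface. In this sketch, writing Sepal.Length ~ Species keeps the variable's name on the result column instead of x:

```r
# Base R only: sum of Sepal.Length per Species via the formula interface
agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sum)
agg
```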

More Emphasis on Pipe Operator (%>%):

Groupby mean in R with dplyr pipe operator %>%:

library(dplyr)
mydata2 <- iris

# Mean of Sepal.Length for each Species, chained with the pipe operator
mydata2 %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length), ~ mean(., na.rm = TRUE))

This computes the mean of Sepal.Length within each level of the Species variable, chaining the steps with the pipe operator (%>%) from the dplyr package. As a result, we get the mean Sepal.Length of each species.

So the output will be: setosa 5.006, versicolor 5.936, virginica 6.588.
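The na.rm = TRUE argument used in these examples matters only when a group contains missing values. A minimal sketch on a hypothetical data frame (with_na) shows the difference:

```r
library(dplyr)

# Hypothetical data with one missing value in group "A"
with_na <- data.frame(
  group = c("A", "A", "B", "B"),
  value = c(2, NA, 3, 5)
)

na_means <- with_na %>%
  group_by(group) %>%
  summarise(
    mean_keep_na = mean(value),                # NA propagates: A becomes NA
    mean_drop_na = mean(value, na.rm = TRUE)   # NA is skipped: A becomes 2
  )

na_means
```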

Groupby count in R with dplyr pipe operator %>%:

library(dplyr)
mydata2 <- iris

# Number of Sepal.Length values for each Species (length() also counts NAs)
mydata2 %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length), length)

This counts the Sepal.Length values within each level of the Species variable, chaining the steps with the pipe operator (%>%) from the dplyr package. As a result, we get the number of observations of Sepal.Length for each species.

So the output will be 50 for each of setosa, versicolor and virginica, since iris has 50 rows per species.
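dplyr also offers n() and count() for counting rows per group. A minimal sketch, assuming dplyr 1.0.0 or later. Note that n() counts rows, whereas length(Sepal.Length) counts the column's elements including NAs; both give 50 here because iris has no missing values.

```r
library(dplyr)

# n() returns the number of rows in the current group
counts_n <- iris %>%
  group_by(Species) %>%
  summarise(n = n())
counts_n

# count() is roughly equivalent to group_by() followed by summarise(n = n())
counts <- count(iris, Species)
counts
```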

Groupby max in R with dplyr pipe operator %>%:

library(dplyr)
mydata2 <- iris

# Maximum Sepal.Length for each Species, chained with the pipe operator
mydata2 %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length), ~ max(., na.rm = TRUE))

This takes the maximum of Sepal.Length within each level of the Species variable, chaining the steps with the pipe operator (%>%) from the dplyr package. As a result, we get the maximum value of the Sepal.Length variable for each species.

So the output will be: setosa 5.8, versicolor 7.0, virginica 7.9.
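If the whole observation holding each group's maximum is wanted rather than just the value, slice_max() (available from dplyr 1.0.0) can be applied to the grouped data. A minimal sketch:

```r
library(dplyr)

# Keep the full row(s) with the largest Sepal.Length within each species
top_rows <- iris %>%
  group_by(Species) %>%
  slice_max(Sepal.Length, n = 1) %>%
  ungroup()

top_rows
```

With the default with_ties = TRUE, every row tied at the maximum is kept, so a group can return more than one row.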

Groupby min in R with dplyr pipe operator %>%:

library(dplyr)
mydata2 <- iris

# Minimum Sepal.Length for each Species, chained with the pipe operator
mydata2 %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length), ~ min(., na.rm = TRUE))

This takes the minimum of Sepal.Length within each level of the Species variable, chaining the steps with the pipe operator (%>%) from the dplyr package. As a result, we get the minimum value of the Sepal.Length variable for each species.

So the output will be: setosa 4.3, versicolor 4.9, virginica 4.9.
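One caveat with min(..., na.rm = TRUE), and likewise max(): if every value in a group is missing, nothing is left to compare, so base R returns Inf (or -Inf for max()) with a warning instead of NA. A minimal sketch on a hypothetical data frame (all_na):

```r
library(dplyr)

# Hypothetical data: group "B" contains only a missing value
all_na <- data.frame(
  group = c("A", "A", "B"),
  value = c(4, 1, NA)
)

# For group B, na.rm = TRUE removes the only value, so min() returns Inf
# and R emits a "no non-missing arguments to min" warning
mins <- all_na %>%
  group_by(group) %>%
  summarise(min_value = min(value, na.rm = TRUE))

mins
```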

For further understanding of the group_by() function in R, refer to the dplyr documentation.

Author

Sridhar Venkatachalam

With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.