Groupby sum in R

A groupby sum in R can be computed with the aggregate() function from base R or with the group_by() function of the dplyr package. Both approaches work whether you group by a single column or by multiple columns. This tutorial covers:

Groupby sum of single column in R
Groupby sum of multiple columns
Groupby sum using aggregate() function
Groupby sum using group_by() function.

A groupby sum splits the rows into groups that share the same value in the grouping column, then adds up the values within each group, giving one row per group.
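
The split, sum and combine steps behind a groupby sum can be sketched in base R. This is only an illustration of the idea, not how aggregate() or dplyr work internally; the `toy` data frame and its values are hypothetical and exist only for this example.

```r
# Toy data (hypothetical values, used only for this illustration)
toy <- data.frame(
  State = c("Alaska", "Texas", "Alaska", "Texas", "Texas"),
  Sales = c(10, 20, 5, 1, 4)
)

# Split: one vector of Sales values per State
pieces <- split(toy$Sales, toy$State)

# Apply: sum each piece separately
sums <- sapply(pieces, sum)

# Combine: back into a data frame with one row per group
result <- data.frame(State = names(sums), Sales = unname(sums))
result
```

The functions shown in the rest of this tutorial perform these three steps in a single call.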

First let’s create a dataframe


df1 <- data.frame(Name=c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu','Richards','George','Ema','Samantha','Catherine'),
    State=c('Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'),
    Sales=c(14,24,31,12,13,7,9,31,18,16,18,14))
df1

df1 is a data frame with 12 rows and three columns: Name, State and Sales.

Groupby using aggregate() syntax:

aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

x	an R object, usually a data frame
by	a list of grouping elements by which the rows of x are grouped
FUN	a function to compute the summary statistic for each group
simplify	a logical indicating whether results should be simplified to a vector or matrix if possible
drop	a logical indicating whether to drop unused combinations of grouping values

Groupby sum of single column in R

Method 1: using aggregate()

Pass the column to be summed as the first argument, the grouping column in the by list, and sum as FUN, as shown below

# Groupby sum of single column

aggregate(df1$Sales, by=list(df1$State), FUN=sum)

The result is a data frame with one row per state: Alaska 39, California 55, North Carolina 30 and Texas 83.
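
The call above labels its output columns Group.1 and x, because it was given bare vectors without names. One way to keep the original column names is the formula interface of aggregate(), sketched below. The variable name `state_sales` is just an illustrative choice, and df1 is rebuilt so the snippet runs on its own.

```r
# df1 as defined earlier, repeated so this snippet runs on its own
df1 <- data.frame(
  Name = c('James','Paul','Richards','Marico','Samantha','Ravi',
           'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Formula interface: sum Sales within each State.
# The result columns are named State and Sales.
state_sales <- aggregate(Sales ~ State, data = df1, FUN = sum)
state_sales
```

The formula reads as "Sales grouped by State", which some readers find easier to follow than the by = list(...) form.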

Method 2: groupby using dplyr

group_by() takes the “State” column as its argument, and summarise() uses sum() to compute the total sales for each state.


library(dplyr)
df1 %>% group_by(State) %>% summarise(sum_sales = sum(Sales))

The result has one row per state, with the total in the sum_sales column. The totals match those returned by aggregate().
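
To see what each of the two dplyr steps contributes, the pipeline can be taken apart. The sketch below assumes dplyr is installed; `by_state` is an illustrative variable name, and df1 is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# df1 as defined earlier, repeated so this snippet runs on its own
df1 <- data.frame(
  Name = c('James','Paul','Richards','Marico','Samantha','Ravi',
           'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# group_by() only records the grouping; all 12 rows are still there
by_state <- group_by(df1, State)
group_vars(by_state)  # name of the grouping column
n_groups(by_state)    # number of distinct states
nrow(by_state)        # unchanged row count

# summarise() then collapses each group to a single row
summarise(by_state, sum_sales = sum(Sales))
```

In other words, group_by() changes how later verbs behave rather than changing the data itself.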

Groupby sum of multiple columns in R

Method 1: using aggregate()

Here aggregate() groups by both State and Name, which are passed together in the by list, and applies sum to Sales, as shown below

# Groupby sum of multiple columns
aggregate(df1$Sales, by=list(df1$State,df1$Name), FUN=sum)

The result has one row for each combination of State and Name that occurs in df1, 10 rows in total.
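
With two grouping vectors the default output columns are Group.1, Group.2 and x. A possible way to get readable names while staying with the by argument is to name the list elements and pass a one-column data frame instead of a bare vector. This is a sketch; `by_state_name` is an illustrative variable name, and df1 is rebuilt so the snippet runs on its own.

```r
# df1 as defined earlier, repeated so this snippet runs on its own
df1 <- data.frame(
  Name = c('James','Paul','Richards','Marico','Samantha','Ravi',
           'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Named list elements become the grouping column names;
# df1["Sales"] is a one-column data frame, so the name Sales is kept
by_state_name <- aggregate(
  df1["Sales"],
  by = list(State = df1$State, Name = df1$Name),
  FUN = sum
)
by_state_name
```

Richards in Texas and Samantha in California each appear twice in df1, so their rows in the result hold the sum of two sales values.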

Method 2: groupby using dplyr

group_by() takes the “State” and “Name” columns as arguments and groups by both of them, and summarise() uses sum() to compute the total sales for each combination.

library(dplyr)
df1 %>% group_by(State,Name) %>% summarise(sum_sales = sum(Sales))

The result has one row for each combination of “State” and “Name”, with the aggregated sales in the sum_sales column.
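
When grouping by more than one column, summarise() by default removes only the last grouping level, so the result above is still grouped by State, and dplyr prints a message saying so. If an ungrouped result is wanted, the .groups argument can be set explicitly. The sketch below assumes dplyr 1.0.0 or later, where .groups is available; `totals` is an illustrative variable name.

```r
library(dplyr)

# df1 as defined earlier, repeated so this snippet runs on its own
df1 <- data.frame(
  Name = c('James','Paul','Richards','Marico','Samantha','Ravi',
           'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# .groups = "drop" returns an ungrouped tibble and avoids the
# regrouping message that summarise() prints for multiple groups
totals <- df1 %>%
  group_by(State, Name) %>%
  summarise(sum_sales = sum(Sales), .groups = "drop")

is_grouped_df(totals)  # checks whether any grouping is left on the result
totals
```

Dropping the grouping avoids surprises when later steps, such as a further summarise() or mutate(), would otherwise operate per state.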

For more detail on the group_by() function in dplyr, refer to the dplyr documentation.

Author

Sridhar Venkatachalam

With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.