Groupby function in pandas Python: In this section we will learn how to group data in Python pandas and apply aggregate functions. We will find the mean of a group, the sum of a group and the descriptive statistics (including the count) of a group.
We will be working on
- getting mean score of a group using groupby function in python
- getting sum of score of a group using groupby function in python
- descriptive statistics of a group using pandas groupby function
Create dataframe :
import pandas as pd
import numpy as np
# Create a DataFrame
d = {
    'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',
            'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],
    'Exam':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1',
            'Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],
    'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science',
               'Mathematics','Mathematics','Mathematics','Science','Science','Science'],
    'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}
df = pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])
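print(df)
The resulting dataframe has 12 rows, one for each combination of student, exam and subject, and four columns: Name, Exam, Subject and Score.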

Get mean score of a group using groupby function in pandas:
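Now let's group by the name of the student and find the average score of each student with the following code:
# mean score of students
df['Score'].groupby([df['Name']]).mean()
The result is a Series indexed by Name, with a mean score of 72.0 for Alisa, 57.5 for Bobby and 63.75 for Cathrine.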
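The same mean can also be written by grouping the dataframe first and then selecting the Score column. The following is a minimal, self-contained sketch of that form; it rebuilds the tutorial's data in a compact way, and the variable names (mean_by_name, same_as_above) are only illustrative.

```python
import pandas as pd

# Rebuild the tutorial's data in compact form (same 12 rows, same order)
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Cathrine'] * 4,
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
})

# Group the dataframe by Name, then take the mean of the Score column
mean_by_name = df.groupby('Name')['Score'].mean()

# The form used in this section: group the Score Series by the Name Series
same_as_above = df['Score'].groupby([df['Name']]).mean()

print(mean_by_name)   # Alisa 72.0, Bobby 57.5, Cathrine 63.75
print(same_as_above)  # same values
```

Both forms give the same numbers; grouping the dataframe first is usually easier to read when the key and the value live in the same dataframe.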

Get sum of score of a group using groupby function in pandas:
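Now let's group by the name of the student and the exam, and find the sum of the scores within each group:
# sum of score grouped by Name and Exam
df['Score'].groupby([df['Name'],df['Exam']]).sum()
The result is a Series indexed by Name and Exam: Alisa scores 136 in Semester 1 and 152 in Semester 2, Bobby 78 and 152, and Cathrine 132 and 123.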
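A result grouped by two keys has a two-level index, which can be easier to read as a small table. The sketch below is one possible way to do that with unstack; it rebuilds the tutorial's data so that it runs on its own, and the names total and wide are only illustrative.

```python
import pandas as pd

# Rebuild the tutorial's data in compact form (same 12 rows, same order)
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Cathrine'] * 4,
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
})

# Sum of Score for every (Name, Exam) pair; the index has two levels
total = df.groupby(['Name', 'Exam'])['Score'].sum()

# Move the Exam level into columns: one row per student, one column per exam
wide = total.unstack('Exam')

print(total)
print(wide)
```

A single group can be read from the two-level result with a tuple key, for example total.loc[('Alisa', 'Semester 1')], which is 136 for this data.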

Group the entire dataframe by Subject and Exam:
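Now let's group the entire dataframe by subject and exam and then find the sum of the scores of the students. Passing numeric_only=True restricts the sum to the numeric Score column; without it, recent pandas versions may also try to sum the text in the Name column.
# group the entire dataframe by Subject and Exam
df.groupby(['Subject', 'Exam']).sum(numeric_only=True)
The result is a dataframe with one row per Subject and Exam: Mathematics sums to 164 in Semester 1 and 190 in Semester 2, and Science sums to 182 in Semester 1 and 237 in Semester 2.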
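When the whole dataframe is grouped, the group keys become the index of the result. The sketch below shows two variations under the assumption of a reasonably recent pandas version: limiting the sum to numeric columns with numeric_only=True, and keeping Subject and Exam as ordinary columns with as_index=False. The names by_subject_exam and flat are only illustrative.

```python
import pandas as pd

# Rebuild the tutorial's data in compact form (same 12 rows, same order)
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Cathrine'] * 4,
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
})

# Sum only the numeric columns; Subject and Exam form the index
by_subject_exam = df.groupby(['Subject', 'Exam']).sum(numeric_only=True)

# Same totals, but Subject and Exam stay as regular columns
flat = df.groupby(['Subject', 'Exam'], as_index=False)['Score'].sum()

print(by_subject_exam)
print(flat)
```

The flat form is convenient when the result is going to be merged with other data or written to a file, since no index handling is needed.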

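Descriptive statistics of the group:
Now let's group by subject and find the descriptive statistics of each group as shown below:
# descriptive statistics by group - Subject
df['Score'].groupby(df['Subject']).describe()
The result has one row per subject with the count, mean, standard deviation, minimum, quartiles and maximum of Score. For example, Mathematics has 6 scores with a mean of 59.0, a minimum of 42 and a maximum of 85.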
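describe() returns a fixed set of statistics. When only a few are needed, such as the count, mean and sum of each group, agg with a list of function names is one way to request just those. The following self-contained sketch rebuilds the tutorial's data; the names stats and summary are only illustrative.

```python
import pandas as pd

# Rebuild the tutorial's data in compact form (same 12 rows, same order)
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Cathrine'] * 4,
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
})

# Full descriptive statistics of Score for each Subject
stats = df.groupby('Subject')['Score'].describe()

# Only the count, mean and sum of Score for each Subject
summary = df.groupby('Subject')['Score'].agg(['count', 'mean', 'sum'])

print(stats)
print(summary)  # each subject has 6 scores; sums are 354 and 419
```

If only the number of rows per group is needed, df.groupby('Subject').size() gives that count directly.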






