Quantile and Decile rank of a column in pandas python

The quantile and decile rank of a column in pandas is computed using the qcut() function with the argument labels=False. qcut() divides data into quantile-based bins (such as quartiles or deciles). We will see an example of both quantile and decile rank.

Let’s see how to

  • Get the Quantile rank of a column in pandas dataframe in python
  • Get the Decile rank of a column in pandas dataframe in python

with an example for each. First let’s create a dataframe


import pandas as pd
import numpy as np

#Create a DataFrame
data = {
    'Name':['George','Andrea','micheal','maggie','Ravi','Xien','Jalpa'],
    'Mathematics_score':[62,47,55,74,32,77,86]}

df1 = pd.DataFrame(data,columns=['Name','Mathematics_score'])
print(df1)

df1 will have seven rows with the columns Name and Mathematics_score

 

Quantile rank of a column in a pandas dataframe python

In this case, you’ll divide the data into 4 quantile-based bins of roughly equal size (quartiles), assigning each row a rank from 0 to 3, where 0 represents the lowest quartile and 3 represents the highest.

Quantile rank of the column (Mathematics_score) is computed using the qcut() function with the arguments 4 and labels=False, and stored in a new column named “Quantile_rank” as shown below


df1['Quantile_rank']=pd.qcut(df1['Mathematics_score'],4,labels=False)
print(df1)

Explanation:

  • pd.qcut(df1['Mathematics_score'], 4, labels=False): Divides the data in the Mathematics_score column into 4 quantile-based bins (quartiles) and assigns each row a quantile rank from 0 to 3.
  • labels=False makes sure the result is an integer representing the quantile rank instead of an interval range.

So the resulting dataframe has a Quantile_rank column with values from 0 to 3.
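To make the effect of labels=False concrete, here is a minimal sketch that runs qcut() on the same seven scores three ways: with the default interval labels, with labels=False, and with retbins=True to see the bin edges. The variable names (scores, quartile_bins, quartile_rank, edges) are illustrative and not part of the original example.

```python
import pandas as pd

# Same scores as in df1, in the same row order.
scores = pd.Series([62, 47, 55, 74, 32, 77, 86], name="Mathematics_score")

# Default labels: each row gets the interval (bin) it falls into.
quartile_bins = pd.qcut(scores, 4)

# labels=False: each row gets the integer position of its bin (0 to 3).
quartile_rank = pd.qcut(scores, 4, labels=False)

# retbins=True additionally returns the bin edges that qcut computed.
_, edges = pd.qcut(scores, 4, labels=False, retbins=True)

print(pd.DataFrame({"score": scores, "bin": quartile_bins, "rank": quartile_rank}))
print(edges)
```

Comparing the bin and rank columns side by side shows that the integer rank is simply the position of the interval, counted from the lowest one.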

 

Decile rank of a column in a pandas dataframe python using qcut() function:

In this case, you’ll divide the data into 10 quantile-based bins (deciles), assigning each row a rank from 0 to 9, where 0 represents the lowest decile and 9 represents the highest.

Decile rank of the column (Mathematics_score) is computed using the qcut() function with the arguments 10 and labels=False, and stored in a new column named “Decile_rank” as shown below


df1['Decile_rank']=pd.qcut(df1['Mathematics_score'],10,labels=False)
print(df1)

Explanation:

  • pd.qcut(df1['Mathematics_score'], 10, labels=False): Divides the data in the Mathematics_score column into 10 quantile-based bins (deciles) and assigns each row a decile rank from 0 to 9.
  • labels=False makes sure the result is an integer representing the decile rank instead of an interval range.

So the resulting dataframe has a Decile_rank column with values from 0 to 9.
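With only seven rows, not every one of the ten decile ranks can be occupied. The sketch below, using illustrative variable names that are not in the original, lists which ranks go unused on this data and shows one possible way to shift the result to a 1-to-10 convention by adding 1.

```python
import pandas as pd

# Same scores as in df1, in the same row order.
scores = pd.Series([62, 47, 55, 74, 32, 77, 86], name="Mathematics_score")

# 0-based decile rank, as computed in the example above.
decile_rank = pd.qcut(scores, 10, labels=False)

# With 7 rows and 10 bins, at most 7 of the 10 ranks can be occupied.
used = sorted(int(r) for r in decile_rank.unique())
unused = sorted(set(range(10)) - set(used))

# Adding 1 shifts the ranks to a 1-to-10 convention (an optional choice).
decile_rank_1_based = decile_rank + 1

print(decile_rank.tolist())
print("unused ranks:", unused)
print(decile_rank_1_based.tolist())
```

On larger datasets with few repeated values, every decile would normally receive roughly the same number of rows.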

 

 

 

Decile rank of a column in a pandas dataframe python using rank() function:

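In this case, each score is first converted to a percentile rank and then scaled to the decile range. On this dataset the resulting ranks run from 1 to 10, where lower ranks correspond to lower scores.

Decile rank of the column (Mathematics_score) is computed using the rank() function with the argument pct=True and stored in the column “Decile_rank” (replacing the values computed with qcut() above), as shown below


df1['Decile_rank'] = (df1['Mathematics_score'].rank(pct=True) * 10).astype(int)
print(df1)

Explanation:

  • rank(pct=True) gives a percentile rank for each value, a fraction greater than 0 and at most 1.
  • Multiplying by 10 scales these to values greater than 0 and at most 10.
  • astype(int) truncates the fractional part, so the highest score always gets 10, while on larger datasets the lowest scores can get 0.

So for the seven scores here, the resultant dataframe will have decile ranks ranging from 1 to 10. Unlike qcut(), this expression can produce 11 distinct values (0 to 10), so it does not split the rows into 10 equal-sized groups.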
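The difference between truncating and rounding up is easy to check. The following sketch compares the rank() expression above with a hypothetical variant that uses np.ceil so the result stays within 1 to 10; the variant and all variable names are assumptions added for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same scores as in df1, in the same row order.
scores = pd.Series([62, 47, 55, 74, 32, 77, 86], name="Mathematics_score")

# The expression from the example: percentile rank times 10, truncated.
truncated = (scores.rank(pct=True) * 10).astype(int)

# Hypothetical variant: round up instead of truncating, so results stay in 1..10.
# rank() * 10 / count() is the same quantity as rank(pct=True) * 10.
ceiled = np.ceil(scores.rank() * 10 / scores.count()).astype(int)

# On a 100-row series the truncating version also produces 0.
big = pd.Series(range(1, 101))
big_truncated = (big.rank(pct=True) * 10).astype(int)
big_ceiled = np.ceil(big.rank() * 10 / big.count()).astype(int)

print(truncated.tolist())
print(ceiled.tolist())
print(big_truncated.min(), big_truncated.max())
print(big_ceiled.min(), big_ceiled.max())
```

Which convention is appropriate depends on how the ranks are used downstream; qcut() remains the more direct choice when equal-sized groups are required.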


Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning and has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.
