Set difference of two dataframe in Pandas python

Set difference of two dataframe in pandas is carried out in roundabout way using drop_duplicates and concat function. It will become clear when we explain it with an example.

Set difference of two dataframe in pandas Python:

Set difference of two dataframes in pandas can be achieved in roundabout way using drop_duplicates and concat function. Let’s see with an example. First let’s create two data frames.

import pandas as pd
import numpy as np

#Create a DataFrame
df1 = {
    'Subject':['semester1','semester2','semester3','semester4','semester1',
               'semester2','semester3'],
   'Score':[62,47,55,74,31,77,85]}

df2 = {
    'Subject':['semester1','semester2','semester3','semester4'],
   'Score':[90,47,85,12]}


df1 = pd.DataFrame(df1,columns=['Subject','Score'])
df2 = pd.DataFrame(df2,columns=['Subject','Score'])

print(df1)
print(df2)

df1 will be

df2 will be

Set Difference of two dataframes in pandas python:

concat() function along with drop duplicates in pandas can be used to create the set difference of two dataframe as shown below.

Set difference of df2 over df1, something like df2.set_diff(df1) is shown below


set_diff_df = pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
print(set_diff_df)

so the set differenced dataframe will be (data in df2 but not in df1)

Author

Sridhar Venkatachalam

With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

View all posts

Set difference of two dataframe in Pandas python

Set difference of two dataframe in pandas Python:

Set Difference of two dataframes in pandas python:

Author

Related Posts:

.