Pandas has no dedicated set-difference function for dataframes, so the set difference of two dataframes is computed in a roundabout way using the concat() and drop_duplicates() functions. This will become clear with an example.
Set difference of two dataframes in pandas Python:
Let's see how this works with an example. First, let's create two data frames.
import pandas as pd

# Create two DataFrames from dictionaries of column data
data1 = {
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'Score': [62, 47, 55, 74, 31, 77, 85]}
data2 = {
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'Score': [90, 47, 85, 12]}
df1 = pd.DataFrame(data1, columns=['Subject', 'Score'])
df2 = pd.DataFrame(data2, columns=['Subject', 'Score'])
print(df1)
print(df2)
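df1 will be

     Subject  Score
0  semester1     62
1  semester2     47
2  semester3     55
3  semester4     74
4  semester1     31
5  semester2     77
6  semester3     85

df2 will be

     Subject  Score
0  semester1     90
1  semester2     47
2  semester3     85
3  semester4     12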
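Set difference of two dataframes in pandas Python:
The concat() function, combined with drop_duplicates(), can be used to compute the set difference of two dataframes. Passing keep=False to drop_duplicates() removes every row that occurs more than once. Because df1 is concatenated twice, each of its rows occurs at least twice and is removed, so only the rows unique to df2 remain.
The set difference of df2 over df1 (conceptually something like df2.set_diff(df1), which is not an actual pandas method) is shown below.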
set_diff_df = pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
print(set_diff_df)
So the resulting dataframe contains the rows that are in df2 but not in df1:

     Subject  Score
0  semester1     90
3  semester4     12
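To see why df1 is concatenated twice, the following minimal sketch compares listing df1 once with listing it twice. It rebuilds the same two dataframes so that it runs on its own. The variable names `once` and `twice` are illustrative and not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'Score': [62, 47, 55, 74, 31, 77, 85]})
df2 = pd.DataFrame({
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'Score': [90, 47, 85, 12]})

# df1 listed once: rows unique to EITHER dataframe survive,
# which is the symmetric difference rather than the set difference.
once = pd.concat([df2, df1]).drop_duplicates(keep=False)

# df1 listed twice: every df1 row now occurs at least twice, so
# keep=False removes all of them and only rows unique to df2 remain.
twice = pd.concat([df2, df1, df1]).drop_duplicates(keep=False)

print(len(once))   # number of rows in the symmetric difference
print(twice)       # rows in df2 but not in df1
```

Note that keep=False also removes rows that are repeated within df2 itself, so this approach assumes df2 contains no duplicate rows. If it might, calling df2.drop_duplicates() before the concat is one way to avoid losing them.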






