The Python Oracle

Compare multiple column values together using pandas


Full question
https://stackoverflow.com/questions/5588...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #excel #pandas #dataframe



ACCEPTED ANSWER

Score 9


You were on the right track. Build one string key per side and test membership:

a_key = df['a_id'].astype(str) + df['a_region'] + df['a_ip'].astype(str)
b_key = df['b_id'].astype(str) + df['b_region'] + df['b_ip'].astype(str)

a_key.isin(b_key)

On my data this gives the following result:

0     True
1    False
2    False
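
A self-contained sketch of the same idea, assuming the three-row sample frame that appears in the output table of this thread. The "|" separator is an addition that is not in the answer above: without it, pairs such as ("1", "23") and ("12", "3") would collapse into the same key. It assumes "|" never occurs in the data.

```python
import pandas as pd

# Sample data reconstructed from the output table in this thread
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

SEP = "|"  # assumed not to appear in any of the values

# One string key per row for each side
a_key = df["a_id"].astype(str) + SEP + df["a_region"] + SEP + df["a_ip"].astype(str)
b_key = df["b_id"].astype(str) + SEP + df["b_region"] + SEP + df["b_ip"].astype(str)

# True where the a-side triple appears in any row of the b side
flag = a_key.isin(b_key)
print(flag.tolist())  # [True, False, False]
```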



ANSWER 2

Score 3


You can pass a DataFrame as the values argument of isin, but as the docs say:

If values is a DataFrame, then both the index and column labels must match

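So each row of one frame is compared only with the row of the other frame that has the same index label. Once the column names match, this flags rows whose three values agree within the same row: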

# Removing the prefixes from column names
df_a = df[['a_id', 'a_region', 'a_ip']].rename(columns=lambda x: x[2:])
df_b = df[['b_id', 'b_region', 'b_ip']].rename(columns=lambda x: x[2:])

# Flag rows where all three values equal the b values at the same index label
matched = df_a.isin(df_b).all(axis=1)

# Get actual rows with boolean indexing
df_a.loc[matched]

# ... or add boolean flag to dataframe
df['flag'] = matched
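
Because DataFrame.isin aligns on index and column labels when given a DataFrame, it may help to see how that differs from an order-independent lookup. The sketch below assumes the three-row sample frame from this thread; the tuple-set comparison is a swapped-in alternative, not part of the answer above, and the names a_cols, b_cols, aligned and anywhere are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the output table in this thread
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

a_cols = ["a_id", "a_region", "a_ip"]
b_cols = ["b_id", "b_region", "b_ip"]

# Strip the "a_" / "b_" prefixes so the column labels match
df_a = df[a_cols].rename(columns=lambda x: x[2:])
df_b = df[b_cols].rename(columns=lambda x: x[2:])

# Label-aligned check: row i of df_a is compared with row i of df_b only
aligned = df_a.isin(df_b).all(axis=1)

# Order-independent check: does each a-row appear anywhere among the b-rows?
b_rows = set(df_b.itertuples(index=False, name=None))
anywhere = pd.Series(
    [row in b_rows for row in df_a.itertuples(index=False, name=None)],
    index=df.index,
)

print(aligned.tolist())   # [False, False, False]
print(anywhere.tolist())  # [True, False, False]
```

On this sample the matching triple sits in row 0 on the a side and row 2 on the b side, so only the order-independent check finds it.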



ANSWER 3

Score 2


Here's one approach using DataFrame.merge, pandas.concat and testing for duplicated values:

df_merged = df.merge(df,
                     left_on=['a_id', 'a_region', 'a_ip'],
                     right_on=['b_id', 'b_region', 'b_ip'],
                     suffixes=('', '_y'))

df['flag'] = pd.concat([df, df_merged[df.columns]]).duplicated(keep=False)[:len(df)].values

[out]

    a_id a_region    a_ip     b_id b_region   b_ip   flag
0      2        a      10  3222222    sssss  22222   True
1  22222    bcccc   10000    43333    ddddd  11111  False
2  33333    acccc  120000        2        a     10  False
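
A related variant, sketched under the same sample data: a left merge against the de-duplicated b-side keys with indicator=True, which avoids the concat and duplicated step. This is an alternative formulation rather than the answer's own method, and the names b_keys and merged are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output table above
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

a_cols = ["a_id", "a_region", "a_ip"]
b_cols = ["b_id", "b_region", "b_ip"]

# Unique b-side triples, renamed so they can be joined on the a-side names
b_keys = df[b_cols].drop_duplicates().rename(columns=dict(zip(b_cols, a_cols)))

# Left merge keeps every row of df in order; "_merge" records whether a match was found
merged = df.merge(b_keys, on=a_cols, how="left", indicator=True)

df["flag"] = (merged["_merge"] == "both").to_numpy()
print(df["flag"].tolist())  # [True, False, False]
```

Dropping duplicates on the b side first keeps the left merge from multiplying rows, so the flag lines up one-to-one with the original frame.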