Compare multiple column values together using pandas
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Magic Ocean Looping
--
Chapters
00:00 Question
01:16 Accepted answer (Score 8)
01:37 Answer 2 (Score 3)
02:12 Answer 3 (Score 2)
02:41 Thank you
--
Full question
https://stackoverflow.com/questions/5588...
Question links:
[below]: https://stackoverflow.com/questions/5579...
[image]: https://i.stack.imgur.com/0hIab.png
[image]: https://i.stack.imgur.com/BXmlw.png
Answer 1 links:
[docs]: https://pandas.pydata.org/pandas-docs/st...
Answer 2 links:
[DataFrame.merge]: https://pandas.pydata.org/pandas-docs/st...
[pandas.concat]: https://pandas.pydata.org/pandas-docs/st...
[duplicated]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #excel #pandas #dataframe
#avk47
--
ACCEPTED ANSWER
Score 9
You were on the right track. Build one string key per side from the three columns, then test membership with isin:
a_key = df['a_id'].astype(str) + df['a_region'] + df['a_ip'].astype(str)
b_key = df['b_id'].astype(str) + df['b_region'] + df['b_ip'].astype(str)
a_key.isin(b_key)
On my data this gives:
0 True
1 False
2 False
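A self-contained sketch of the same idea follows. The sample frame is copied from the output table in Answer 3. The `make_key` helper and the `SEP` separator are illustrative additions, not part of the original answer. The separator is there because plain concatenation can be ambiguous: id 1 with region '2a' and id 12 with region 'a' would both start with '12a'. It is assumed that `SEP` never occurs in the data.

```python
import pandas as pd

# Sample data copied from the output table shown in Answer 3.
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

SEP = "|"  # assumption: this character never appears in any value


def make_key(frame, prefix):
    """Hypothetical helper: join <prefix>id, <prefix>region, <prefix>ip per row."""
    return (
        frame[prefix + "id"].astype(str)
        + SEP
        + frame[prefix + "region"].astype(str)
        + SEP
        + frame[prefix + "ip"].astype(str)
    )


a_key = make_key(df, "a_")
b_key = make_key(df, "b_")

# True where the a-side triple appears in any row of the b-side.
flag = a_key.isin(b_key)
print(flag.tolist())
```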
ANSWER 2
Score 3
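You can pass a DataFrame to isin, but as the docs note:
If values is a DataFrame, then both the index and column labels must match
So strip the a_/b_ prefixes to make the column labels match. The comparison is then made cell by cell at matching index labels: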
# Removing the prefixes from column names
df_a = df[['a_id', 'a_region', 'a_ip']].rename(columns=lambda x: x[2:])
df_b = df[['b_id', 'b_region', 'b_ip']].rename(columns=lambda x: x[2:])
# Flag rows where every value equals the value at the same index and column in df_b
matched = df_a.isin(df_b).all(axis=1)
# Get actual rows with boolean indexing
df_a.loc[matched]
# ... or add boolean flag to dataframe
df['flag'] = matched
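Because DataFrame.isin aligns on index and column labels when given a DataFrame, the snippet above compares row i of df_a only with row i of df_b. If the goal is to find a-side rows that occur anywhere in the b-side, one possible variant (not from the original answer) is to treat each row as a tuple and use MultiIndex.isin. The sketch below uses the sample data from the output table in Answer 3 and shows both behaviours side by side.

```python
import pandas as pd

# Sample data copied from the output table shown in Answer 3.
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

df_a = df[["a_id", "a_region", "a_ip"]].rename(columns=lambda c: c[2:])
df_b = df[["b_id", "b_region", "b_ip"]].rename(columns=lambda c: c[2:])

# Aligned comparison: each cell is checked against the cell with the same
# index and column label in df_b.
aligned = df_a.isin(df_b).all(axis=1)

# Any-row membership: turn each row into a tuple and look it up among
# all row tuples of df_b.
b_tuples = list(df_b.itertuples(index=False, name=None))
any_row = pd.MultiIndex.from_frame(df_a).isin(b_tuples)

print(aligned.tolist())
print(any_row.tolist())
```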
ANSWER 3
Score 2
Here's one approach using DataFrame.merge, pandas.concat and testing for duplicated values:
df_merged = df.merge(df,
                     left_on=['a_id', 'a_region', 'a_ip'],
                     right_on=['b_id', 'b_region', 'b_ip'],
                     suffixes=('', '_y'))
df['flag'] = pd.concat([df, df_merged[df.columns]]).duplicated(keep=False)[:len(df)].values
[out]
    a_id a_region    a_ip     b_id b_region   b_ip   flag
0      2        a      10  3222222    sssss  22222   True
1  22222    bcccc   10000    43333    ddddd  11111  False
2  33333    acccc  120000        2        a     10  False
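A shorter merge-based variant, offered as a sketch rather than as the answer's own method, is a left merge with indicator=True. The b-side keys are de-duplicated first so the merge cannot multiply rows, and the sketch relies on how='left' keeping the left frame's row order. The names a_cols, b_cols and b_keys are illustrative.

```python
import pandas as pd

# Sample data copied from the output table above (without the flag column).
df = pd.DataFrame({
    "a_id": [2, 22222, 33333],
    "a_region": ["a", "bcccc", "acccc"],
    "a_ip": [10, 10000, 120000],
    "b_id": [3222222, 43333, 2],
    "b_region": ["sssss", "ddddd", "a"],
    "b_ip": [22222, 11111, 10],
})

a_cols = ["a_id", "a_region", "a_ip"]
b_cols = ["b_id", "b_region", "b_ip"]

# Unique b-side triples, so each a-side row matches at most one merged row.
b_keys = df[b_cols].drop_duplicates()

merged = df[a_cols].merge(
    b_keys,
    how="left",
    left_on=a_cols,
    right_on=b_cols,
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Assign by position, since merged has a fresh RangeIndex.
df["flag"] = (merged["_merge"] == "both").to_numpy()
print(df["flag"].tolist())
```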