Find entries that do not match between columns and iterate through columns
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Switch On Looping
--
Chapters
00:00 Question
01:31 Accepted answer (Score 6)
02:18 Answer 2 (Score 4)
02:41 Answer 3 (Score 3)
03:03 Answer 4 (Score 3)
03:36 Thank you
--
Full question
https://stackoverflow.com/questions/6045...
Answer 1 links:
[wide_to_long]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #numpy
#avk47
--
ACCEPTED ANSWER
Score 6
With a bit more comprehensive regex:
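from itertools import groupby
import re

# Group sorted column names by prefix; names not ending in _1 or _2 get the key None
for k, cols in groupby(sorted(df.columns), lambda x: x[:-2] if re.match(".+_(1|2)$", x) else None):
    cols = list(cols)
    # Only compare when the group is a real prefix with exactly two columns
    if len(cols) == 2 and k:
        df[f"{k}_check"] = df[cols[0]].eq(df[cols[1]])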
It pairs only columns whose names end in _1 and _2, regardless of what comes before the suffix, and computes _check only when both the _1 and the _2 column are present (assuming no two columns share the same name).
For the sample data:
       A_1      A_2   B_1    B_2  A_check  B_check
0  charlie  charlie  beta  cappa     True    False
1  charlie  charlie  beta  delta     True    False
2  charlie  charlie  beta   beta     True     True
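For reference, the loop above can be sketched as a self-contained script. The sample frame is rebuilt from the output shown above, and the names `pair_key` and `add_pair_checks` are hypothetical helpers introduced only for this illustration; they are not part of the original answer.

```python
import re
from itertools import groupby

import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame(
    {
        "A_1": ["charlie", "charlie", "charlie"],
        "A_2": ["charlie", "charlie", "charlie"],
        "B_1": ["beta", "beta", "beta"],
        "B_2": ["cappa", "delta", "beta"],
    }
)


def pair_key(name):
    """Return the prefix for names ending in _1 or _2, else None."""
    return name[:-2] if re.match(r".+_(1|2)$", name) else None


def add_pair_checks(frame):
    """Hypothetical helper: copy the frame and add one <prefix>_check per _1/_2 pair."""
    out = frame.copy()
    for key, cols in groupby(sorted(frame.columns), pair_key):
        cols = list(cols)
        # Skip unmatched names (key is None) and incomplete pairs.
        if key and len(cols) == 2:
            out[f"{key}_check"] = out[cols[0]].eq(out[cols[1]])
    return out


checked = add_pair_checks(df)
print(checked[["A_check", "B_check"]])
```

Working on a copy leaves the original frame untouched, which is a design choice of this sketch rather than something the answer requires.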
ANSWER 2
Score 4
You can use wide_to_long if you know the first part of the column names (here A, B, ...):
(pd.wide_to_long(df.reset_index(), ['A', 'B'], 'index', 'part', sep='_')
   .groupby('index').nunique().eq(1)
   .add_suffix('_check')
)
Output:
       A_check  B_check
index
0         True    False
1         True    False
2         True     True
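If the prefixes are not known in advance, one possible variation is to derive them from the column names before calling wide_to_long. This is only a sketch: it assumes every column follows the `<prefix>_<number>` pattern, and the variable names `stubs`, `long_df` and `result` are made up for the example.

```python
import pandas as pd

# Sample data reconstructed from the question's output (an assumption).
df = pd.DataFrame(
    {
        "A_1": ["charlie", "charlie", "charlie"],
        "A_2": ["charlie", "charlie", "charlie"],
        "B_1": ["beta", "beta", "beta"],
        "B_2": ["cappa", "delta", "beta"],
    }
)

# Derive the stub names ('A', 'B') by cutting each name at its last underscore.
stubs = sorted({name.rsplit("_", 1)[0] for name in df.columns})

# Reshape to long form: one row per (index, part), one column per stub.
long_df = pd.wide_to_long(df.reset_index(), stubs, i="index", j="part", sep="_")

# A pair matches when the row holds exactly one distinct value for that stub.
result = long_df.groupby("index").nunique().eq(1).add_suffix("_check")
print(result)
```

Note that nunique ignores missing values, so a row where both columns are NaN would be reported as False by this approach.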
ANSWER 3
Score 3
Another way is to reshape the DataFrame using a pd.MultiIndex on the columns:
import pandas as pd

df = pd.DataFrame([['charlie', 'charlie', 'beta', 'cappa'],
                   ['charlie', 'charlie', 'beta', 'delta'],
                   ['charlie', 'charlie', 'beta', 'beta']],
                  columns=['A_1', 'A_2', 'B_1', 'B_2'])
# Split 'A_1' into ('A', '1') to create a MultiIndex column header
df.columns = df.columns.str.split('_', expand=True)
# Move 'A', 'B' and any other prefixes to rows
dfs = df.stack(0)
# Compare column '1' to column '2', then move the prefixes back to columns
df_out = (dfs == dfs.shift(-1, axis=1))['1'].unstack()
print(df_out)
Output:
      A      B
0  True  False
1  True  False
2  True   True
ANSWER 4
Score 3
You can split the column names on '_', group the columns along axis=1 by the first element of each split, and call agg to compare the first and last column of each group:
i_cols = df.columns.str.split('_')
df_check = (df.groupby(i_cols.str[0], axis=1)
              .agg(lambda x: x.iloc[:, 0] == x.iloc[:, -1])
              .add_suffix('_check'))
In [69]: df_check
Out[69]:
   A_check  B_check
0     True    False
1     True    False
2     True     True
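Grouping with axis=1 is deprecated in recent pandas releases (2.1 onward), so the same idea may need to be written without it. The following is a minimal sketch under that assumption, not the answer's own code: it selects the columns of each prefix with a boolean mask and compares the first and last one. The names `prefixes`, `checks` and `group` are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question's output (an assumption).
df = pd.DataFrame(
    {
        "A_1": ["charlie", "charlie", "charlie"],
        "A_2": ["charlie", "charlie", "charlie"],
        "B_1": ["beta", "beta", "beta"],
        "B_2": ["cappa", "delta", "beta"],
    }
)

# Prefix of each column name: 'A_1' -> 'A'
prefixes = df.columns.str.split("_").str[0]

checks = {}
for prefix in prefixes.unique():
    # All columns sharing this prefix, in their original order
    group = df.loc[:, prefixes == prefix]
    # Compare the first and last column of the group, as in the agg lambda
    checks[f"{prefix}_check"] = group.iloc[:, 0].eq(group.iloc[:, -1])

df_check = pd.DataFrame(checks)
print(df_check)
```

As with the original lambda, a prefix that has only one column compares that column with itself, so its check comes out True for every non-missing value.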