Condensing columns in a Pandas Dataframe
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 3
--
Chapters
00:00 Condensing Columns In A Pandas Dataframe
00:55 Accepted Answer Score 4
02:18 Answer 2 Score 0
02:47 Thank you
--
Full question
https://stackoverflow.com/questions/6808...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #list #dataframe #dictionary
#avk47
ACCEPTED ANSWER
Score 4
One Way:
- Set ['event_id', 'target'] as index.
- split and expand the columns to create hierarchical columns (we'll need this in the dot product).
- extract the level 1 values from the hierarchical columns and do the dot product with the dataframe.
- Do some string manipulations and rename the columns (I've added 2 ways to rename the columns) to get the desired output.
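The question's sample dataframe isn't reproduced in this description, so here is a minimal sketch of an assumed input, reconstructed from the expected output shown below (the column names and boolean flags are assumptions, not the asker's exact data):

import numpy as np
import pandas as pd

# Assumed input: one boolean flag column per possible action/actor value
df = pd.DataFrame({
    'event_id': [1, 2],
    'target': ['Microsoft', 'Bank of America'],
    'action.malware': [True, False],
    'action.hacking': [False, True],
    'actor.internal.employee': [True, False],
    'actor.external.competitor': [False, True],
})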
df1 = df.set_index(['event_id', 'target'])
df1.columns = df1.columns.str.split('.', expand=True, n=1)
df = (
    df1.dot(df1.columns.get_level_values(1) + ',')
       .str.strip(',')
       .str.split(',', expand=True)
       .rename(columns={0: 'Action', 1: 'Actor'})
       .reset_index()
)
OUTPUT:
   event_id           target   Action                Actor
0         1        Microsoft  malware    internal.employee
1         2  Bank of America  hacking  external.competitor
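Why the dot product works: a boolean dataframe dotted with a vector of strings multiplies each flag by its label (True keeps the label, False yields '') and string-sums across the row, so the labels of the True cells get concatenated. A tiny sketch with made-up labels:

import pandas as pd

flags = pd.DataFrame({'malware': [True, False], 'hacking': [False, True]})
# Each row concatenates the labels of its True cells; the trailing comma is
# what the .str.strip/.str.split steps clean up afterwards.
print(flags.dot(flags.columns + ','))
# 0    malware,
# 1    hacking,
# dtype: object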
NOTE:
Instead of renaming the columns via rename, you can also use the level=0 column values, like this -
df1 = df.set_index(['event_id', 'target'])
df1.columns = df1.columns.str.split('.', expand=True, n=1)
df = (
    df1.dot(df1.columns.get_level_values(1) + ',')
       .str.strip(',')
       .str.split(',', expand=True)
)
df.columns = dict.fromkeys(df1.columns.get_level_values(0)).keys()
df = df.reset_index()
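For reference, dict.fromkeys keeps insertion order and drops duplicates, which is why it turns the repeated level-0 labels into one column name per group (the labels below are the ones implied by the answer):

level0 = ['action', 'action', 'actor', 'actor']
# Duplicates collapse while order is preserved:
print(list(dict.fromkeys(level0).keys()))   # ['action', 'actor']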
One more alternative is to use mul:
df1 = df.set_index(['event_id', 'target'])
df1.columns = df1.columns.str.split('.', expand=True, n=1)
df = (
    df1.mul(df1.columns.get_level_values(1))
       .replace('', np.nan)
       .droplevel(1, axis=1)
       .stack()
       .unstack()
)
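The mul variant leans on Python's bool-times-str behaviour: True * 'malware' is 'malware' and False * 'malware' is ''. A minimal sketch with assumed labels, which also shows why the empty strings are replaced with NaN before reshaping:

import pandas as pd

flags = pd.DataFrame({'malware': [True, False], 'hacking': [False, True]})
# Each cell becomes either its column label or an empty string:
print(flags.mul(flags.columns))
#    malware  hacking
# 0  malware
# 1           hacking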
NOTE:
In case you have multiple values for action and actor for one target, you can use pivot_table instead of stack/unstack:
df1 = df.set_index(['event_id', 'target'])
df1.columns = df1.columns.str.split('.', expand=True, n=1)
(
    df1.mul(df1.columns.get_level_values(1))
       .replace('', np.nan)
       .droplevel(1, axis=1)
       .stack()
       .reset_index()
       .pivot_table(index=['event_id', 'target'], columns='level_2',
                    values=0, aggfunc=', '.join)
)
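A hypothetical sketch of that aggregation step: the frame below imitates the long shape that exists just before pivot_table, assuming event 1 had two action flags set at once, so the ', '.join aggfunc collapses them into one cell:

import pandas as pd

long = pd.DataFrame({
    'event_id': [1, 1, 1],
    'target': ['Microsoft'] * 3,
    'level_2': ['action', 'action', 'actor'],
    0: ['malware', 'hacking', 'internal.employee'],
})
print(long.pivot_table(index=['event_id', 'target'], columns='level_2',
                       values=0, aggfunc=', '.join))
# Roughly:
# level_2                      action              actor
# event_id target
# 1        Microsoft  malware, hacking  internal.employee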
ANSWER 2
Score 0
You can flip the data from wide to long, keep only the rows that contain True, and make some modifications to get your output.
# flip from wide to long
(pd.wide_to_long(df,
                 stubnames=['action', 'actor'],
                 i=['event_id', 'target'],
                 j='subs',
                 sep='.',
                 suffix='.+')
   .loc[lambda df: df.any(axis=1)]  # keeps only rows that have `True`
   # if any row has `True`,
   # replace it with values from the `subs` index
   .where(lambda df: df.isna(),
          lambda df: df.index.get_level_values('subs'))
   .fillna('')
   .droplevel('subs')  # served its purpose, let it go
   # use groupby with agg to reduce to single rows
   .groupby(['event_id', 'target'])
   .agg("".join)
   .reset_index()
)
OUTPUT:
   event_id           target   action                actor
0         1        Microsoft  malware    internal.employee
1         2  Bank of America  hacking  external.competitor
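For reference, the counter-intuitive where(isna, ...) step works because where() keeps values where the condition holds and substitutes everywhere else, so the NaNs survive untouched while the True flags get overwritten with the replacement (a literal string below; in the answer it is the 'subs' index level). A tiny sketch with assumed values:

import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, True])
# Keep the NaNs (condition True), replace everything else:
print(s.where(s.isna(), 'malware'))
# 0    malware
# 1        NaN
# 2    malware
# dtype: object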