Split a Pandas column of lists into multiple columns
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Flying Over Ancient Lands
--
Chapters
00:00 Split A Pandas Column Of Lists Into Multiple Columns
00:26 Accepted Answer Score 539
01:12 Answer 2 Score 23
01:29 Answer 3 Score 123
01:54 Answer 4 Score 78
02:10 Thank you
--
Full question
https://stackoverflow.com/questions/3549...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #list #dataframe #split
#avk47
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Flying Over Ancient Lands
--
Chapters
00:00 Split A Pandas Column Of Lists Into Multiple Columns
00:26 Accepted Answer Score 539
01:12 Answer 2 Score 23
01:29 Answer 3 Score 123
01:54 Answer 4 Score 78
02:10 Thank you
--
Full question
https://stackoverflow.com/questions/3549...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #list #dataframe #split
#avk47
ACCEPTED ANSWER
Score 539
You can use the DataFrame constructor with lists created by to_list:
import pandas as pd
d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
teams
0 [SF, NYG]
1 [SF, NYG]
2 [SF, NYG]
3 [SF, NYG]
4 [SF, NYG]
5 [SF, NYG]
6 [SF, NYG]
df2[['team1','team2']] = pd.DataFrame(df2.teams.tolist(), index= df2.index)
print (df2)
teams team1 team2
0 [SF, NYG] SF NYG
1 [SF, NYG] SF NYG
2 [SF, NYG] SF NYG
3 [SF, NYG] SF NYG
4 [SF, NYG] SF NYG
5 [SF, NYG] SF NYG
6 [SF, NYG] SF NYG
And for a new DataFrame:
df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
print (df3)
team1 team2
0 SF NYG
1 SF NYG
2 SF NYG
3 SF NYG
4 SF NYG
5 SF NYG
6 SF NYG
A solution with apply(pd.Series) is very slow:
#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)
In [121]: %timeit df2['teams'].apply(pd.Series)
1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ANSWER 2
Score 123
Much simpler solution:
pd.DataFrame(df2["teams"].to_list(), columns=['team1', 'team2'])
Yields,
team1 team2
-------------
0 SF NYG
1 SF NYG
2 SF NYG
3 SF NYG
4 SF NYG
5 SF NYG
6 SF NYG
7 SF NYG
If you wanted to split a column of delimited strings rather than lists, you could similarly do:
pd.DataFrame(df["teams"].str.split('<delim>', expand=True).values,
columns=['team1', 'team2'])
ANSWER 3
Score 78
This solution preserves the index of the df2 DataFrame, unlike any solution that uses tolist():
df3 = df2.teams.apply(pd.Series)
df3.columns = ['team1', 'team2']
Here's the result:
team1 team2
0 SF NYG
1 SF NYG
2 SF NYG
3 SF NYG
4 SF NYG
5 SF NYG
6 SF NYG
ANSWER 4
Score 23
There seems to be a syntactically simpler way, and therefore easier to remember, as opposed to the proposed solutions. I'm assuming that the column is called 'meta' in a dataframe df:
df2 = pd.DataFrame(df['meta'].str.split().values.tolist())