Split a Pandas column of lists into multiple columns

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Future Grid Looping

--

Chapters
00:00 Question
00:37 Accepted answer (Score 473)
01:38 Answer 2 (Score 108)
02:12 Answer 3 (Score 66)
02:34 Answer 4 (Score 22)
02:58 Thank you

--

Full question
https://stackoverflow.com/questions/3549...

Accepted answer links:
[to_list]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #list #dataframe #split

#avk47

ACCEPTED ANSWER

Score 539

You can use the DataFrame constructor with lists created by to_list:

import pandas as pd

d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]

df2[['team1','team2']] = pd.DataFrame(df2.teams.tolist(), index= df2.index)
print (df2)
       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG

And for a new DataFrame:

df3 = pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
print (df3)
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG

A solution with apply(pd.Series) is very slow:

#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)

In [121]: %timeit df2['teams'].apply(pd.Series)
1.79 s ± 52.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [122]: %timeit pd.DataFrame(df2['teams'].to_list(), columns=['team1','team2'])
1.63 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ANSWER 2

Score 123

Much simpler solution:

pd.DataFrame(df2["teams"].to_list(), columns=['team1', 'team2'])

Yields,

  team1 team2
-------------
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
7    SF   NYG

If you wanted to split a column of delimited strings rather than lists, you could similarly do:

pd.DataFrame(df["teams"].str.split('<delim>', expand=True).values,
             columns=['team1', 'team2'])

ANSWER 3

Score 78

This solution preserves the index of the df2 DataFrame, unlike any solution that uses tolist():

df3 = df2.teams.apply(pd.Series)
df3.columns = ['team1', 'team2']

Here's the result:

  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG

ANSWER 4

Score 23

There seems to be a syntactically simpler way, and therefore easier to remember, as opposed to the proposed solutions. I'm assuming that the column is called 'meta' in a dataframe df:

df2 = pd.DataFrame(df['meta'].str.split().values.tolist())