The Python Oracle

Convert (transpose) dataframe lists to columns

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 3 Looping

--

Chapters
00:00 Convert (Transpose) Dataframe Lists To Columns
00:53 Answer 1 Score 3
01:08 Accepted Answer Score 2
01:39 Answer 3 Score 1
01:57 Thank you

--

Full question
https://stackoverflow.com/questions/5402...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python3x #pandas

#avk47



ANSWER 1

Score 3


From sklearn MultiLabelBinarizer

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df['province']),columns=mlb.classes_, index=df.agency_id).astype(bool)
Out[90]: 
             CH     CS     PE
agency_id                    
1          True  False   True
3          True   True  False



ACCEPTED ANSWER

Score 2


You can clean up / modify your data if you do not like to import from sklearn.preprocessing import MultiLabelBinarizer for this:

import pandas as pd

data = [
{"agency_id": 1,"province": ["CH", "PE"]},
{"agency_id": 3,"province": ["CH", "CS"]}
]

# get all provinces from any included dictionaries of data:
all_prov = sorted(set( (x for y in [d["province"] for d in data] for x in y) ))

# add the missing key:values to your data's dicts:
for d in data:
    for p in all_prov:
        d[p] = p in d["province"]

print(data)

df = pd.DataFrame(data)
print(df)

Output:

# data
[{'agency_id': 1, 'province': ['CH', 'PE'], 'CH': True, 'CS': False, 'PE': True}, 
 {'agency_id': 3, 'province': ['CH', 'CS'], 'CH': True, 'CS': True, 'PE': False}]

# df 
     CH     CS     PE  agency_id  province
0  True  False   True          1  [CH, PE]
1  True   True  False          3  [CH, CS] 



ANSWER 3

Score 1


Another solution, just using pandas:

import pandas as pd

data = [
{"agency_id": 1,"province": ["CH", "PE"]},
{"agency_id": 3,"province": ["CH", "CS"]}
]
df = pd.DataFrame(data)

result = df['province'].apply(lambda x: '|'.join(x)).str.get_dummies().astype(bool).set_index(df.agency_id)
print(result)

Output

             CH     CS     PE
agency_id                    
1          True  False   True
3          True   True  False