Convert (transpose) dataframe lists to columns
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Book End
--
Chapters
00:00 Convert (Transpose) Dataframe Lists To Columns
01:03 Answer 1 Score 3
01:22 Accepted Answer Score 2
02:05 Answer 3 Score 1
02:28 Thank you
--
Full question
https://stackoverflow.com/questions/5402...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #python3x #pandas
#avk47
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Book End
--
Chapters
00:00 Convert (Transpose) Dataframe Lists To Columns
01:03 Answer 1 Score 3
01:22 Accepted Answer Score 2
02:05 Answer 3 Score 1
02:28 Thank you
--
Full question
https://stackoverflow.com/questions/5402...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #python3x #pandas
#avk47
ANSWER 1
Score 3
From sklearn MultiLabelBinarizer
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df['province']),columns=mlb.classes_, index=df.agency_id).astype(bool)
Out[90]:
CH CS PE
agency_id
1 True False True
3 True True False
ACCEPTED ANSWER
Score 2
You can clean up / modify your data if you do not like to import from sklearn.preprocessing import MultiLabelBinarizer for this:
import pandas as pd
data = [
{"agency_id": 1,"province": ["CH", "PE"]},
{"agency_id": 3,"province": ["CH", "CS"]}
]
# get all provinces from any included dictionaries of data:
all_prov = sorted(set( (x for y in [d["province"] for d in data] for x in y) ))
# add the missing key:values to your data's dicts:
for d in data:
for p in all_prov:
d[p] = p in d["province"]
print(data)
df = pd.DataFrame(data)
print(df)
Output:
# data
[{'agency_id': 1, 'province': ['CH', 'PE'], 'CH': True, 'CS': False, 'PE': True},
{'agency_id': 3, 'province': ['CH', 'CS'], 'CH': True, 'CS': True, 'PE': False}]
# df
CH CS PE agency_id province
0 True False True 1 [CH, PE]
1 True True False 3 [CH, CS]
ANSWER 3
Score 1
Another solution, just using pandas:
import pandas as pd
data = [
{"agency_id": 1,"province": ["CH", "PE"]},
{"agency_id": 3,"province": ["CH", "CS"]}
]
df = pd.DataFrame(data)
result = df['province'].apply(lambda x: '|'.join(x)).str.get_dummies().astype(bool).set_index(df.agency_id)
print(result)
Output
CH CS PE
agency_id
1 True False True
3 True True False