Pandas: Pivot to True/False, drop column
--
Chapters
00:00 Pandas: Pivot To True/False, Drop Column
00:45 Accepted Answer Score 11
01:47 Answer 2 Score 0
02:23 Answer 3 Score 0
02:37 Thank you
--
Full question
https://stackoverflow.com/questions/4437...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #pivot
#avk47
ACCEPTED ANSWER
Score 11
Option 1
df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool)
partner      x      y
company
a         True  False
b         True   True
c        False   True
To remove the name from the columns axis and turn company back into a regular column:
df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool) \
  .rename_axis(None, axis=1).reset_index()
  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True
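Option 1 can be sketched end to end as follows. The sample frame is an assumption: it reuses the company/partner values from the frame shown in Answer 3, and the intermediate variable names are only illustrative.

```python
import pandas as pd

# Assumed sample data (company/partner pairs taken from Answer 3's frame).
df = pd.DataFrame({'company': ['a', 'b', 'c', 'b'],
                   'partner': ['x', 'x', 'y', 'y']})

# Count rows per (company, partner) pair; the result is a Series with a MultiIndex.
counts = df.groupby(['company', 'partner']).size()

# Move 'partner' into the columns; pairs that never occur get a count of 0.
wide = counts.unstack(fill_value=0)

# Any non-zero count becomes True, zero becomes False.
flags = wide.astype(bool)

# Drop the 'partner' label on the columns axis and restore 'company' as a column.
result = flags.rename_axis(None, axis=1).reset_index()
print(result)
```

Passing axis=1 by keyword keeps the call working on recent pandas versions, where the axis argument of rename_axis is keyword-only.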
Option 2
pd.crosstab(df.company, df.partner).astype(bool)
partner      x      y
company
a         True  False
b         True   True
c        False   True
pd.crosstab(df.company, df.partner).astype(bool) \
  .rename_axis(None, axis=1).reset_index()
  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True
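To see why the astype(bool) step matters, here is a small sketch with one duplicated pair. The extra ('b', 'x') row is an assumption added purely for illustration; it is not part of the original data.

```python
import pandas as pd

# Assumed sample data, with ('b', 'x') appearing twice on purpose.
df = pd.DataFrame({'company': ['a', 'b', 'c', 'b', 'b'],
                   'partner': ['x', 'x', 'y', 'y', 'x']})

# crosstab returns occurrence counts, so the duplicated pair shows up as 2.
counts = pd.crosstab(df['company'], df['partner'])
print(counts)

# Casting to bool collapses every non-zero count to True.
flags = counts.astype(bool)
print(flags)
```

Without the cast, the table would hold counts rather than True/False values.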
Option 3
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size
b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)
pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))
  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True
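A commented variant of Option 3 may make the index arithmetic easier to follow. This is a sketch, not the answer's exact code: it assumes the same sample data as Answer 3, replaces the manual zero padding with the minlength argument of np.bincount, and builds the frame without np.column_stack so the flag columns keep a bool dtype.

```python
import numpy as np
import pandas as pd

# Assumed sample data (company/partner pairs taken from Answer 3's frame).
df = pd.DataFrame({'company': ['a', 'b', 'c', 'b'],
                   'partner': ['x', 'x', 'y', 'y']})

# factorize maps each label to an integer code, in order of first appearance.
f1, u1 = pd.factorize(df['company'].to_numpy())
f2, u2 = pd.factorize(df['partner'].to_numpy())
n, m = u1.size, u2.size

# Each (company, partner) pair maps to one cell of a flattened n x m grid.
flat = f1 * m + f2

# Count hits per cell; minlength guarantees n * m entries even if the
# trailing cells are never hit.
b = np.bincount(flat, minlength=n * m)
v = b.reshape(n, m).astype(bool)

# Build the result column-wise so the True/False columns stay boolean.
result = pd.DataFrame(v, columns=u2)
result.insert(0, 'company', u1)
print(result)
```

Rows and columns follow the order in which labels first appear, not sorted order; the two coincide for this sample.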
Timing
small data
%timeit df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool).rename_axis(None, axis=1).reset_index()
%timeit pd.crosstab(df.company, df.partner).astype(bool).rename_axis(None, axis=1).reset_index()
%%timeit
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size
b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)
pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))
Results, in the order of the three statements above (Option 1, Option 2, Option 3):
1000 loops, best of 3: 1.67 ms per loop
100 loops, best of 3: 5.97 ms per loop
1000 loops, best of 3: 301 µs per loop
ANSWER 2
Score 0
Another option is to change the original line
df = df.pivot(values='partner', columns='partner', index='company').reset_index()
to
df = df.pivot(values='partner', columns='partner', index='company').notna()
Still, I like lukeA's answer in the comments even better:
df.assign(val=True).pivot_table(values='val', index='company', columns='partner', fill_value=False)
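The pivot-then-notna variant can be tried with a short sketch. The sample frame is assumed from Answer 3, and the drop_duplicates call is an added precaution rather than part of the answer: pivot raises a ValueError when a (company, partner) pair occurs more than once.

```python
import pandas as pd

# Assumed sample data (company/partner pairs taken from Answer 3's frame).
df = pd.DataFrame({'company': ['a', 'b', 'c', 'b'],
                   'partner': ['x', 'x', 'y', 'y']})

# pivot needs unique (index, columns) pairs, so drop repeats first.
unique_pairs = df.drop_duplicates(['company', 'partner'])

# Cells hold the partner label where the pair exists and NaN elsewhere.
wide = unique_pairs.pivot(index='company', columns='partner', values='partner')

# notna turns "label present" into True and NaN into False.
flags = wide.notna()
print(flags)
```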
ANSWER 3
Score 0
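Use any as the aggfunc: it returns True for each (company, partner) group whose values are truthy, such as the non-empty strings here. Combinations with no rows come out as NaN, which fillna(False) turns into False.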
df = pd.DataFrame({'company':['a','b','c','b'], 'partner':['x','x','y','y'], 'str':['just','some','random','words']})
fp=df.pivot_table(index=['company'],columns=['partner'],aggfunc=any).fillna(False)
print(fp.head())
Output:
           str
partner      x      y
company
a         True  False
b         True   True
c        False   True