The Python Oracle

Pandas: Pivot to True/False, drop column

Full question
https://stackoverflow.com/questions/4437...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #pivot



ACCEPTED ANSWER

Score 11
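
The snippets in this answer operate on a frame named df that is not shown here. A minimal sketch of a matching input, assuming the same company/partner pairs as the frame built in Answer 3 (which reproduces the outputs printed below):

```python
import pandas as pd

# Assumed sample data: one row per (company, partner) relationship.
df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
})
print(df)
```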


Option 1

df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool)


partner      x      y
company              
a         True  False
b         True   True
c        False   True

Get rid of names on columns object

df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool) \
    .rename_axis(None, axis=1).reset_index()

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True
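
To see what each step of Option 1 contributes, the chain can be split into intermediate results. This is only an illustrative breakdown on an assumed sample frame; the names counts, wide, flags and flat are hypothetical.

```python
import pandas as pd

# Assumed sample data matching the outputs shown above.
df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
})

# 1. Count rows per (company, partner) pair: a Series with a two-level index.
counts = df.groupby(['company', 'partner']).size()

# 2. Move 'partner' into the columns; pairs that never occur become 0.
wide = counts.unstack(fill_value=0)

# 3. Any non-zero count becomes True.
flags = wide.astype(bool)

# 4. Drop the 'partner' label on the columns and turn 'company' back into a column.
flat = flags.rename_axis(None, axis=1).reset_index()
print(flat)
```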

Option 2

pd.crosstab(df.company, df.partner).astype(bool)

partner      x      y
company              
a         True  False
b         True   True
c        False   True


pd.crosstab(df.company, df.partner).astype(bool) \
    .rename_axis(None, axis=1).reset_index()

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True

Option 3

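import numpy as np
import pandas as pd

# Encode each column as integer codes (f1, f2) plus its unique labels (u1, u2)
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size

# Count occurrences of each (company, partner) cell in a flattened n x m grid
b = np.bincount(f1 * m + f2)
# bincount stops at the largest index seen, so pad with zeros up to n * m cells
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)

# Prepend the company labels and name the columns
pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))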

  company      x      y
0       a   True  False
1       b   True   True
2       c  False   True
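
The manual zero-padding in Option 3 can probably be avoided with the minlength argument of np.bincount. The sketch below is a variation and not the original answer's code: it assumes the same sample frame, and it builds the result with DataFrame.insert so the flag columns keep a bool dtype instead of the object dtype that np.column_stack produces when mixing strings and booleans.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the outputs shown above.
df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
})

# Integer codes and unique labels for each column.
f1, u1 = pd.factorize(df['company'].to_numpy())
f2, u2 = pd.factorize(df['partner'].to_numpy())
n, m = u1.size, u2.size

# Each pair maps to the flat cell index f1 * m + f2; minlength guarantees
# n * m bins even when the last cells are empty.
counts = np.bincount(f1 * m + f2, minlength=n * m)
v = counts.reshape(n, m).astype(bool)

out = pd.DataFrame(v, columns=u2)
out.insert(0, 'company', u1)
print(out)
```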

Timing
small data

%timeit df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool).rename_axis(None, axis=1).reset_index()
%timeit pd.crosstab(df.company, df.partner).astype(bool).rename_axis(None, axis=1).reset_index()

%%timeit
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size

b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)

pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))

Results, in the same order as the snippets above (Option 1, Option 2, Option 3):

1000 loops, best of 3: 1.67 ms per loop
100 loops, best of 3: 5.97 ms per loop
1000 loops, best of 3: 301 µs per loop



ANSWER 2

Score 0


Another option is to change

df = df.pivot(values='partner', columns='partner', index='company').reset_index()

to

df = df.pivot(values='partner', columns='partner', index='company').notna()

Still, I like lukeA's answer in the comments even better:

df.assign(val=True).pivot_table(values='val', index='company', columns='partner', fill_value=False)
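
Because pivot_table aggregates with the mean by default, the cells produced by the line above may come back as numbers or mixed objects rather than strict booleans, depending on the pandas version. A hedged variant, assuming the same sample frame, appends astype(bool) to make the result dtype explicit:

```python
import pandas as pd

# Assumed sample data matching the outputs shown above.
df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
})

flags = (
    df.assign(val=True)  # helper column: every existing pair is True
      .pivot_table(values='val', index='company', columns='partner',
                   fill_value=False)  # missing pairs are filled with False
      .astype(bool)  # normalise whatever dtype the aggregation produced
)
print(flags)
```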




ANSWER 3

Score 0


Use any as the aggfunc; it returns True for any non-NaN value.

df = pd.DataFrame({'company': ['a', 'b', 'c', 'b'],
                   'partner': ['x', 'x', 'y', 'y'],
                   'str': ['just', 'some', 'random', 'words']})
fp = df.pivot_table(index=['company'], columns=['partner'], aggfunc=any).fillna(False)
print(fp.head())

output

str       
partner      x      y
company              
a         True  False
b         True   True
c        False   True
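
The extra str level on the columns appears because no values column was named, so pivot_table aggregates every remaining column. One possible way to get single-level columns is to pass values as a scalar. In the sketch below, the lambda aggfunc and the trailing astype(bool) are substitutions made for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
    'str': ['just', 'some', 'random', 'words'],
})

fp = (
    df.pivot_table(index='company', columns='partner', values='str',
                   aggfunc=lambda s: s.notna().any(),  # True if the group has any non-NaN value
                   fill_value=False)                   # pairs with no rows become False
      .astype(bool)
)
print(fp)
```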