The Python Oracle

Pandas counting occurrence of list contained in column of lists

Full question
https://stackoverflow.com/questions/4741...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #vectorization




ACCEPTED ANSWER

Score 5


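You can use Series.apply (df.m is a single column, so it is a Series rather than a DataFrame) along with the built-in set.issubset method, and then .sum() to count the matches. In CPython, set.issubset is implemented in C, so apply calls it directly on each row instead of going through a Python-level function, and .sum() likewise runs at a lower level than a hand-written Python loop would.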

subset_wanted = {2, 3}
# True for each row whose list contains every element of subset_wanted
count = df.m.apply(subset_wanted.issubset).sum()
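For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The DataFrame contents are made up for illustration (the question's real data is not reproduced here); only the column name m and the wanted subset {2, 3} come from the answer.

```python
import pandas as pd

# Hypothetical sample data: column "m" holds one list per row.
df = pd.DataFrame({"m": [[1, 2, 3], [2, 3], [3, 4], [2, 3, 5], [1]]})

subset_wanted = {2, 3}

# set.issubset accepts any iterable, so each list can be passed as-is.
# The result is a boolean Series: True where the row contains both 2 and 3.
mask = df["m"].apply(subset_wanted.issubset)

# Summing booleans counts the True values.
count = int(mask.sum())
print(count)  # 3 (rows 0, 1 and 3 match)
```

Keeping the intermediate mask around is optional, but it is handy if you also want to select the matching rows with df[mask].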

I can't see a way to shave more time off that, other than writing a custom C-level function. That would amount to a custom sum that checks each row for the subset and adds 0 or 1 accordingly. In the time it would take to write, you could have run the version above thousands upon thousands of times anyway.




ANSWER 2

Score 0


Since you are looking for more set-like behavior:

(df.m.apply(lambda x: set(x).intersection(set([2,3]))) == set([2,3])).sum()

Returns

3
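The intersection test used here is logically the same check as a subset test: the wanted set is a subset of a row exactly when intersecting the row with it gives the wanted set back. A small sketch of that equivalence, using made-up sample data (not the question's) and plain Python iteration over the column instead of apply:

```python
import pandas as pd

# Hypothetical sample data: column "m" holds one list per row.
df = pd.DataFrame({"m": [[1, 2, 3], [2, 3], [3, 4], [2, 3, 5], [1]]})

wanted = {2, 3}

# Intersection form: a row matches when (row ∩ wanted) equals wanted.
count_intersection = sum((set(row) & wanted) == wanted for row in df["m"])

# Subset form: a row matches when wanted is a subset of the row.
count_issubset = sum(wanted.issubset(row) for row in df["m"])

print(count_intersection, count_issubset)  # 3 3
```

Both counts agree on this data. The subset form skips building a new set for every row, which is one reason to prefer it.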