Fast selection of rows where at least N columns satisfy a condition in numpy/scipy

Full question
https://stackoverflow.com/questions/1192...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #scipy

ACCEPTED ANSWER

Score 3

I'd probably do

In [29]: timeit a[(a % 2 == 0).sum(axis=1) >= 2]
10000 loops, best of 3: 29.5 us per loop

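which works because True/False have integer values of 1/0, so summing the boolean array along axis=1 counts, for each row, the columns where the condition holds. For comparison (these rely on names imported directly from numpy and on Python 2 semantics, where map returns a list):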

In [30]: timeit a[where(array(map(lambda x: sum(x), a % 2 == 0)) >= N)]
10000 loops, best of 3: 72 us per loop

In [31]: timeit a[where(sum(apply_along_axis(lambda x: x % 2 == 0, 1, a), axis=1) >= 2)]
1000 loops, best of 3: 220 us per loop

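Note that using lambdas costs you a lot of the benefits of using numpy in the first place, because the function is called from Python once per row instead of the work being done in compiled code. And lambda x: sum(x) is simply a more verbose and slower way of writing sum here anyway.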
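To make the boolean-sum approach concrete, here is a minimal self-contained sketch. The sample array and the threshold N = 2 are assumptions chosen for illustration; the original a is not shown in the answer.

```python
import numpy as np

# Hypothetical sample data (the answer's original `a` is not shown).
a = np.array([
    [1, 2, 4, 7],   # two even values
    [1, 3, 5, 7],   # no even values
    [2, 4, 6, 8],   # four even values
    [1, 3, 5, 6],   # one even value
])
N = 2  # assumed threshold: keep rows with at least N even entries

mask = (a % 2 == 0)        # boolean array, True where the condition holds
counts = mask.sum(axis=1)  # True counts as 1, so this is a per-row match count
selected = a[counts >= N]  # boolean indexing keeps only the qualifying rows

print(counts)
print(selected)
```

With this data the first and third rows are kept. Any elementwise condition that yields a boolean array can be substituted for a % 2 == 0.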

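Also note that if the array were large, it'd probably be more efficient to use a method which could short-circuit, that is, stop counting as soon as the outcome is already determined, rather than evaluating every element as the approaches above do.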
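One possible way to short-circuit, sketched under assumptions: walk the columns one at a time, keep a running count per row, and stop once every row has either reached N or can no longer reach it. The function name rows_with_at_least is hypothetical and not part of numpy. Whether this beats the plain sum depends on the data and the array shape, since the Python-level loop has its own overhead.

```python
import numpy as np

def rows_with_at_least(mask, n):
    """Return a boolean selector: True for rows of `mask` with >= n True values.

    Hypothetical helper, not a numpy function. It scans one column at a time
    and stops early once the outcome is settled for every row.
    """
    n_rows, n_cols = mask.shape
    counts = np.zeros(n_rows, dtype=np.intp)
    for j in range(n_cols):
        counts += mask[:, j]          # add this column's matches to the running count
        remaining = n_cols - (j + 1)  # columns not yet examined
        # A row is decided once it has reached n, or can no longer reach n.
        decided = (counts >= n) | (counts + remaining < n)
        if decided.all():
            break                     # short-circuit: skip the remaining columns
    return counts >= n

# Usage with assumed sample data:
a = np.array([[1, 2, 4, 7], [1, 3, 5, 7], [2, 4, 6, 8], [1, 3, 5, 6]])
selected = a[rows_with_at_least(a % 2 == 0, 2)]
```

Note that the condition a % 2 == 0 is still evaluated for the whole array here; only the counting stops early. Avoiding that too would mean evaluating the condition column by column inside the loop.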