Fast selection of rows where at least N columns hold true in numpy/scipy
--
Full question
https://stackoverflow.com/questions/1192...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #numpy #scipy
#avk47
--
ACCEPTED ANSWER
Score 3
I'd probably do
In [29]: timeit a[(a % 2 == 0).sum(axis=1) >= 2]
10000 loops, best of 3: 29.5 us per loop
which works because True/False have integer values of 1/0, so summing the boolean mask along axis=1 counts the matching columns in each row. For comparison:
In [30]: timeit a[where(array(map(lambda x: sum(x), a % 2 == 0)) >= 2)]
10000 loops, best of 3: 72 us per loop
In [31]: timeit a[where(sum(apply_along_axis(lambda x: x % 2 == 0, 1, a), axis=1) >= 2)]
1000 loops, best of 3: 220 us per loop
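Note that using lambdas costs you much of the benefit of using numpy in the first place, because each lambda runs as a Python-level call once per row instead of as a single vectorized operation. Here, lambda x: sum(x) is simply a more verbose and slower way of writing sum anyway.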
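To make the vectorized version concrete, here is a minimal self-contained sketch. The array contents and the threshold N = 2 are made up for illustration; the condition (x % 2 == 0) is the one used in the timings above.

```python
import numpy as np

# Hypothetical sample data; any 2-D integer array would do.
a = np.array([[1, 2, 3, 4],
              [1, 3, 5, 7],
              [2, 4, 6, 8],
              [1, 3, 5, 6]])
N = 2  # minimum number of columns that must satisfy the condition

mask = a % 2 == 0            # boolean array, same shape as a
counts = mask.sum(axis=1)    # True counts as 1, so this is matches per row
selected = a[counts >= N]    # boolean indexing keeps the qualifying rows

print(counts)
print(selected)
```

No lambda or Python-level loop is involved: the comparison, the sum, and the row selection each run as one numpy operation.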
Also note that if the array were large, it'd probably be more efficient to use a method that can short-circuit, i.e. stop counting a row as soon as N matching columns have been found, rather than the above.
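One way such short-circuiting might look is sketched below. This is not from the answer itself: the function name rows_with_at_least, the pred argument, and the block size are all hypothetical choices for illustration. The idea is to scan the columns in blocks and stop evaluating rows that have already reached the threshold. Whether it actually beats the one-liner depends on the array shape and on how early rows tend to qualify, so it would need timing on real data.

```python
import numpy as np

def rows_with_at_least(a, n, pred, block=64):
    """Return the rows of 2-D array `a` where `pred` holds in at least `n` columns.

    Hypothetical helper: columns are scanned `block` at a time, and rows that
    have already reached `n` matches are skipped in later blocks.
    """
    counts = np.zeros(a.shape[0], dtype=np.intp)
    pending = np.arange(a.shape[0])  # indices of rows still below the threshold
    for start in range(0, a.shape[1], block):
        if pending.size == 0:
            break  # every row already qualifies; nothing left to scan
        chunk = a[pending, start:start + block]
        counts[pending] += np.count_nonzero(pred(chunk), axis=1)
        pending = pending[counts[pending] < n]
    return a[counts >= n]

# Made-up sample data, small block size so that more than one block is scanned.
a = np.array([[1, 2, 3, 4],
              [1, 3, 5, 7],
              [2, 4, 6, 8],
              [1, 3, 5, 6]])
result = rows_with_at_least(a, 2, lambda x: x % 2 == 0, block=2)
print(result)
```

The result matches a[(a % 2 == 0).sum(axis=1) >= 2] on the same input; only the amount of work done on rows that qualify early differs.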