Selecting columns in numpy based on a Boolean vector
Full question
https://stackoverflow.com/questions/2776...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #arrays #numpy
ACCEPTED ANSWER
Score 5
First off, let's set up some example code:
import numpy as np
m, n = 5, 3
a = np.zeros((m, n))
b = np.ones((m, n))
boolvec = np.random.randint(0, 2, m).astype(bool)
Just to show what this data might look like:
In [2]: a
Out[2]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
In [3]: b
Out[3]:
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
In [4]: boolvec
Out[4]: array([ True, True, False, False, False], dtype=bool)
In this case, np.where is an efficient way to do it. However, boolvec needs a shape that broadcasts against a and b. With shape (m,), it would be matched against the last axis (the columns), so we turn it into a column vector of shape (m, 1) by indexing with np.newaxis or None (they're the same):
In [5]: boolvec[:,None]
Out[5]:
array([[ True],
[ True],
[False],
[False],
[False]], dtype=bool)
And then we can build the final result with np.where, which takes rows from a where boolvec is True and rows from b where it is False:
In [6]: c = np.where(boolvec[:, None], a, b)
In [7]: c
Out[7]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
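For reference, the steps above can be collected into one self-contained script. This is a minimal sketch: the mask is fixed rather than random so the result is reproducible, and colmask is a hypothetical column mask added only to show that a length-n vector needs no reshaping.

```python
import numpy as np

m, n = 5, 3
a = np.zeros((m, n))
b = np.ones((m, n))

# Fixed mask (instead of a random one) so the result is reproducible.
boolvec = np.array([True, True, False, False, False])

# boolvec[:, None] has shape (m, 1) and broadcasts across the n columns:
# rows where the mask is True come from a, the rest from b.
c = np.where(boolvec[:, None], a, b)

# Hypothetical column mask: shape (n,) already lines up with the last axis,
# so it selects whole columns without any reshaping.
colmask = np.array([True, False, True])
d = np.where(colmask, a, b)

print(c)
print(d)
```

Passing boolvec without the [:, None] would raise a broadcasting error here, because a length-5 vector cannot be matched against 3 columns.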
ANSWER 2
Score 4
You could use np.choose for this.
For example, take these a and b arrays:
>>> a = np.arange(12).reshape(3,4)
>>> b = np.arange(12).reshape(3,4) + 100
>>> a_and_b = np.array([a, b])
To use np.choose, we want a 3D array with both arrays; a_and_b looks like this:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[100, 101, 102, 103],
[104, 105, 106, 107],
[108, 109, 110, 111]]])
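Now let the Boolean array, written as 0/1 integers, be bl = np.array([0, 1, 1, 0]). It has one entry per column and broadcasts across the rows, so 0 takes the column from a and 1 takes it from b. Then: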
>>> np.choose(bl, a_and_b)
array([[ 0, 101, 102, 3],
[ 4, 105, 106, 7],
[ 8, 109, 110, 11]])
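The same idea can be sketched with a genuine Boolean mask, for columns and for rows. The names col_mask and row_mask are illustrative, and the explicit astype(int) cast is an assumption made here to keep the 0/1 indices obvious; it is not part of the original answer.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a + 100

# Column selection: one entry per column; False -> a, True -> b.
col_mask = np.array([False, True, True, False])
by_col = np.choose(col_mask.astype(int), [a, b])

# Row selection: one entry per row, reshaped to (3, 1) so it
# broadcasts across the columns instead of along them.
row_mask = np.array([True, False, True])
by_row = np.choose(row_mask[:, None].astype(int), [a, b])

print(by_col)
print(by_row)
```

Note the order of the choices: index 0 picks from the first array in the list, so the mask's False values map to a and its True values to b.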
ANSWER 3
Score 4
Timings for (5000,3000) arrays are:
In [107]: timeit np.where(boolvec[:,None],b,a)
1 loops, best of 3: 993 ms per loop
In [108]: timeit np.choose(boolvec[:,None],[a,b])
1 loops, best of 3: 929 ms per loop
In [109]: timeit c=a[:];c[boolvec,:]=b[boolvec,:]
1 loops, best of 3: 786 ms per loop
np.where and np.choose are essentially the same; boolean indexing is slightly faster. np.select uses np.choose, so I didn't time it.
My timings for column selection (with cols a Boolean vector that has one entry per column) are similar, except this time the indexing is slower:
In [119]: timeit np.where(cols,b,a)
1 loops, best of 3: 878 ms per loop
In [120]: timeit np.choose(cols,[a,b])
1 loops, best of 3: 915 ms per loop
In [121]: timeit c=a[:];c[:,cols]=b[:,cols]
1 loops, best of 3: 1.25 s per loop
Correction: for the indexing I should be using a.copy(), because a[:] is only a view of a, so assigning into it modifies a itself.
In [32]: timeit c=a.copy();c[boolvec,:]=b[boolvec,:]
1 loops, best of 3: 783 ms per loop
In [33]: timeit c=a.copy();c[:,cols]=b[:,cols]
1 loops, best of 3: 1.44 s per loop
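Why the correction matters can be checked directly. The sketch below uses small arrays and a fixed mask (both chosen here for illustration) to show that a[:] shares memory with a, while a.copy() leaves the original untouched.

```python
import numpy as np

b = np.ones((5, 3))
boolvec = np.array([True, True, False, False, False])

# Basic slicing returns a view, so writing into it changes the original.
a = np.zeros((5, 3))
view = a[:]
view[boolvec, :] = b[boolvec, :]
print(np.shares_memory(a, view))  # the view and a share one buffer
print(a[0])                       # row 0 of a has been overwritten

# copy() allocates a new array, so the original stays intact.
a2 = np.zeros((5, 3))
c = a2.copy()
c[boolvec, :] = b[boolvec, :]
print(np.shares_memory(a2, c))
print(a2[0])
```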
I get the same timings for Python 2.7 and 3, and for numpy 1.8.2 and 1.9.0 dev.
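To repeat the row-selection comparison outside IPython, a small harness along these lines could be used. It is only a sketch: the array size is reduced from (5000, 3000) so it finishes quickly, the function names are made up for this example, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

m, n = 500, 300  # assumption: much smaller than (5000, 3000), for a quick run
rng = np.random.default_rng(0)
a = np.zeros((m, n))
b = np.ones((m, n))
boolvec = rng.integers(0, 2, m).astype(bool)

def with_where():
    # True rows from b, False rows from a
    return np.where(boolvec[:, None], b, a)

def with_choose():
    # index 0 -> a, index 1 -> b
    return np.choose(boolvec[:, None].astype(int), [a, b])

def with_indexing():
    # copy first so that a itself is not modified
    c = a.copy()
    c[boolvec, :] = b[boolvec, :]
    return c

# Best of 3 repeats, 10 calls each, in seconds.
results = {
    f.__name__: min(timeit.repeat(f, number=10, repeat=3))
    for f in (with_where, with_choose, with_indexing)
}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s per 10 calls")
```

All three functions return the same array, so the comparison is like for like. For column selection, the mask would have length n and the indexing variant would assign to c[:, mask] instead.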