Explanation of boolean indexing behaviors

Full question
https://stackoverflow.com/questions/6559...

Question links:
[Boolean array indexing]: https://numpy.org/devdocs/reference/arra...

Accepted answer links:
[numpy source]: https://github.com/numpy/numpy/blob/10ee...
[numpy#3798]: https://github.com/numpy/numpy/pull/3798/
[this comment]: https://github.com/numpy/numpy/pull/3798...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #booleanindexing

ACCEPTED ANSWER

Score 4

Boolean scalar indexing is not well documented, but you can trace how it is handled in the source code. See, for example, this comment and the associated code in the numpy source:

/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/

So if an index is a scalar boolean, a new axis is added: if the value is True, the new axis has size 1, and if the value is False, it has size 0.
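
As a quick illustration of the rule above, here is a minimal sketch, assuming a recent NumPy release; the array name x and its shape are only illustrative and not taken from the numpy source:

```python
import numpy as np

x = np.ones((2, 2))

# A scalar True index prepends a new axis of length 1.
true_shape = x[True].shape

# A scalar False index prepends a new axis of length 0 (an empty result).
false_shape = x[False].shape

# For comparison, np.newaxis also prepends an axis of length 1.
newaxis_shape = x[np.newaxis].shape

print(true_shape, false_shape, newaxis_shape)
```

The True case yields the same shape as np.newaxis, while the False case yields an empty array that still keeps the trailing dimensions.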

This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:

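import numpy as np

# Filtering with a boolean mask gives a 1-D result for any input dimensionality.
x = np.ones((2, 2))
assert x[x > 0].ndim == 1

x = np.ones(2)
assert x[x > 0].ndim == 1

# For a 0-D array, x > 0 is itself a scalar boolean.
x = np.ones(())
assert x[x > 0].ndim == 1  # scalar boolean here!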

The interesting thing is that scalar booleans after the first do not add further dimensions! From an implementation standpoint, this seems to be because consecutive 0D boolean indices are treated like consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and are therefore combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
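
The corner case can be checked directly. This is a small sketch, assuming a recent NumPy release; the array x is hypothetical and chosen only to make the shapes easy to read:

```python
import numpy as np

x = np.ones((2, 2))

# One scalar boolean adds a single leading axis.
one_bool = x[True].shape

# Further scalar booleans do not add more axes; the shape is unchanged.
two_bools = x[True, True].shape
three_bools = x[True, True, True].shape

print(one_bool, two_bools, three_bools)
```

By contrast, repeating np.newaxis adds one axis per occurrence, which is part of why the repeated-boolean result is surprising.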

Given that, I would recommend treating this behavior as poorly defined and avoiding it in favor of well-documented indexing approaches.
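
As one possible reading of "well-documented approaches" (my own suggestion, not something the linked pull request prescribes), an explicit np.newaxis or np.expand_dims can add the axis, and np.atleast_1d can make a 0-D input safe to filter with an ordinary boolean mask:

```python
import numpy as np

x = np.ones((2, 2))

# Add a leading axis explicitly instead of indexing with True.
with_newaxis = x[np.newaxis]
with_expand = np.expand_dims(x, axis=0)

# Promote a 0-D array to 1-D first, then filter with a regular boolean mask.
scalar = np.ones(())
arr = np.atleast_1d(scalar)
filtered = arr[arr > 0]

print(with_newaxis.shape, with_expand.shape, filtered.shape)
```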