The Python Oracle

Why is creating a masked numpy array so slow with mask=None or mask=0

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzling Curiosities

--

Chapters
00:00 Question
01:31 Accepted answer (Score 5)
01:59 Answer 2 (Score 2)
03:28 Thank you

--

Full question
https://stackoverflow.com/questions/3746...

Question links:
[function docs]: http://docs.scipy.org/doc/numpy/referenc...

Accepted answer links:
[NumPy 1.11.0 source code]: https://github.com/numpy/numpy/blob/v1.1...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #performance #numpy

#avk47



ACCEPTED ANSWER

Score 5


mask=False is special-cased in the NumPy 1.11.0 source code:

if mask is True and mdtype == MaskType:
    mask = np.ones(_data.shape, dtype=mdtype)
elif mask is False and mdtype == MaskType:
    mask = np.zeros(_data.shape, dtype=mdtype)

mask=0 or mask=None take the slow path, making a 0-dimensional mask array and going through np.resize to resize it.




ANSWER 2

Score 2


I believe @user2357112 has the explanation. I profiled both cases, here are the results:

In [14]: q.run('q.np.ma.array(q.data, mask=None, copy=False)')
         49 function calls in 0.161 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 :0(array)
        1    0.154    0.154    0.154    0.154 :0(concatenate)
        1    0.000    0.000    0.161    0.161 :0(exec)
       11    0.000    0.000    0.000    0.000 :0(getattr)
        1    0.000    0.000    0.000    0.000 :0(hasattr)
        7    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(len)
        1    0.000    0.000    0.000    0.000 :0(ravel)
        1    0.000    0.000    0.000    0.000 :0(reduce)
        1    0.000    0.000    0.000    0.000 :0(reshape)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        5    0.000    0.000    0.000    0.000 :0(update)
        1    0.000    0.000    0.161    0.161 <string>:1(<module>)
        1    0.000    0.000    0.161    0.161 core.py:2704(__new__)
        1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
        1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
        5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
        1    0.000    0.000    0.161    0.161 core.py:6119(array)
        1    0.007    0.007    0.161    0.161 fromnumeric.py:1097(resize)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:128(reshape)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:1383(ravel)
        1    0.000    0.000    0.000    0.000 numeric.py:484(asanyarray)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.161    0.161 profile:0(q.np.ma.array(q.data, mask=None, copy=False))

In [15]: q.run('q.np.ma.array(q.data, mask=False, copy=False)')
         37 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(array)
        1    0.000    0.000    0.000    0.000 :0(exec)
       11    0.000    0.000    0.000    0.000 :0(getattr)
        1    0.000    0.000    0.000    0.000 :0(hasattr)
        5    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        5    0.000    0.000    0.000    0.000 :0(update)
        1    0.000    0.000    0.000    0.000 :0(zeros)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 core.py:2704(__new__)
        1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
        1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
        5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
        1    0.000    0.000    0.000    0.000 core.py:6119(array)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 profile:0(q.np.ma.array(q.data, mask=False, copy=False))

So it seems that the concatenation step of arrays is the bottleneck.