Why is creating a masked NumPy array so slow with mask=None or mask=0?
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Switch On Looping
--
Chapters
00:00 Why Is Creating A Masked Numpy Array So Slow With Mask=None Or Mask=0
01:14 Accepted Answer Score 5
01:37 Answer 2 Score 2
02:59 Thank you
--
Full question
https://stackoverflow.com/questions/3746...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #performance #numpy
#avk47
ACCEPTED ANSWER
Score 5
mask=False is special-cased in the NumPy 1.11.0 source code:
    if mask is True and mdtype == MaskType:
        mask = np.ones(_data.shape, dtype=mdtype)
    elif mask is False and mdtype == MaskType:
        mask = np.zeros(_data.shape, dtype=mdtype)
mask=0 and mask=None both take the slow path: NumPy builds a 0-dimensional mask array and then expands it to the shape of the data with np.resize.
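To see the difference on your own machine, here is a minimal sketch that builds the same masked array three ways and times each one. The array size and variable names are made up for illustration, and the timings depend on your NumPy version; the behaviour described above is for NumPy 1.11.0 and other releases may differ.

```python
import timeit

import numpy as np

# Hypothetical test data; the size is arbitrary.
data = np.arange(1_000_000, dtype=float)

# Fast path: the literal False is special-cased and becomes np.zeros(...).
fast = np.ma.array(data, mask=False, copy=False)

# Slow path: 0 and None are first turned into a 0-d array, then resized.
slow_zero = np.ma.array(data, mask=0, copy=False)
slow_none = np.ma.array(data, mask=None, copy=False)

# All three should end up with the same all-False mask.
masks_match = (
    np.array_equal(np.ma.getmaskarray(fast), np.ma.getmaskarray(slow_zero))
    and np.array_equal(np.ma.getmaskarray(fast), np.ma.getmaskarray(slow_none))
)
print("masks match:", masks_match)

# Time each variant; only the relative difference is meaningful.
for label, mask in [("False", False), ("0", 0), ("None", None)]:
    seconds = timeit.timeit(
        lambda: np.ma.array(data, mask=mask, copy=False), number=5
    )
    print(f"mask={label}: {seconds / 5:.6f} s per call")
```

If the masks match, passing mask=False explicitly is a drop-in replacement for mask=0 or mask=None when all you want is an unmasked array.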
ANSWER 2
Score 2
I believe @user2357112 has the explanation. I profiled both cases; here are the results:
In [14]: q.run('q.np.ma.array(q.data, mask=None, copy=False)')
         49 function calls in 0.161 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 :0(array)
        1    0.154    0.154    0.154    0.154 :0(concatenate)
        1    0.000    0.000    0.161    0.161 :0(exec)
       11    0.000    0.000    0.000    0.000 :0(getattr)
        1    0.000    0.000    0.000    0.000 :0(hasattr)
        7    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(len)
        1    0.000    0.000    0.000    0.000 :0(ravel)
        1    0.000    0.000    0.000    0.000 :0(reduce)
        1    0.000    0.000    0.000    0.000 :0(reshape)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        5    0.000    0.000    0.000    0.000 :0(update)
        1    0.000    0.000    0.161    0.161 <string>:1(<module>)
        1    0.000    0.000    0.161    0.161 core.py:2704(__new__)
        1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
        1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
        5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
        1    0.000    0.000    0.161    0.161 core.py:6119(array)
        1    0.007    0.007    0.161    0.161 fromnumeric.py:1097(resize)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:128(reshape)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:1383(ravel)
        1    0.000    0.000    0.000    0.000 numeric.py:484(asanyarray)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.161    0.161 profile:0(q.np.ma.array(q.data, mask=None, copy=False))

In [15]: q.run('q.np.ma.array(q.data, mask=False, copy=False)')
         37 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(array)
        1    0.000    0.000    0.000    0.000 :0(exec)
       11    0.000    0.000    0.000    0.000 :0(getattr)
        1    0.000    0.000    0.000    0.000 :0(hasattr)
        5    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        5    0.000    0.000    0.000    0.000 :0(update)
        1    0.000    0.000    0.000    0.000 :0(zeros)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 core.py:2704(__new__)
        1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
        1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
        5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
        1    0.000    0.000    0.000    0.000 core.py:6119(array)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 profile:0(q.np.ma.array(q.data, mask=False, copy=False))

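So the bottleneck appears to be the concatenate call made inside np.resize: it accounts for 0.154 of the 0.161 seconds in the mask=None case and does not appear at all in the mask=False profile.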
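A comparable profile can be collected with the standard library. The sketch below uses cProfile rather than the profile module that produced the output above, and the helper name profile_mask and the test array are hypothetical. Line numbers and function names in the report will vary with the NumPy version installed.

```python
import cProfile
import io
import pstats

import numpy as np

# Hypothetical test data; the size is arbitrary.
data = np.arange(1_000_000, dtype=float)


def profile_mask(mask):
    """Profile one np.ma.array call and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    np.ma.array(data, mask=mask, copy=False)
    profiler.disable()

    # Write the report into a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report_none = profile_mask(None)
report_false = profile_mask(False)
print(report_none)
print(report_false)
```

Sorting by cumulative time puts the most expensive call chain at the top, which makes it easy to check whether resize and concatenate still dominate the mask=None case in your environment.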