Numpy sum of operator results without allocating an unnecessary array
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping
--
Chapters
00:00 Numpy Sum Of Operator Results Without Allocating An Unnecessary Array
00:58 Answer 1 Score 1
01:17 Answer 2 Score 1
01:50 Accepted Answer Score 2
02:43 Answer 4 Score 0
03:24 Thank you
--
Full question
https://stackoverflow.com/questions/1796...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #arrays #optimization #numpy #numexpr
#avk47
ACCEPTED ANSWER
Score 2
On my machine this is faster:
(a == b).sum()
If you don't want to use any extra storage, than I would suggest using numba. I'm not too familiar with it, but this seems to work well. I ran into some trouble getting Cython to take a boolean NumPy array.
from numba import autojit
def pysumeq(a, b):
tot = 0
for i in xrange(a.shape[0]):
for j in xrange(a.shape[1]):
if a[i,j] == b[i,j]:
tot += 1
return tot
# make numba version
nbsumeq = autojit(pysumeq)
A = (rand(10,10)<.5)
B = (rand(10,10)<.5)
# do a simple dry run to get it to compile
# for this specific use case
nbsumeq(A, B)
If you don't have numba, I would suggest using the answer by @user2357112
Edit: Just got a Cython version working, here's the .pyx file. I'd go with this.
from numpy cimport ndarray as ar
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def cysumeq(ar[np.uint8_t,ndim=2,cast=True] a, ar[np.uint8_t,ndim=2,cast=True] b):
cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
for i in xrange(h):
for j in xrange(w):
if a[i,j] == b[i,j]:
tot += 1
return tot
ANSWER 2
Score 1
To start with you can skip then A*B step:
>>> a
array([ True, False, True, False, True], dtype=bool)
>>> b
array([False, True, True, False, True], dtype=bool)
>>> np.sum(~(a^b))
3
If you do not mind destroying array a or b, I am not sure you will get faster then this:
>>> a^=b #In place xor operator
>>> np.sum(~a)
3
ANSWER 3
Score 1
If the problem is allocation and deallocation, maintain a single output array and tell numpy to put the results there every time:
out = np.empty_like(a) # Allocate this outside a loop and use it every iteration
num_eq = np.equal(a, b, out).sum()
This'll only work if the inputs are always the same dimensions, though. You may be able to make one big array and slice out a part that's the size you need for each call if the inputs have varying sizes, but I'm not sure how much that slows you down.
ANSWER 4
Score 0
Improving upon IanH's answer, it's also possible to get access to the underlying C array in a numpy array from within Cython, by supplying mode="c" to ndarray.
from numpy cimport ndarray as ar
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cdef int cy_sum_eq(ar[np.uint8_t,ndim=2,cast=True,mode="c"] a, ar[np.uint8_t,ndim=2,cast=True,mode="c"] b):
cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
cdef np.uint8_t* adata = &a[0, 0]
cdef np.uint8_t* bdata = &b[0, 0]
for i in xrange(h):
for j in xrange(w):
if adata[j] == bdata[j]:
tot += 1
adata += w
bdata += w
return tot
This is about 40% faster on my machine than IanH's Cython version, and I've found that rearranging the loop contents doesn't seem to make much of a difference at this point probably due to compiler optimizations. At this point, one could potentially link to a C function optimized with SSE and such to perform this operation and pass adata and bdata as uint8_t*s