Taking mean of numpy ndarray with masked elements
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Music Box Puzzles
--
Chapters
00:00 Question
01:19 Accepted answer (Score 5)
01:37 Answer 2 (Score 1)
02:27 Thank you
--
Full question
https://stackoverflow.com/questions/5284...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #arrays #numpy #mask
#avk47
ACCEPTED ANSWER
Score 5
I think the best way to do this would be something along the lines of:
import numpy as np
masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)
Then take the mean with
masked.mean(axis=1)
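A self-contained sketch of this approach, with small made-up arrays standing in for `mat1`, `mat2` and `array_to_mask` (the question's real data is not shown here). NumPy combines element-wise conditions with `&`, and each comparison needs its own parentheses because `&` binds more tightly than `==`. Masked entries are simply skipped by `mean`.

```python
import numpy as np

# Hypothetical example data: all three arrays share the same shape.
mat1 = np.array([[0, 1, 0],
                 [2, 0, 0]])
mat2 = np.array([[0, 0, 3],
                 [0, 0, 0]])
array_to_mask = np.array([[10.0, 20.0, 30.0],
                          [40.0, 50.0, 60.0]])

# Hide the positions where both mat1 and mat2 are zero.
both_zero = (mat1 == 0) & (mat2 == 0)
masked = np.ma.masked_where(both_zero, array_to_mask)

# Row means over the unmasked entries only:
# row 0 averages 20 and 30, row 1 keeps only 40.
row_means = masked.mean(axis=1)
print(row_means)
```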
ANSWER 2
Score 1
One similarly clunky but efficient way is to multiply your array by the mask, which sets the masked values to zero. You then have to divide by the number of non-masked values manually, hence the clunkiness. But this works with integer-valued arrays, which can't be said of the NaN approach: an integer array cannot hold NaN, so that approach has to convert the data to float first. It also seems to be the fastest option for both small and larger arrays (including compared with the masked-array solution in another answer):
import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float)  # astype returns a copy, so the caller's array is not mutated
    mat[~mask] = np.nan  # hide values where mask is False
    return np.nanmean(mat, axis=0)  # column means, ignoring NaN

def manual(mat, mask):
    # zero out masked values, then divide by the number of unmasked values per column
    return (mat * mask).sum(axis=0) / mask.sum(axis=0)

# set up dummy data for testing
N, M = 400, 400
mat1 = np.random.randint(0, N, (N, M))
mask = np.random.randint(0, 2, (N, M)).astype(bool)
print(np.array_equal(nanny(mat1, mask), manual(mat1, mask)))  # True
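One edge case the `manual` function does not handle: a column in which every entry is masked has a count of zero, so the division yields NaN and NumPy emits a RuntimeWarning. Below is a sketch of a guarded variant; `manual_safe` and its `fill` parameter are hypothetical names, not part of the answer. It uses `np.divide` with `where=` so the division only runs for columns that have at least one unmasked value. It also cross-checks against a NumPy masked array, whose mask uses the opposite convention (True means hidden), so the keep-mask is inverted with `~`.

```python
import numpy as np

def manual_safe(mat, mask, fill=np.nan):
    """Column means of `mat` over entries where `mask` is True.

    Hypothetical helper: columns with no unmasked entries get `fill`
    instead of producing a 0/0 division.
    """
    totals = (mat * mask).sum(axis=0).astype(float)
    counts = mask.sum(axis=0)
    out = np.full(totals.shape, fill, dtype=float)
    # Divide only where at least one value survived the mask;
    # other positions keep the `fill` value.
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

# Small made-up example; the last column is fully masked.
mat = np.array([[1, 2, 3],
                [4, 5, 6]])
mask = np.array([[True, False, False],
                 [True, True, False]])

col_means = manual_safe(mat, mask)
print(col_means)

# Cross-check with np.ma, inverting the mask (np.ma hides True entries).
ma_means = np.ma.masked_array(mat, mask=~mask).mean(axis=0)
print(ma_means)
```

Filling with NaN mirrors what `nanny` would return for such a column; passing a different `fill` (for example 0.0) is a design choice that depends on how downstream code treats empty columns.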