Taking mean of numpy ndarray with masked elements

Full question
https://stackoverflow.com/questions/5284...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #arrays #numpy #mask

ACCEPTED ANSWER

Score 5

I think the best way to do this would be something along the lines of:

masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)

Then take the mean with

masked.mean(axis=1)
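
As a minimal sketch of how this behaves, the snippet below builds small stand-in arrays; the values of mat1, mat2 and array_to_mask are made up for illustration and are not from the question. It masks the entries where both mat1 and mat2 are zero and takes the row-wise mean of what remains.

```python
import numpy as np

# Hypothetical stand-ins for the arrays named in the answer
mat1 = np.array([[0, 1, 0],
                 [2, 0, 3]])
mat2 = np.array([[0, 0, 5],
                 [0, 0, 7]])
array_to_mask = np.array([[10.0, 20.0, 30.0],
                          [40.0, 50.0, 60.0]])

# Element-wise AND needs `&` with parentheses around each comparison
masked = np.ma.masked_where((mat1 == 0) & (mat2 == 0), array_to_mask)

# Masked entries are skipped when computing the mean of each row
row_means = masked.mean(axis=1)
print(row_means)
```

Here the first row loses its first entry and the second row loses its middle entry, so each mean is taken over the two remaining values.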

ANSWER 2

Score 1

One similarly clunky but efficient way is to multiply your array by the mask, which sets the masked values to zero. You then have to divide by the number of non-masked values manually, hence the clunkiness. Unlike the nan-based approach, this works directly on integer-valued arrays, because nan cannot be stored in an integer array. It also seems to be the fastest option for both small and larger arrays, including compared with the masked array solution in another answer:

import numpy as np

def nanny(mat, mask):
    mat = mat.astype(float).copy() # don't mutate the original
    mat[~mask] = np.nan            # mask values
    return np.nanmean(mat, axis=0) # compute mean

def manual(mat, mask):
    # zero masked values, divide by number of nonzeros
    return (mat*mask).sum(axis=0)/mask.sum(axis=0)

# set up dummy data for testing
N,M = 400,400
mat1 = np.random.randint(0,N,(N,M))
mask = np.random.randint(0,2,(N,M)).astype(bool)

print(np.array_equal(nanny(mat1, mask), manual(mat1, mask))) # True
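
One caveat with the manual version is a column in which every value is masked: the count is zero and the division produces a warning. The following is only a sketch of one way to guard against that; manual_safe is a hypothetical helper name, not part of the answer, and it assumes the same convention as above, where mask is True for the values to keep.

```python
import numpy as np

def manual_safe(mat, mask):
    # Hypothetical variant of manual(): mask is True where a value is kept
    totals = (mat * mask).sum(axis=0)   # per-column sum of kept values
    counts = mask.sum(axis=0)           # per-column number of kept values
    # Start from nan so fully masked columns stay nan
    out = np.full(counts.shape, np.nan)
    # Divide only in columns that have at least one kept value
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

mat = np.array([[1, 2, 3],
                [4, 5, 6]])
mask = np.array([[True, False, False],
                 [True, True, False]])

result = manual_safe(mat, mask)
print(result)
```

The last column has no kept values, so its entry stays nan instead of triggering a division by zero.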