The Python Oracle

Numpy: Duplicate mask for an array (returning True if we've seen that value before, False otherwise)


Chapters
00:00 Numpy: Duplicate mask for an array (returning True if we've seen that value before, False otherwise)
00:31 Accepted Answer Score 3
01:21 Answer 2 Score 3
01:33 Answer 3 Score 0
01:50 Thank you

--

Full question
https://stackoverflow.com/questions/6396...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #list #numpy




ACCEPTED ANSWER

Score 3


Approach #1 : With sorting

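import numpy as np

def mask_firstocc(a):
    # Stable sort keeps equal values in their original relative order
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    # A sorted element is a repeat if it equals its predecessor; sidx.argsort() undoes the sort
    out = np.r_[False,b[:-1] == b[1:]][sidx.argsort()]
    return out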

We can skip the second argsort and write the mask straight back to the original positions with array assignment, which boosts performance further -

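def mask_firstocc_v2(a):
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    # True where a sorted element equals the one before it
    mask = np.r_[False,b[:-1] == b[1:]]
    out = np.empty(len(a), dtype=bool)
    # Scatter each flag back to the element's original position
    out[sidx] = mask
    return out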

Sample run -

In [166]: a
Out[166]: array([2, 1, 1, 0, 0, 4, 0, 3])

In [167]: mask_firstocc(a)
Out[167]: array([False, False,  True, False,  True, False,  True, False])
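
To make the sorting approach easier to follow, here is a rough trace of the intermediate arrays for the sample input above. It only repeats the steps of mask_firstocc_v2 outside a function; the variable name dup_sorted is made up for this illustration.

```python
import numpy as np

a = np.array([2, 1, 1, 0, 0, 4, 0, 3])

# Stable argsort: equal values keep their original left-to-right order
sidx = a.argsort(kind='stable')   # [3, 4, 6, 1, 2, 0, 7, 5]
b = a[sidx]                       # [0, 0, 0, 1, 1, 2, 3, 4]

# In sorted order, a value is a repeat if it equals its left neighbour
dup_sorted = np.r_[False, b[:-1] == b[1:]]

# Write each flag back to the position the element came from
out = np.empty(len(a), dtype=bool)
out[sidx] = dup_sorted
print(out)  # [False False  True False  True False  True False]
```

The stable sort matters here: within a run of equal values, the first one in sorted order is also the first one in the original array, so it is the one left as False.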

Approach #2 : With np.unique(..., return_index)

We can leverage np.unique with return_index=True, which returns the index of the first occurrence of each unique element. Starting from an all-True mask and setting those positions to False leaves True only at the repeats -

def mask_firstocc_with_unique(a):
    mask = np.ones(len(a), dtype=bool)
    mask[np.unique(a, return_index=True)[1]] = False
    return mask
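
As a quick check, the np.unique version can be run on the same sample array used for Approach #1. This is only a sketch; the names uniq and first_idx are chosen here for illustration.

```python
import numpy as np

a = np.array([2, 1, 1, 0, 0, 4, 0, 3])

# uniq holds the sorted unique values, first_idx the index where each first appears
uniq, first_idx = np.unique(a, return_index=True)
print(uniq)       # [0 1 2 3 4]
print(first_idx)  # [3 1 0 7 5]

# Everything starts as "seen before"; first occurrences are then cleared
mask = np.ones(len(a), dtype=bool)
mask[first_idx] = False
print(mask)  # [False False  True False  True False  True False]
```

The result matches the sample run of the sorting approach.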



ANSWER 2

Score 3


Use np.unique

import numpy as np

a = np.array([1, 2, 1, 2, 3])
# ix holds the index of the first occurrence of each unique value
_, ix = np.unique(a, return_index=True)
b = np.full(a.shape, True)
b[ix] = False

In [45]: b
Out[45]: array([False, False,  True,  True, False])



ANSWER 3

Score 0


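You can achieve that with the built-in enumerate function, which lets you loop over index and value together. A value is a duplicate when list.index, which returns the position of its first occurrence, differs from the current index. Note that list.index scans the list on every call, so this is quadratic in the worst case: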

array = [1, 2, 1, 2, 3]

mask = []

for i,v in enumerate(array):
    # array.index(v) is the position where v first appears
    if array.index(v) == i:
        mask.append(False)
    else:
        mask.append(True)


print(mask)

Output:

[False, False, True, True, False]
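
The loop above calls array.index on every iteration. A possible single-pass variant keeps a set of values seen so far instead. This is a sketch rather than part of the original answer: the function name duplicate_mask is made up here, and it assumes the values are hashable.

```python
def duplicate_mask(values):
    """Return True for each value that appeared earlier in the sequence."""
    seen = set()
    mask = []
    for v in values:
        # Membership is tested before adding, so first occurrences give False
        mask.append(v in seen)
        seen.add(v)
    return mask


print(duplicate_mask([1, 2, 1, 2, 3]))  # [False, False, True, True, False]
```

Set membership tests take constant time on average, so this avoids rescanning the list for each element.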