Fast way to find index of array in array of arrays

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 5 Looping

--

Chapters
00:00 Fast Way To Find Index Of Array In Array Of Arrays
01:04 Answer 1 Score 0
01:23 Accepted Answer Score 6
02:24 Thank you

--

Full question
https://stackoverflow.com/questions/1781...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #arrays #search #numpy #multidimensionalarray

#avk47

ACCEPTED ANSWER

Score 6

This is Jaime's idea, I just love it:

import numpy as np

def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    https://stackoverflow.com/a/16216866/190597 (Jaime)"""    
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def find_index(arr, x):
    arr_as1d = asvoid(arr)
    x = asvoid(x)
    return np.nonzero(arr_as1d == x)[0]


arr = np.array([[  1,  15,   0,   0],
                [ 30,  10,   0,   0],
                [ 30,  20,   0,   0],
                [1, 2, 3, 4],
                [104, 139, 146,  75],
                [  9,  11, 146,  74],
                [  9, 138, 146,  75]], dtype='uint8')

arr = np.tile(arr,(1221488,1))
x = np.array([1,2,3,4], dtype='uint8')

print(find_index(arr, x))

yields

[      3      10      17 ..., 8550398 8550405 8550412]

The idea is to view each row of the array as a string. For example,

In [15]: x
Out[15]: 
array([^A^B^C^D], 
      dtype='|V4')

The strings look like garbage, but they are really just the underlying data in each row viewed as bytes. You can then compare arr_as1d == x to find which rows equal x.

There is another way to do it:

def find_index2(arr, x):
    return np.where((arr == x).all(axis=1))[0]

but it turns out to be not as fast:

In [34]: %timeit find_index(arr, x)
1 loops, best of 3: 209 ms per loop

In [35]: %timeit find_index2(arr, x)
1 loops, best of 3: 370 ms per loop

ANSWER 2

Score 0

If you perform search more than one time and you don't mind to use extra memory, you can create set from you array (I'm using list here, but it's almost the same code):

>>> elem = [1, 2, 3, 4]    
>>> elements = [[  1,  15,   0,   0], [ 30,  10,   0,   0], [1, 2, 3, 4]]
>>> index = set([tuple(x) for x in elements])
>>> True if tuple(elem) in index else False
True