The Python Oracle

Delete columns based on repeat value in one row in numpy array

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: RPG Blues Looping

--

Chapters
00:00 Question
01:27 Accepted answer (Score 2)
02:31 Answer 2 (Score 1)
02:55 Answer 3 (Score 0)
03:25 Thank you

--

Full question
https://stackoverflow.com/questions/3860...

Answer 2 links:
[numpy_indexed]: https://github.com/EelcoHoogendoorn/Nump...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #arrays #numpy

#avk47



ACCEPTED ANSWER

Score 2


You can use the optional arguments that come with np.unique and then use np.bincount to use the last row as weights to get the final averaged output, like so -

_,unqID,tag,C = np.unique(arr[1],return_index=1,return_inverse=1,return_counts=1)
out = arr[:,unqID]
out[-1] = np.bincount(tag,arr[3])/C

Sample run -

In [212]: arr
Out[212]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2.5,    2. ,    1. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    3. ,    2.5,    1.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  168. ,  351. ,  300. ,  396. ]])

In [213]: out
Out[213]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2. ,    2.5,    3.5,    4. ],
       [   1. ,    1.5,    2.5,    3. ,    4. ,    4.5],
       [ 228. ,  307. ,  351. ,  170.5,  396. ,  452. ]])

As can be seen that the output has now an order with the second row being sorted. If you are looking to keep the order as it was originally, use np.argsort of unqID, like so -

In [221]: out[:,unqID.argsort()]
Out[221]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  307. ,  170.5,  452. ,  351. ,  396. ]])



ANSWER 2

Score 1


You can find the indices of wanted columns using unique:

>>> indices = np.sort(np.unique(A[1], return_index=True)[1])

Then use a simple indexing to get the desire columns:

>>> A[:,indices]
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  351. ,  396. ]])



ANSWER 3

Score 0


This is a typical grouping problem, which can be solve elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)

Note that there are many other reductions than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.