The Python Oracle

Delete columns based on repeat value in one row in numpy array

Chapters
00:00 Delete Columns Based On Repeat Value In One Row In Numpy Array
01:11 Answer 1 Score 1
01:31 Accepted Answer Score 2
02:20 Answer 3 Score 0
02:43 Thank you

--

Full question
https://stackoverflow.com/questions/3860...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #arrays #numpy

#avk47



ACCEPTED ANSWER

Score 2


You can use the optional return values of np.unique to group the columns by the values in the second row, then use np.bincount with the last row as weights to sum each group. Dividing those sums by the group counts gives the averaged last row, like so -

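import numpy as np

# Per unique value in row 1: first-occurrence index, group id of each column, group size
_, unqID, tag, C = np.unique(arr[1], return_index=True, return_inverse=True, return_counts=True)
out = arr[:, unqID]                     # first column of each group (fancy indexing copies)
out[-1] = np.bincount(tag, arr[3]) / C  # per-group sum of the last row divided by group size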

Sample run -

In [212]: arr
Out[212]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2.5,    2. ,    1. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    3. ,    2.5,    1.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  168. ,  351. ,  300. ,  396. ]])

In [213]: out
Out[213]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2. ,    2.5,    3.5,    4. ],
       [   1. ,    1.5,    2.5,    3. ,    4. ,    4.5],
       [ 228. ,  307. ,  351. ,  170.5,  396. ,  452. ]])

As can be seen, the output columns are now ordered so that the second row is sorted. If you want to keep the original column order, index the output with the argsort of unqID, like so -

In [221]: out[:,unqID.argsort()]
Out[221]: 
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  307. ,  170.5,  452. ,  351. ,  396. ]])
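
The two steps above (averaging, then optionally restoring the original column order) can be wrapped into a single function. This is only a minimal sketch: the helper name average_duplicate_columns and its parameters key_row, value_row and keep_order are hypothetical and do not appear in the answer, and it assumes a 2-D numeric array.

```python
import numpy as np

def average_duplicate_columns(arr, key_row=1, value_row=-1, keep_order=False):
    """Hypothetical helper: collapse columns that share a value in key_row,
    replacing value_row with the per-group mean."""
    _, first_idx, group_id, counts = np.unique(
        arr[key_row], return_index=True, return_inverse=True, return_counts=True
    )
    # Keep the first column of each group; cast to float so means are not truncated.
    out = arr[:, first_idx].astype(float)
    # bincount with weights sums value_row per group; dividing by counts gives the mean.
    out[value_row] = np.bincount(group_id, weights=arr[value_row]) / counts
    if keep_order:
        # first_idx is ordered by key value; its argsort restores first-appearance order.
        out = out[:, np.argsort(first_idx)]
    return out

arr = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 2.5, 4.0, 2.5, 2.0, 1.0, 3.5],
    [1.0, 1.5, 3.0, 4.5, 3.0, 2.5, 1.5, 4.0],
    [228.0, 314.0, 173.0, 452.0, 168.0, 351.0, 300.0, 396.0],
])

sorted_out = average_duplicate_columns(arr)
ordered_out = average_duplicate_columns(arr, keep_order=True)
print(sorted_out)
print(ordered_out)
```

With the sample array, the two printed results should match Out[213] and Out[221] above.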



ANSWER 2

Score 1


You can find the indices of the wanted columns using np.unique:

>>> indices = np.sort(np.unique(A[1], return_index=True)[1])

Then use simple indexing to get the desired columns:

>>> A[:,indices]
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  351. ,  396. ]])
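
For reference, here is the same approach as a self-contained script, using the sample array from the accepted answer (the variable names first_idx and deduped are illustrative). Note that it keeps the first column of each group unchanged and does not average the last row, which is why 314 and 173 appear in the output above rather than 307 and 170.5.

```python
import numpy as np

A = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 2.5, 4.0, 2.5, 2.0, 1.0, 3.5],
    [1.0, 1.5, 3.0, 4.5, 3.0, 2.5, 1.5, 4.0],
    [228.0, 314.0, 173.0, 452.0, 168.0, 351.0, 300.0, 396.0],
])

# return_index gives the position of the first occurrence of each unique value in row 1.
_, first_idx = np.unique(A[1], return_index=True)
# Sorting those positions keeps the surviving columns in their original left-to-right order.
indices = np.sort(first_idx)
deduped = A[:, indices]
print(deduped)
```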



ANSWER 3

Score 0


This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)

Note that there are many reductions other than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.
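
If installing an extra package is not an option, the two reductions mentioned here can be approximated in plain NumPy. The sketch below is not how numpy_indexed is implemented; the function group_columns and its reduction parameter are hypothetical, and it assumes that 'mean' should average every row within a group and that groups are returned in order of key value.

```python
import numpy as np

def group_columns(arr, key_row=1, reduction="mean"):
    """Hypothetical pure-NumPy analogue of a grouped 'mean' or 'first' over columns."""
    keys, first_idx, group_id, counts = np.unique(
        arr[key_row], return_index=True, return_inverse=True, return_counts=True
    )
    if reduction == "first":
        # Keep the first column encountered for each key.
        return keys, arr[:, first_idx]
    if reduction == "mean":
        # Sum every row per group with bincount, then divide by the group sizes.
        sums = np.stack(
            [np.bincount(group_id, weights=row, minlength=keys.size) for row in arr]
        )
        return keys, sums / counts
    raise ValueError(f"unsupported reduction: {reduction!r}")

initial_array = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 2.5, 4.0, 2.5, 2.0, 1.0, 3.5],
    [1.0, 1.5, 3.0, 4.5, 3.0, 2.5, 1.5, 4.0],
    [228.0, 314.0, 173.0, 452.0, 168.0, 351.0, 300.0, 396.0],
])

keys, mean_array = group_columns(initial_array, reduction="mean")
_, first_array = group_columns(initial_array, reduction="first")
print(keys)
print(mean_array)
print(first_array)
```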