Convert array of indices to one-hot encoded array in NumPy

Full question
https://stackoverflow.com/questions/2983...

Answer 2 links:
[@YXD's answer]: https://stackoverflow.com/a/29831596/422...
[source-code]: https://github.com/keras-team/keras/blob...

Answer 3 links:
[Sequence models - deeplearning.ai]: https://www.coursera.org/learn/nlp-seque...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #machinelearning #numpyndarray #onehotencoding

ACCEPTED ANSWER

Score 545

Create a zero-filled array b with enough columns, i.e. a.max() + 1.
Then, for each row i, set column a[i] to 1.

>>> import numpy as np
>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max() + 1))
>>> b[np.arange(a.size), a] = 1

>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
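
The same indexing trick can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name one_hot_from_indices and the num_classes and dtype parameters are assumptions added for illustration. Passing num_classes explicitly may be useful when the largest class does not appear in a, since a.max() + 1 would then give too few columns.

```python
import numpy as np

def one_hot_from_indices(a, num_classes=None, dtype=float):
    """One-hot encode a 1-D array of non-negative integer indices.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    a = np.asarray(a)
    if num_classes is None:
        # Fall back to the answer's rule: enough columns for the largest index.
        num_classes = a.max() + 1
    b = np.zeros((a.size, num_classes), dtype=dtype)
    # Row i gets a 1 in column a[i].
    b[np.arange(a.size), a] = 1
    return b

encoded = one_hot_from_indices([1, 0, 3], dtype=int)
print(encoded)
```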

ANSWER 2

Score 280

>>> import numpy as np
>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
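
This works because indexing the identity matrix with an integer array selects one row per index, and row k of the identity matrix is the one-hot vector for class k. As a sketch of how the same lookup behaves on a 2-D index array (the example values and the fixed n_values = 4 are assumptions for illustration):

```python
import numpy as np

values = np.array([[1, 0], [3, 2]])   # 2-D array of class indices
n_values = 4                          # assumed number of classes

# Each index selects the matching row of the identity matrix,
# so the result gains a trailing axis of length n_values.
encoded = np.eye(n_values, dtype=int)[values]
print(encoded.shape)
```

Passing dtype=int to np.eye is optional; without it the result is a float array, as in the output above.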

ANSWER 3

Score 59

If you are using Keras, there is a built-in utility for this:

# older Keras releases: from keras.utils.np_utils import to_categorical
from keras.utils import to_categorical

categorical_labels = to_categorical(int_labels, num_classes=3)

It does much the same as @YXD's answer (see source-code).

ANSWER 4

Score 56

Here is what I find useful:

import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes is the number of classes. Given a vector of shape (10000,), the function returns an array of shape (10000, C), where C is num_classes. Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) gives [[1., 0.], [0., 1.]].

I believe this is exactly what the question asks for.

PS: the source is Sequence models - deeplearning.ai
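
A short usage sketch of the function above (the example arrays are assumptions for illustration). It shows that a.reshape(-1) flattens any input shape first, and that np.squeeze drops the leading axis when only one label is passed, which may be surprising if a 2-D result is always expected.

```python
import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

labels = np.array([[0, 2], [1, 1]])   # any shape is flattened first
encoded = one_hot(labels, 3)
print(encoded.shape)                  # (4, 3)

single = one_hot(np.array([2]), 3)    # a single label
print(single.shape)                   # (3,) because squeeze removes the length-1 axis
```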