How to correctly use mask_zero=True for Keras Embedding with pre-trained weights?
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping
--
Chapters
00:00 How To Correctly Use Mask_zero=True For Keras Embedding With Pre-Trained Weights?
02:10 Accepted Answer Score 6
02:50 Thank you
--
Full question
https://stackoverflow.com/questions/5138...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #tensorflow #keras #wordembedding
#avk47
ACCEPTED ANSWER
Score 6
Your second approach is correct. You will want to construct your embedding layer in the following way:
    embedding = Embedding(
        output_dim=embedding_size,
        input_dim=vocabulary_size + 1,
        input_length=input_length,
        mask_zero=True,
        weights=[np.vstack((np.zeros((1, embedding_size)),
                            embedding_matrix))],
        name='embedding'
    )(input_layer)
where embedding_matrix is the second matrix you provided.
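To make the shape of the weights argument concrete, here is a minimal NumPy-only sketch of prepending the padding row. The toy sizes and the stand-in embedding_matrix are made up for illustration and are not taken from the question.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
vocabulary_size = 3
embedding_size = 4

# Stand-in for the pre-trained matrix: one row per word, no padding row yet.
embedding_matrix = np.arange(
    1, vocabulary_size * embedding_size + 1, dtype=np.float32
).reshape(vocabulary_size, embedding_size)

# Prepend a single all-zero row so that index 0 is reserved for padding.
padded_matrix = np.vstack(
    (np.zeros((1, embedding_size), dtype=np.float32), embedding_matrix)
)

print(padded_matrix.shape)  # (4, 4), i.e. (vocabulary_size + 1, embedding_size)
```

The row count of padded_matrix is what input_dim=vocabulary_size + 1 has to match.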
You can see this by looking at the implementation of Keras's Embedding layer. Notably, mask_zero is only used to build a mask over the inputs:
    def compute_mask(self, inputs, mask=None):
        if not self.mask_zero:
            return None
        output_mask = K.not_equal(inputs, 0)
        return output_mask
Thus the entire kernel is still indexed by the input (conceptually, multiplied by its one-hot encoding). Index 0 is taken by the padding token, meaning all word indexes are shifted up by one.
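The index shift can be checked without Keras. The sketch below imitates the two relevant steps in plain NumPy: the not-equal-to-zero mask and the row lookup into the kernel. The vectors and the example sequence are hypothetical.

```python
import numpy as np

embedding_size = 2

# Hypothetical pre-trained vectors for a three-word vocabulary (rows 0..2).
embedding_matrix = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

# Kernel as passed to the layer: zero row for padding, then the word vectors.
kernel = np.vstack((np.zeros((1, embedding_size)), embedding_matrix))

# Word ids are shifted up by one; 0 marks padding at the end of the sequence.
sequence = np.array([2, 1, 3, 0, 0])

# Same test as K.not_equal(inputs, 0): padding positions become False.
mask = sequence != 0

# The lookup still runs over every position, padding included.
vectors = kernel[sequence]

print(mask.tolist())        # [True, True, True, False, False]
print(vectors[0].tolist())  # [2.0, 2.0], i.e. embedding_matrix[1]
```

Id 2 in the sequence retrieves row 1 of the original matrix, which is the one-position shift described above.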