I have found an implementation of the layer in question from the paper "Self-Attention Encoding and Pooling for Speaker Recognition", available here in PyTorch. However, due to CUDA compatibility issues, I can't use that code. Also, all my code so far has been implemented in TensorFlow. So I want to do a one-to-one translation/conversion from PyTorch to TensorFlow.
First of all, this is the code in PyTorch:
import torch
import torch.nn as nn


class SelfAttentionPooling(nn.Module):
    def __init__(self, input_dim):
        super(SelfAttentionPooling, self).__init__()
        self.W = nn.Linear(input_dim, 1)

    def forward(self, batch_rep):
        """
        input:
            batch_rep : size (N, T, H), N: batch size, T: sequence length, H: Hidden dimension
        attention_weight:
            att_w : size (N, T, 1)
        return:
            utter_rep: size (N, H)
        """
        softmax = nn.functional.softmax
        att_w = softmax(self.W(batch_rep).squeeze(-1)).unsqueeze(-1)
        utter_rep = torch.sum(batch_rep * att_w, dim=1)
        return utter_rep
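For reference, here is a quick shape check of the PyTorch layer (the batch size, sequence length and hidden dimension below are arbitrary example values):

pooling = SelfAttentionPooling(input_dim=8)
batch_rep = torch.rand(4, 10, 8)  # (N, T, H) = (4, 10, 8)
utter_rep = pooling(batch_rep)
print(utter_rep.shape)            # torch.Size([4, 8]), i.e. (N, H)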
And this is my translation of the code snippet to TensorFlow:
class Self_Attention_Pooling(keras.layers.Layer):
    def __init__(self, input_dim):
        super(Self_Attention_Pooling, self).__init__()
        self.W = Dense(input_dim)

    def forward(self, batch_rep):
        softmax = Softmax()
        att_w = self.W(batch_rep)
        att_w = softmax(att_w)

        # Not so sure about these two lines though.
        # x = np.expand(batch_rep)
        # att_w = softmax(self.W(x))

        utter_rep = np.sum(batch_rep * att_w, axis=1)
        return utter_rep
Is my implementation/translation/conversion from PyTorch to TensorFlow correct? If not, please edit it and help me.
Thank you very much.
2 remarks regarding your implementation:

- use the call method instead of the forward method, cf. Implementing custom layers;
- replace the numpy functions with tensorflow functions to enable GPU support.

Here is the code I am using in TF for the SelfAttentionPooling:
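A minimal sketch of such a layer, assuming a Dense(1) scoring projection and tf.nn.softmax over the time axis, mirroring the PyTorch version above:

import tensorflow as tf
from tensorflow import keras


class SelfAttentionPooling(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(SelfAttentionPooling, self).__init__(**kwargs)
        # Single scoring unit: maps each frame (N, T, H) -> (N, T, 1);
        # Keras infers the input size, so no input_dim argument is needed.
        self.W = keras.layers.Dense(1)

    def call(self, batch_rep):
        # batch_rep: (N, T, H)
        att_logits = tf.squeeze(self.W(batch_rep), axis=-1)                  # (N, T)
        att_w = tf.expand_dims(tf.nn.softmax(att_logits, axis=-1), axis=-1)  # (N, T, 1)
        utter_rep = tf.reduce_sum(batch_rep * att_w, axis=1)                 # (N, H)
        return utter_rep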
You can quickly check that it gives the expected output:
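For example (the shapes below are arbitrary; the exact values depend on the random initialization of the Dense layer):

pooling = SelfAttentionPooling()
batch_rep = tf.random.uniform((4, 10, 8))  # (N, T, H) = (4, 10, 8)
utter_rep = pooling(batch_rep)
print(utter_rep.shape)                     # (4, 8), i.e. (N, H)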