I have 414 input features, a sequence length of 120, and a batch size of 1000. I want to feed the first 200 features through an embedding layer before building the final network_input that goes into a transformer decoder.
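For context, here is a minimal sketch of how the embedding layer is set up (only embedding_dim=30 is implied by the printed shapes below; the class name and num_embeddings are placeholders):

```python
import torch.nn as nn

class MyDecoderModel(nn.Module):  # hypothetical skeleton, just to show the layer
    def __init__(self):
        super().__init__()
        # embedding_dim=30 matches the [1000, 120, 200, 30] shape printed below;
        # num_embeddings=100 stands in for the real vocabulary size
        self.embedding_layer = nn.Embedding(num_embeddings=100, embedding_dim=30)
```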
# Concatenate input features (continuous and categorical)
network_input = torch.cat([x["encoder_cont"], x["encoder_cat"]], dim=2)
print("network_input size: " + str(network_input.size())) #OUTPUTS: torch.Size([1000, 120, 414])
# Separate the first 200 features for the embedding layer
embedding_input = network_input[:,:, :200]
remaining_input = network_input[:,:, 200:]
print("embed size: " + str(embedding_input.size())) # output: torch.Size([1000, 120, 200])
print("remaining_input size: " + str(remaining_input.size())) # output: torch.Size([1000, 120, 214])
# Apply the embedding layer to the first 200 features
embedded_input = self.embedding_layer(embedding_input.long())
print("2 size: " + str(embedded_input.size())) # output torch.Size([1000, 120, 200, 30])
embedded_input = embedded_input.view(embedded_input.size(0), embedded_input.size(1), -1) #embedded_input.view(embedded_input.size(1), -1)
print("3 size: " + str(embedded_input.size())) # output torch.Size([1000, 120, 6000])
# Concatenate the embedded input with the remaining input
network_input = torch.cat([embedded_input, remaining_input], dim=2) # the cat itself works, but a later linear layer raises: input and weight.T shapes cannot be multiplied (120x6214 and 244x244)
I know the last layer is no longer configured correctly. What confuses me is the output of the embedding layer. Can I even use an embedding layer this way while keeping the sequence length of 120 unchanged?
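To illustrate the shape flow I am expecting, here is a small self-contained sketch (the batch size is shrunk to 4 just for the demo, num_embeddings is a placeholder, and the final Linear is only a hypothetical stand-in for the first layer that consumes network_input):

```python
import torch
import torch.nn as nn

batch, seq_len = 4, 120                    # small batch just for the demo
n_embed_feats, n_rest_feats = 200, 214
num_embeddings, embedding_dim = 100, 30    # placeholder vocab size; dim 30 from my prints

embedding_layer = nn.Embedding(num_embeddings, embedding_dim)

# Dummy inputs with the same layout as my network_input slices
embedding_input = torch.randint(0, num_embeddings, (batch, seq_len, n_embed_feats))
remaining_input = torch.randn(batch, seq_len, n_rest_feats)

# nn.Embedding appends an embedding_dim axis: [4, 120, 200] -> [4, 120, 200, 30]
embedded = embedding_layer(embedding_input)
print(embedded.shape)

# Flattening the last two axes keeps batch and sequence length untouched: [4, 120, 6000]
embedded = embedded.reshape(batch, seq_len, n_embed_feats * embedding_dim)
print(embedded.shape)

# Concatenate with the 214 untouched features: [4, 120, 6214]
network_input = torch.cat([embedded, remaining_input], dim=2)
print(network_input.shape)

# Hypothetical stand-in for the next layer: it has to expect 6214 input features,
# not 244, otherwise it raises the "shapes cannot be multiplied" error above
proj = nn.Linear(n_embed_feats * embedding_dim + n_rest_feats, 244)
print(proj(network_input).shape)           # torch.Size([4, 120, 244])
```

In this sketch the sequence length stays 120 through every step, so my question is really whether flattening the per-feature embeddings into the feature dimension like this is a reasonable way to use nn.Embedding.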