3D Embedding layer output shape


I have 414 input features, a sequence length of 120, and a batch size of 1000. I want to feed the first 200 features through an embedding layer before forming the final network_input that is passed to a transformer decoder.

    # Concatenate input features (continuous and categorical)
    network_input = torch.cat([x["encoder_cont"], x["encoder_cat"]], dim=2)
    print("network_input size: " + str(network_input.size())) #OUTPUTS: torch.Size([1000, 120, 414])
    
    # Separate the first 200 elements for embedding
    embedding_input = network_input[:,:, :200]
    remaining_input = network_input[:,:, 200:]

    print("embed size: " + str(embedding_input.size())) # output: torch.Size([1000, 120, 200])
    print("remaining_input size: " + str(remaining_input.size())) # output: torch.Size([1000, 120, 214])
    
    # Apply the embedding layer to the first 200 elements
    # Apply the embedding layer to the first 200 (categorical) elements
    embedded_input = self.embedding_layer(embedding_input.long())
    print("embedded size: " + str(embedded_input.size())) # output: torch.Size([1000, 120, 200, 30])

    # Flatten the per-feature embedding vectors into the feature dimension
    embedded_input = embedded_input.view(embedded_input.size(0), embedded_input.size(1), -1)
    print("flattened size: " + str(embedded_input.size())) # output: torch.Size([1000, 120, 6000])

    # Concatenate the embedded input with the remaining input
    network_input = torch.cat([embedded_input, remaining_input], dim=2) # ERROR input and weight.T shapes cannot be multiplied (120x6214 and 244x244)

I know the output of the last layer is not correct, but I am confused by the output of the embedding layer. Can I even use an embedding layer this way while keeping the sequence length of 120 unchanged?
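
For context, here is a minimal standalone snippet of what I think is going on (the vocabulary size of 100, the embedding dimension of 30, and the reduced batch size are placeholders for illustration; only the first dimension changes with batch size 1000):

    import torch
    import torch.nn as nn

    # Standalone sketch of the shape behaviour
    batch, seq_len, n_emb_features, n_remaining = 4, 120, 200, 214
    embedding_layer = nn.Embedding(num_embeddings=100, embedding_dim=30)

    embedding_input = torch.randint(0, 100, (batch, seq_len, n_emb_features))  # [4, 120, 200]
    remaining_input = torch.randn(batch, seq_len, n_remaining)                 # [4, 120, 214]

    # nn.Embedding keeps all leading dimensions and appends embedding_dim
    embedded_input = embedding_layer(embedding_input)          # [4, 120, 200, 30]
    embedded_input = embedded_input.view(batch, seq_len, -1)   # [4, 120, 200 * 30] = [4, 120, 6000]

    network_input = torch.cat([embedded_input, remaining_input], dim=2)        # [4, 120, 6214]
    print(network_input.size())

If that is correct, the layer after the concatenation would have to accept 6000 + 214 = 6214 input features, which seems to be what the (120x6214 and 244x244) error is pointing at.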
