Hello StackOverflow community,
I am currently working on a project where I'm building a Siamese network to determine the similarity between names. For the word embeddings, I'm using a hybrid approach that combines GloVe and FastText vectors.
Here's a brief outline of my methodology:
Embedding Layer: Combining GloVe and FastText vectors to generate the embeddings for names (sketched below).
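Roughly, I build the hybrid embedding matrix like this (a simplified sketch: glove_vectors and fasttext_vectors are placeholder dicts mapping token to vector, word_index maps token to integer id, and the dimensions are illustrative):

import numpy as np

# Sketch only: glove_vectors / fasttext_vectors are placeholder {token: vector}
# dicts; word_index maps token -> integer id. Dimensions are illustrative.
glove_dim, fasttext_dim = 100, 100
embedding_dim = glove_dim + fasttext_dim

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, idx in word_index.items():
    glove_vec = glove_vectors.get(word, np.zeros(glove_dim))
    fasttext_vec = fasttext_vectors.get(word, np.zeros(fasttext_dim))
    # Concatenate so each token gets a single hybrid vector.
    embedding_matrix[idx] = np.concatenate([glove_vec, fasttext_vec])

The loss and model definitions then look like this: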
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Embedding, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

def contrastive_loss(y_true, y_pred, margin=1.0):
    # y_pred is the predicted distance; y_true is 1 for similar pairs, 0 otherwise.
    square_pred = tf.square(y_pred)
    margin_square = tf.square(tf.maximum(margin - y_pred, 0))
    return tf.reduce_mean(y_true * square_pred + (1 - y_true) * margin_square)

lstm_units = 200

# Shared encoder: frozen hybrid embeddings feeding a regularized LSTM
input_seq = Input(shape=(max_sequence_length,))
embedding_layer = Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False)(input_seq)
lstm_layer = LSTM(lstm_units, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01), bias_regularizer=l2(0.01), dropout=0.4, recurrent_dropout=0.4)(embedding_layer)
model = Model(input_seq, lstm_layer)

# Define the two input sequences and process them through the shared encoder
input_seq1 = Input(shape=(max_sequence_length,))
input_seq2 = Input(shape=(max_sequence_length,))
output1 = model(input_seq1)
output2 = model(input_seq2)

# Use a Lambda layer to compute the Euclidean (L2) distance between the two LSTM outputs
distance = Lambda(lambda x: K.sqrt(K.maximum(K.sum(K.square(x[0] - x[1]), axis=1, keepdims=True), K.epsilon())))([output1, output2])

# Full Siamese model: two input sequences, one distance output
siamese_net = Model([input_seq1, input_seq2], distance)
siamese_net.compile(optimizer='adam', loss=contrastive_loss, metrics=['accuracy'])
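As a quick sanity check on the loss convention (1 = similar, 0 = dissimilar), I verified it on toy values:

# A similar pair at distance 0 and a dissimilar pair at distance >= margin
# should both give zero loss.
print(contrastive_loss(tf.constant([[1.0]]), tf.constant([[0.0]])).numpy())  # 0.0
print(contrastive_loss(tf.constant([[0.0]]), tf.constant([[1.5]])).numpy())  # 0.0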
Training Data: My dataset consists of pairs of names with a binary label per pair: 1 if the names are similar, 0 if not (training call sketched below).
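Training then looks roughly like this (the array names, sizes, and random values are placeholders standing in for my real preprocessed data):

import numpy as np

# Placeholder arrays: padded integer sequences for each side of the pair,
# plus binary similarity labels.
X1 = np.random.randint(1, vocab_size, size=(1000, max_sequence_length))
X2 = np.random.randint(1, vocab_size, size=(1000, max_sequence_length))
labels = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

siamese_net.fit([X1, X2], labels, batch_size=64, epochs=20, validation_split=0.1)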
Loss Function: I'm using the contrastive loss defined above to train the Siamese network.

Despite this setup, the network doesn't appear to learn anything: the loss doesn't decrease significantly, and the performance metrics (e.g., accuracy) stay essentially flat across epochs.
I have already tried:
- Verifying the data preprocessing steps.
- Tweaking the network architecture (e.g., adding more layers, changing LSTM units).
- Experimenting with different learning rates (see the snippet after this list).
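For example, one variant swapped the 'adam' string for an optimizer with an explicit learning rate (the value shown is just one of several I tried):

from tensorflow.keras.optimizers import Adam

# Recompile with a smaller explicit learning rate instead of the default 1e-3.
siamese_net.compile(optimizer=Adam(learning_rate=1e-4), loss=contrastive_loss)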
However, none of the above has led to any significant improvement. Has anyone faced a similar issue, or can you offer insight into what might be going wrong? Are there specific challenges or considerations to be aware of when using hybrid embeddings in Siamese networks?
Thank you in advance for your help and suggestions!