Why is texts_to_sequences from the tokenizer not encoding all my data?


I'm encoding my input data, padding it, and training a neural network, and then making a single-instance prediction. While encoding the training data, texts_to_sequences works fine, e.g. ['பார்வத', 'தேசம்', 'என்பது', 'இன்றைய', 'பூட்டான்', '.'] encodes to [8114, 1971, 21, 660, 8115, 1].
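For context, the tokenizer is set up roughly like this (a minimal sketch; the placeholder train_sentences corpus stands in for my actual training data, which is not shown here):

from tensorflow.keras.preprocessing.text import Tokenizer

# Assumed setup (placeholder, not my exact training code):
# fit the tokenizer on the already-tokenized training corpus.
train_sentences = [['பார்வத', 'தேசம்', 'என்பது', 'இன்றைய', 'பூட்டான்', '.']]  # placeholder corpus
word_tokenizer = Tokenizer()
word_tokenizer.fit_on_texts(train_sentences)

# Encoding the training data yields one ID per token, as expected.
train_encoded = word_tokenizer.texts_to_sequences(train_sentences)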

But when I make a single-instance prediction and encode the input in the same way, the result is unexpected: the five tokens ['பின்பு', 'விந்தையாக', 'உருவகம்', 'ஆகிறது', '.'] encode to only three IDs, [682, 1308, 1].

I want it to encode every token in the list.

# Define the new input sentence for prediction
input_list = ["பின்பு", "விந்தையாக", "உருவகம்", "ஆகிறது", "."]

input_list_joined = " ".join(input_list)

# Tokenize the new input sentence and encode it
input_encoded = word_tokenizer.texts_to_sequences([input_list])

I tried passing both input_list and input_list_joined, but the result is the same.
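For reference, here is a quick check on whether the dropped tokens are simply out of vocabulary (a sketch; it assumes word_tokenizer is the Tokenizer fitted on the training data, as above):

# Hypothetical diagnostic: list the tokens that are absent from the
# fitted vocabulary. texts_to_sequences silently skips such tokens
# when no oov_token was set on the Tokenizer.
missing = [w for w in input_list if w not in word_tokenizer.word_index]
print(missing)

Running this should show whether 'விந்தையாக' and 'உருவகம்' were ever seen during fitting.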
