Why is texts_to_sequences from the tokenizer not encoding all my data?


I'm encoding my input data, padding it, and training a neural network, and then making a single-instance prediction. While encoding the training data, texts_to_sequences works fine, e.g. ['பார்வத', 'தேசம்', 'என்பது', 'இன்றைய', 'பூட்டான்', '.'] encodes to [8114, 1971, 21, 660, 8115, 1].
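For context, the tokenizer is set up roughly like this (a minimal sketch; the placeholder train_sentences corpus stands in for my actual training data, which is not shown here):

from tensorflow.keras.preprocessing.text import Tokenizer

# Assumed setup (placeholder, not my exact training code):
# fit the tokenizer on the already-tokenized training corpus.
train_sentences = [['பார்வத', 'தேசம்', 'என்பது', 'இன்றைய', 'பூட்டான்', '.']]  # placeholder corpus
word_tokenizer = Tokenizer()
word_tokenizer.fit_on_texts(train_sentences)

# Encoding the training data yields one ID per token, as expected.
train_encoded = word_tokenizer.texts_to_sequences(train_sentences)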

But when I make a single-instance prediction and encode the input in the same way, the result is unexpected: the five tokens ['பின்பு', 'விந்தையாக', 'உருவகம்', 'ஆகிறது', '.'] encode to only three IDs, [682, 1308, 1].

I want it to encode every token in the list.

# Define the new input sentence for prediction
input_list = ["பின்பு", "விந்தையாக", "உருவகம்", "ஆகிறது", "."]

input_list_joined = " ".join(input_list)

# Tokenize the new input sentence and encode it
input_encoded = word_tokenizer.texts_to_sequences([input_list])

I tried passing both input_list and input_list_joined, but the result is the same.
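For reference, here is a quick check on whether the dropped tokens are simply out of vocabulary (a sketch; it assumes word_tokenizer is the Tokenizer fitted on the training data, as above):

# Hypothetical diagnostic: list the tokens that are absent from the
# fitted vocabulary. texts_to_sequences silently skips such tokens
# when no oov_token was set on the Tokenizer.
missing = [w for w in input_list if w not in word_tokenizer.word_index]
print(missing)

Running this should show whether 'விந்தையாக' and 'உருவகம்' were ever seen during fitting.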
