What value should be set for max_len in pad_sequences?


Does the value of max_len in pad_sequences for deep learning depend on the use case? For example, for a Twitter-related classification task, should the value be set to 280 (the maximum number of characters in a tweet)?


Soroush Mirzaei On BEST ANSWER

Absolutely not. After you convert the texts into sequences with a tokenizer that has been fitted on the list of tweets, you can iterate over those sequences to find the length of each one.
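As a rough illustration of that step, here is a minimal sketch that builds sequences and measures their lengths. The tweets are made-up sample data, and the whitespace-based word-to-id mapping is only a stand-in (an assumption, not the Keras Tokenizer itself) so the example runs without any dependencies.

```python
# Hypothetical sample tweets; a real project would use its own corpus.
tweets = [
    "just landed in tokyo",
    "great game last night",
    "i can not believe how good this new phone camera is",
]

# Stand-in for a fitted tokenizer: give each distinct word an integer id.
# Id 0 is left unused so it can serve as the padding value later.
vocab = {}
sequences = []
for tweet in tweets:
    seq = []
    for word in tweet.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1
        seq.append(vocab[word])
    sequences.append(seq)

# Sequence length = number of tokens, not number of characters.
lengths = [len(seq) for seq in sequences]
char_lengths = [len(tweet) for tweet in tweets]
longest = max(lengths)

print("tokens per tweet:    ", lengths)
print("characters per tweet:", char_lengths)
print("longest sequence:    ", longest)
```

Even in this tiny sample the token counts are far smaller than the character counts, which is why 280 is not a sensible starting point.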

The max_len value (the maxlen argument of the pad_sequences function) refers to the maximum length of a sequence, so it is not the length of a tweet in characters but the number of tokens in its sequence.

Beyond that, you don't have to set it to the length of the longest tweet sequence; you can set it lower, in which case longer sequences are truncated. If you take that approach, though, it is better to remove stopwords and filter out unwanted characters before you fit the tokenizer on the list of tweets, so that fewer informative tokens are cut off.
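One possible way to pick a value below the maximum is to take a percentile of the sequence lengths. The sketch below assumes a 75th-percentile cutoff (an arbitrary choice for illustration) and uses a hypothetical helper, pad_to_length, that imitates the default behaviour of Keras pad_sequences (pad and truncate at the front); in a real project you would call pad_sequences with maxlen instead.

```python
import math


def pad_to_length(sequences, maxlen, value=0):
    """Hypothetical stand-in for Keras pad_sequences with its defaults:
    short sequences are padded at the front, long ones lose their first tokens."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


# Made-up token-id sequences of lengths 4, 4, 11 and 2.
sequences = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
    [1, 2],
]

# Nearest-rank percentile of the lengths (75% is an assumed cutoff).
lengths = sorted(len(seq) for seq in sequences)
rank = math.ceil(0.75 * len(lengths))
maxlen = lengths[rank - 1]

padded = pad_to_length(sequences, maxlen)
print("chosen maxlen:", maxlen)
for row in padded:
    print(row)
```

With this cutoff only the single unusually long sequence is truncated, while the rest are kept whole and padded with zeros where needed.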