I would like to tokenize the following text:
"text": "king martin"
into
[k, ki, kin, king, i, in, ing, ng, g, m, ma, mar, mart, martin, ar, art, arti, artin, r, rt, rti, rtin, t, ti, tin, i, in, n]
More specifically, I only need the tokens of at least three characters:
[kin, king, ing, mar, mart, martin, art, arti, artin, rti, rtin, tin]
Is there a way to get these tokens? I have tried the following tokenizer, but how do I tell it to start at any start_offset?
"ngram_tokenizer": {
  "type": "edge_ngram",
  "min_gram": "3",
  "max_gram": "15",
  "token_chars": [
    "letter",
    "digit"
  ]
}
Thank you!
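You can use the ngram tokenizer rather than edge_ngram. The edge_ngram tokenizer only emits n-grams anchored at the start of each word (kin, king, mar, mart, ...), whereas ngram emits them from every start offset within the word, which is what you are asking for.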
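
As a rough sketch, the index settings could look like the following. The tokenizer name is taken from the question; the analyzer name `ngram_analyzer` is only a placeholder of mine, and the `max_ngram_diff` line is an assumption about your version: recent Elasticsearch releases limit `max_gram - min_gram` to 1 by default, so a 3 to 15 range needs `index.max_ngram_diff` raised to at least 12. On older releases that setting may not exist and can be dropped.

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 12,
      "analysis": {
        "tokenizer": {
          "ngram_tokenizer": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 15,
            "token_chars": ["letter", "digit"]
          }
        },
        "analyzer": {
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "ngram_tokenizer"
          }
        }
      }
    }
  }
}
```

With `token_chars` set to letters and digits, the space still splits "king martin" into two words, so no n-gram spans both. Running the analyzer on that text through the `_analyze` API should return the tokens listed in the question, plus `marti`, which the same rule also produces.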