I have a text file containing more than 100k words, one per line. I want to implement a function that returns the list of words containing a given substring. For example, if the substring is "coat", it should return words like "coating", "raincoat", "raincoats", etc.
Is there an efficient algorithm for this, given that the list of words won't change? I was thinking of building a generalized suffix tree using Ukkonen's algorithm, but I can't find a proper implementation of it anywhere. Is there a better way to solve this apart from suffix trees?
EDIT: Each word is at most 100 characters long and contains only lowercase letters.
If you can afford the extra storage, you can generate all suffixes of all words and keep them in a single sorted list (to perform binary searches) or in a hash table. This multiplies the space by the average word length.
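For illustration, here is a minimal sketch of the sorted-list variant in Python. The names `build_index`, `find_words`, and `words.txt` are placeholders, not anything from the question:

```python
import bisect

def build_index(words):
    """Sorted list of (suffix, word) pairs, one entry per suffix of every word."""
    index = []
    for word in words:
        for i in range(len(word)):
            index.append((word[i:], word))
    index.sort()
    return index

def find_words(index, substring):
    """All words containing `substring`, found by binary search on the suffixes."""
    # All suffixes starting with `substring` are contiguous in the sorted list,
    # beginning at the first entry >= (substring,).
    i = bisect.bisect_left(index, (substring,))
    matches = set()
    while i < len(index) and index[i][0].startswith(substring):
        matches.add(index[i][1])
        i += 1
    return matches

with open("words.txt") as f:
    words = [line.strip() for line in f if line.strip()]

index = build_index(words)
print(sorted(find_words(index, "coat")))  # e.g. ['coating', 'raincoat', 'raincoats']
```

Each query then costs O(log(total suffixes)) string comparisons plus the size of the output, at the price of storing every suffix.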
If this average exceeds 4, it can be preferable to store 32-bit pointers that index the suffixes in the original strings (assumed null-terminated) instead of copying them. It is also possible to use 3-byte integers in a packed format.
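A sketch of the pointer-based variant, again in Python and with illustrative names: the words go into one `'\0'`-separated buffer, and the index stores only a sorted array of 32-bit offsets into it. The comparison trick relies on `'\0'` sorting below every lowercase letter, mirroring the null-terminated convention above, and the `key=` parameter of `bisect.bisect_left` requires Python 3.10+:

```python
import bisect
from array import array

def build_offset_index(words):
    """One '\\0'-separated buffer plus one 32-bit offset per suffix, sorted lexicographically."""
    buf = "\0".join(words) + "\0"
    offsets = []
    start = 0
    for word in words:
        offsets.extend(start + i for i in range(len(word)))
        start += len(word) + 1                 # +1 skips the '\0' separator
    # Sort offsets by the suffix they point to, terminated at the next '\0'.
    offsets.sort(key=lambda o: buf[o:buf.index("\0", o)])
    return buf, array("I", offsets)            # 'I' = unsigned int, 4 bytes on typical platforms

def find_words_packed(buf, offsets, substring):
    """Binary search on the offsets; recover each matching word from the buffer."""
    key = lambda o: buf[o:o + len(substring)]
    i = bisect.bisect_left(offsets, substring, key=key)    # key= needs Python 3.10+
    matches = set()
    while i < len(offsets) and key(offsets[i]) == substring:
        o = offsets[i]
        begin = buf.rfind("\0", 0, o) + 1      # start of the word containing this suffix
        end = buf.index("\0", o)               # end of that word
        matches.add(buf[begin:end])
        i += 1
    return matches

# Reusing the `words` list from the sketch above:
# buf, offsets = build_offset_index(words)
# print(sorted(find_words_packed(buf, offsets, "coat")))
```

Since each word is at most 100 characters, every suffix the sort key materializes is short-lived and bounded, while the persistent index is just 4 bytes per suffix.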