I have a short string s (max 8 characters) which I want to search in many strings. Actually I want to search the first occurrence in each string of a stream. Finding the first index of s has to be as fast as possible for my use-case, because a massive amount of strings per second is processed and the latency is very critical. Of course, machines can be scaled, but a big point is to reduce costs (and latency).
In general I want to create a C (or C++) function which behaves like strstr, but for a fixed "needle". The needle is not known at compile-time, but only at runtime (at the startup). However, it's okay to generate code at runtime and compile it (or any other "expensive" initialization is fine). Once the needle is known it won't change anymore.
Another detail: The needle will be in almost every string of the input stream. So it's okay if the algorithm is slower for the case that the needle is not available (because that will almost never happen). Also maybe important: The input strings have always extra 64 byte allocated at the end (which might be helpful for SIMD operations).
I was actually surprised that strstr is already quite fast, but I guess there might be a more optimal algorithm for the case that the needle is does not change?
Thanks a lot
If your target handles unaligned reads gracefully, you could use this approach:
If the string is modifiable, has extra slack and its length is provided, you can remove the terminator test:
str8andmask8must be precomputed from the bytes of the needle string and according to the target endianness. For example, to search forHelloon a little endian machinestr8is0x6f6c6c6548andmask8is0xffffffffff.For short strings, this simplistic brute force approach might perform better than using a tailored Boyer Moore implementation, depending on your specific data: array and needle lengths and contents... You can start by comparing the performance with that of your standard library's
strstrfunction.Here is a benchmark for various string lengths:
Here are the results on my 2015 Mac x86_64 laptop:
This hack consistently improves performance by 15 to 30% on long strings and even more on shorter ones. I made a special case of 8 byte needles that could be adapted for 1, 2 and 4 byte needles too.