I have an array of constant strings which I iterate through to find an index of element which is a string that contains a search pattern. Which search algorithm should I choose to improve the speed of finding this element? I am not limited in time before running the application for preparing the look up tables if any are necessary.
I corrected a question - I am not doing exact string match - I am searching for pattern inside the element, which is in an array
array of strings:
[0] Red fox jumps over the fence
[1] Blue table
[2] Red flowers on the fence
I need to find an element which contains word 'table' - in this case its element 1
I do like 50000 iterations of a set of 30 array which could contain up to 30000 strings of not less than 128 characters. Now I am using good-old strstr brute force which is too slow...
Ok, posting a part of my function, the first strstr - looks up in an uncut array of lines if there are any occurrences, then the brute search follows. I know I can speed this part, but I am not doing optimization on this approach...
// sheets[i].buffer - contains a page of a text which is split into lines
// fullfunccall - is the pattern
// sheets[i].cells[k] - are the pointers to lines in a buffer
for (i = 0; i < sheetcount; i++) {
if (i != currsheet && sheets[i].name && sheets[i].name[0] != '\0') {
if (strstr(sheets[i].buffer, fullfunccall)) {
usedexternally = 1;
int foundatleastone = 0;
for (k = 0; k < sheets[i].numcells; k++) {
strncpy_s(testline, MAX_LINE_SIZE,
sheets[i].cells[k]->line,
sheets[i].cells[k]->linesize);
testline[sheets[i].cells[k]->linesize] = '\0';
if (strstr(testline, fullfunccall)) {
dependency_num++;
if (dependency_num >= MAX_CELL_DEPENDENCIES-1) {
printf("allocation for sheet cell dependencies is insufficient\n");
return;
}
sheets[currsheet].cells[currcellposinsheet]->numdeps = dependency_num + 1;
foundatleastone++;
sheets[currsheet].cells[currcellposinsheet]->deps[dependency_num] = &sheets[i].cells[k];
}
}
if (foundatleastone == 0) {
printf("error locating dependency for external func: %s\n", fullfunccall);
return;
}
}
};
}
I believe the most practical would be DFA as it reads every character of input at most once - more precisely it reads every input char once and stops as soon as pattern will not match definitely (if set up properly). With DFA you can also check against multiple patterns simultaneously.
Two good (but different) implementations of DFA algorithms well tested in practice are
It's not possible say which fits your task unless you provide more on that.
edit: DFA stays for "Deterministic Finite Automata"
edit: as you indicated your patterns are exact substrings the most common solution is KMP algorithm (Knuth-Morris-Pratt)