I'm running into multiple issues with the following code:
- When I'm running this code on a file containing multiple URLs per line, the regex
whileloop runs indefinitely. - On a file containing only one HTTP URL on each line, valgrind complains that
matches[count] = malloc(sizeof(char) * match_length);andmatches[count][match_length] = 0;are Invalid size 1 write. I think my allocation code sucks (sizeof(matches) / sizeof(matches[0])after mywhileloops compute to1). - I thought
strncpywould add a null byte at the end of my string automatically but it appears it doesn't - Might be related to 3. : After those
whileloops, if I print each element of the array, they contains a newline, which shouldn't be the case, only way to get rid of those is to runmatches[count][match_length] = 0;
I think I misunderstood lots of part about regex, strncpy and malloc. I haven't done C programming since college and wanted to get back at it.
char **matches = (char **)malloc(sizeof(char *));
char *line = NULL;
size_t l = 0, count = 0, size = 1;
ssize_t read = 0;
regex_t rgx;
if (regcomp(&rgx, "((gopher|https?)://[[:alnum:][:punct:]]+)", REG_ICASE | REG_EXTENDED)) {
return 0;
}
while ((read = getline(&line, &l, file)) != -1) {
regmatch_t match = { 0, 0 };
int match_length, error;
if ((error = regexec(&rgx, line + match.rm_eo, 1, &match, 0)) != 0)
continue;
do {
printf("Error: %d, Line: %s\n", error, line + match.rm_eo);
match_length = match.rm_eo - match.rm_so;
matches[count] = malloc(sizeof(char) * match_length);
strncpy(matches[count], line + match.rm_so, match_length);
matches[count][match_length] = 0;
if (++count == size) {
size *= 2;
matches = realloc(matches, sizeof(char *) * size);
if (!matches) {
regfree(&rgx);
free(line);
return 0;
}
}
} while ((error = regexec(&rgx, line + match.rm_eo + 1, 1, &match, REG_NOTBOL)) == 0);
}
There are multiple problems:
you should allocate one more byte in
matches[count] = malloc(sizeof(char) * match_length);for the null terminator.strncpydoes not set a null byte at the end of the destination array if the source string has a length greater or equal to the third length argument. This function semantics are confusing and error prone, learn why you should stop usingstrncpyalready!instead of
strncpy(matches[count], line + match.rm_so, match_length);you should use:Note that allocating and copying can be achieved with:
matches[count][match_length] = 0;is causing the valgrind error message as you write beyond the allocation length ofmatch_length.matches = realloc(matches, sizeof(char *) * size);causes a memory leak ifreallocfails: the original pointermatchesis lost and the memory it pointed to is still allocated.you can strip the newline at the end of
linebefore applying the regex matcher.the positions stored in the
matchstructure are relative to the beginning of the string argument toregexec, but you passline + match.rm_soandline + match.rm_eo + 1so the offsets are not relative to thelinearray and cannot be used.after the loop, you cannot use
(sizeof(matches) / sizeof(matches[0])to determine the number of entries in the array. Bothmatchesandmatches[0]are pointers, hence the value1. You must passcountback to the caller along with thematchespointer.Here is a modified version: