I'm editing a large dictionary file in which the term and definition pairs do not follow a consistent format. Some entries are "simple" words; others consist of a base term plus a suffix that alters something such as its gender, effectively stacking two terms into one entry:
abacora (definition)
abacorar (definition)
abad, desa (definition)
This last entry stands for both "abad" and "abadesa" (the feminine variant).
I've been trying to write the regular expression to capture this "peculiarity" but I can't seem to make it work. This matches the first part of the term fine, but fails to capture the second part:
^[^\s(?<!,)]+
It should return:
"abacora"
"abacorar"
"abad, desa"
I would use the following pattern, which should capture all leading words, possibly including a CSV list:
^\w+(?:,\s*\w+)*
This pattern says to match:
^                 from the start of the line
\w+               match a word
(?:,\s*\w+)*      optionally followed by a CSV list of other words
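As a quick check, here is a minimal Python sketch that applies the pattern assembled from the three pieces above, `^\w+(?:,\s*\w+)*`, to the sample entries from the question. The function name `extract_headword` is only an illustrative choice, and the question does not say which regex engine is in use, so treat Python as an assumption.

```python
import re

# Pattern assembled from the pieces described above:
#   ^             start of the line
#   \w+           a word
#   (?:,\s*\w+)*  zero or more ", word" continuations
HEADWORD = re.compile(r"^\w+(?:,\s*\w+)*")

def extract_headword(line):
    """Return the leading term, including any comma-separated suffixes, or None."""
    match = HEADWORD.match(line)
    return match.group(0) if match else None

entries = [
    "abacora (definition)",
    "abacorar (definition)",
    "abad, desa (definition)",
]

for entry in entries:
    print(repr(extract_headword(entry)))
# 'abacora'
# 'abacorar'
# 'abad, desa'
```

In Python 3, `\w` is Unicode-aware for `str` patterns, so accented letters in the terms should also be matched.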
Edit:
More generally, we can match on [^,\s]+ for a run of non-whitespace, non-comma characters, and use this pattern:
^[^,\s]+(?:,\s*[^,\s]+)*
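To illustrate where the more general version differs, the sketch below builds a pattern from `[^,\s]+` in place of `\w+`, giving `^[^,\s]+(?:,\s*[^,\s]+)*`, and compares it with the word-based pattern. The hyphenated entry `ex-abad, desa` is a made-up example for illustration, not taken from the dictionary file, and the function names are hypothetical.

```python
import re

# Word-based pattern: each term must consist of \w characters only.
WORD_BASED = re.compile(r"^\w+(?:,\s*\w+)*")

# General pattern: each term is any run of non-comma, non-whitespace characters.
GENERAL = re.compile(r"^[^,\s]+(?:,\s*[^,\s]+)*")

def headword_word_based(line):
    match = WORD_BASED.match(line)
    return match.group(0) if match else None

def headword_general(line):
    match = GENERAL.match(line)
    return match.group(0) if match else None

# Sample entry from the question: both patterns agree.
print(repr(headword_general("abad, desa (definition)")))      # 'abad, desa'

# Hypothetical hyphenated entry: \w+ stops at the hyphen, [^,\s]+ does not.
print(repr(headword_word_based("ex-abad, desa (definition)")))  # 'ex'
print(repr(headword_general("ex-abad, desa (definition)")))     # 'ex-abad, desa'
```

The trade-off is that `[^,\s]+` accepts any punctuation inside a term, so it is the better fit only if the headwords may legitimately contain characters such as hyphens or apostrophes.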