Regex to match everything but two different elements


From the example text below, I need to extract the city and the date. Since I am running this from an ITSM tool, I only have access to a regexReplace function, so I need a regex that matches everything except those two elements.

First line of text

Second line I only need the last word before the dot London.

Here i only want the date 11-11-2023.

Goodbye

I was able to match everything but the date with [^\d-].
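
A quick check of that observation, sketched in TypeScript under two assumptions: the tool applies the pattern to every match (JavaScript's g flag) and replaces each match with an empty string. `sampleText` is just the example text above joined with single newlines, and the tool's regex flavor may differ from JavaScript's.

```typescript
// The example text from the question, one line per array entry.
const sampleText: string = [
  "First line of text",
  "Second line I only need the last word before the dot London.",
  "Here i only want the date 11-11-2023.",
  "Goodbye",
].join("\n");

// [^\d-] matches any single character that is NOT a digit or a hyphen.
// Deleting every such character leaves only the digits and hyphens.
const dateOnly: string = sampleText.replace(/[^\d-]/g, "");

console.log(dateOnly); // 11-11-2023
```

This only works because the date is the sole run of digits and hyphens in the text; any other digit or hyphen would survive the replace as well.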

To match the city name I came up with \w+(?=\.), but it also matches the year (2023 in the example above). Since the city comes from a known list of 5-6 possible cities, something like London|Rome|Paris|Lima would work, but I still need a way to "reverse" the match.
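
Both observations can be reproduced in TypeScript. This is a sketch assuming JavaScript regex semantics; `sampleText` is the example text joined with single newlines, and `cityPattern` is a hypothetical name for the city alternation, with \b word boundaries added so that only whole words match.

```typescript
const sampleText: string = [
  "First line of text",
  "Second line I only need the last word before the dot London.",
  "Here i only want the date 11-11-2023.",
  "Goodbye",
].join("\n");

// \w+(?=\.) : word characters immediately followed by a dot.
// The lookahead checks for the dot without consuming it.
const beforeDot: string[] = sampleText.match(/\w+(?=\.)/g) || [];
console.log(beforeDot.join(",")); // London,2023  (the year matches too)

// Restricting the match to the known cities finds only the city...
const cityPattern = /\b(?:London|Rome|Paris|Lima)\b/g;
const cities: string[] = sampleText.match(cityPattern) || [];
console.log(cities.join(",")); // London

// ...but used as a replace pattern it deletes the city and keeps everything
// else, which is the opposite of what is needed.
const withoutCity: string = sampleText.replace(cityPattern, "");
console.log(withoutCity);
```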

The desired output is:

London - 11-11-2023

Chris Maurer (best answer)

Whenever I hear negation, I think of the Regex Replace function, because it lets you control exactly what appears in the result string. To use it successfully, your regex must consume the entire string, because any part of the string that the regex does not match is preserved in the result string. In this case, you can use:

.*?(\S+\.)$|.+

with the replacement string $1, so that only the first capture group appears in the result. You must set the g, m, and s flags for this to work: g replaces every match, m makes $ match at the end of each line, and s lets . match newlines. I included the dot in the capture group so that there is a delimiter between occurrences; if you want a different delimiter, you can run another replace on this result. You can try this regex on the Regex101 website.

This regex makes a few assumptions about the content you want to capture. Specifically, the dot must be at the end of a line, and the capture extends back to the previous whitespace, even if the word is the only thing on its line, because a newline counts as whitespace.
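
Here is a sketch of this answer's pattern in TypeScript, assuming JavaScript regex semantics (the ITSM tool's flavor may differ). `sampleText` is the example text joined with single newlines. The pattern is passed to the RegExp constructor as a string, so its backslashes are doubled. The second replace is a hypothetical extra regexReplace pass that turns the dot delimiters into " - ".

```typescript
const sampleText: string = [
  "First line of text",
  "Second line I only need the last word before the dot London.",
  "Here i only want the date 11-11-2023.",
  "Goodbye",
].join("\n");

// g: replace every match; m: $ matches at each line end; s: . matches newlines.
const wordBeforeDot = new RegExp(".*?(\\S+\\.)$|.+", "gms");

// Each match consumes text up to and including a "word." at the end of a line
// and is replaced by that word (group 1). The leftover tail ("\nGoodbye")
// only matches .+, so group 1 is unset and it is replaced by an empty string.
const extracted: string = sampleText.replace(wordBeforeDot, "$1");
console.log(extracted); // London.11-11-2023.

// Hypothetical second pass: drop the trailing dot and use " - " as delimiter.
const formatted: string = extracted.replace(/^(.*?)\.(.*)\.$/, "$1 - $2");
console.log(formatted); // London - 11-11-2023
```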

aabdulahad

One way to do this is to assume that the city is the only word made of letters that is directly followed by a period (the date line also ends with a period, but there the period follows digits) and to grab that word.
Ideally, a list of cities separated by | would be used instead, if the list is exhaustive or if the period rule does not hold.
You can select the date easily, as you said.
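Use two capturing groups with brackets. Because regexReplace keeps any text the pattern does not match, the pattern also has to consume everything around the city and the date, so that a single match fills both groups:

Regex: /[\s\S]*?([A-Za-z]{1,})\.[\s\S]*?(\d{2}-\d{2}-\d{4})[\s\S]*/

Replace with: $1 - $2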

(I am using the JavaScript flavor, since you haven't specified which language.)
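
The two-group idea can be checked in TypeScript. This sketch assumes JavaScript regex semantics, and `sampleText` is the example text joined with single newlines. It contrasts a plain global alternation, ([A-Za-z]{1,})\.|(\d{2}-\d{2}-\d{4}), where each match fills only one group and the rest of the text survives, with a pattern that consumes the whole string in one match. [\s\S] is used because it matches any character, including newlines, without the s flag.

```typescript
const sampleText: string = [
  "First line of text",
  "Second line I only need the last word before the dot London.",
  "Here i only want the date 11-11-2023.",
  "Goodbye",
].join("\n");

// Global alternation: "London." and "11-11-2023" are separate matches, so each
// replacement has one empty group, and the unmatched text is kept as-is.
const alternation = /([A-Za-z]{1,})\.|(\d{2}-\d{2}-\d{4})/gm;
const partial: string = sampleText.replace(alternation, "$1 - $2");
console.log(partial); // still contains "First line of text", "Goodbye", ...

// Whole-string variant: lazy [\s\S]*? skips ahead to the first letters+dot
// word and then to the date; [\s\S]* swallows the rest, so one match sets both groups.
const wholeText = /[\s\S]*?([A-Za-z]{1,})\.[\s\S]*?(\d{2}-\d{2}-\d{4})[\s\S]*/;
const result: string = sampleText.replace(wholeText, "$1 - $2");
console.log(result); // London - 11-11-2023
```

The whole-string variant keeps the same assumption as the answer: the first run of letters directly followed by a period is taken to be the city.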