Problems with assertions, specifically the negative lookahead assertion 'x(?!y)'


So I have been practicing and reading about assertions, and ran into this problem:

const text14 = "123 456 7890";
const pattern12 = /\d+(?!0)/g;

let matches12 = text14.match(pattern12);
console.log(matches12);

The output is supposed to be ['123', '456'], yet it isn't. It's ['123', '456', '7890'].

After tinkering with it a bit, I realized that when I put a space in the assertion as well as in the string itself, it did remove something, but only the 9.

const text14 = "123 456 789 0";
const pattern12 = /\d+(?! 0)/g;

let matches12 = text14.match(pattern12);
console.log(matches12);

Output:

['123', '456', '78', '0']

This made me believe that assertions work differently with numbers. The outcome I've been trying to get is to turn the original "123 456 7890" into ['123', '456'] using the negative lookahead assertion 'x(?!y)'.

Answer by Domino

The regular expression /\d+(?!0)/g will match all substrings that:

  • consist of a sequence of digits (as many as possible, at least one)
  • are not followed by a 0

The problem is that 0 is itself a digit. \d+ keeps accepting digits, zeroes included, until it reaches a character that isn't a digit, and only then does the lookahead check that this next character is not a zero. A non-digit is never a zero, so the negative lookahead never rejects anything here.
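To see where the lookahead is actually evaluated, here is a small sketch (the variable names are my own, not from the question) that records the character sitting right after each match. That character is the one (?!0) gets tested against.

```javascript
// For each match of /\d+(?!0)/g, record the character right after the match.
// That is the position where the lookahead (?!0) is checked.
const input = "123 456 7890";
const peeked = [...input.matchAll(/\d+(?!0)/g)].map((m) => ({
  match: m[0],
  // undefined means the match ran up to the end of the string
  next: input[m.index + m[0].length],
}));
console.log(peeked);
```

Each match is followed by either a space or the end of the input, never by a "0", which is why the assertion has nothing to reject.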

You might be tempted to simply use negative lookbehind instead, like so:

"123 456 7890".match(/\d+(?<!0)/g); // ["123", "456", "789"]

But in such a case the regular expression will simply stop before the zero instead, and not discard the entire sequence as you wished. Instead, you should first match a sequence of digits that ends in a nonzero digit, then make sure there isn't another digit after that.

"123 456 7890".match(/\d*[1-9](?!\d)/g); // ["123", "456"]

Keep in mind that the way you write a regular expression can affect its performance. I would not expect this one to be very efficient. A more naïve approach would be to simply accept any sequence of digits and then filter the results with JavaScript:

"123 456 7890"
    .match(/\d+/g)                  // ["123", "456", "7890"]
    .filter(s => s.at(-1) !== "0"); // ["123", "456"]

Answer by Sly_cardinal

No, the regular expression engine does not treat digits any differently from other characters in assertions.

Your digit match is too "greedy": the \d+ is matching all of the digits (including 0) before it checks the negative lookahead (?!0).

So it does something like this:

  • Does 7890 match \d+? yes
  • Is 7890 followed by a 0? no (nothing comes after it), so (?!0) succeeds
  • Therefore 7890 is successfully matched.
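The same step-by-step view also explains the ['123', '456', '78', '0'] result from the second attempt in the question. Below is a rough sketch of it. `tryAt` is a hypothetical helper that imitates greedy backtracking by hand for \d+(?! 0) only; it is not how a real engine is implemented.

```javascript
// Imitate, by hand, how \d+(?! 0) is tried at one start position:
// take the longest digit run, then give back one digit at a time
// until the text after the candidate no longer starts with " 0".
function tryAt(text, start) {
  const run = /^\d+/.exec(text.slice(start));
  if (!run) return null; // no digit at this position
  for (let len = run[0].length; len >= 1; len--) {
    const rest = text.slice(start + len);
    if (!rest.startsWith(" 0")) return text.slice(start, start + len);
  }
  return null; // every candidate length was rejected by the lookahead
}

const text = "123 456 789 0";
console.log(tryAt(text, 8));  // "78": "789" is rejected, so one digit is given back
console.log(tryAt(text, 10)); // null: "9" is followed by " 0" and cannot shrink
console.log(tryAt(text, 12)); // "0"
```

So the lookahead did work in that attempt. The engine just backtracked to a shorter match ("78") instead of discarding the whole group.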

You can try this out by going to: https://regex101.com/
Enter your regular expression and test strings, then choose the "Regex Debugger" from the left-hand sidebar menu (under "TOOLS").

regular-expressions.info is another great resource with a really good explanation of lookahead and lookbehind assertions.

There are a couple of alternative patterns that might do what you want.

\b[1-9]+(?!0)\b

  • excluding 0 from the digit match allows the negative lookahead to come into play
  • adding word boundary checks (\b) at the start and end makes it match whole groups only (avoiding partial matches like 78)
  • however, this will never match any group that contains 0 (which may not be what you want)

Results:

"123 456 7890" -> [123, 456]
"123 456 789" -> [123, 456, 789]
"1023 456 789" -> [456, 789]
"000 1230 0456" -> []

\b\d+(?<!0)\b

  • this uses a negative lookbehind assertion
  • "match every group of digits that doesn't end with 0"
  • this allows 0 at the start or middle of the group

Results:

"123 456 7890" -> [123, 456]
"123 456 789" -> [123, 456, 789]
"1023 456 789" -> [1023, 456, 789]
"000 1230 0456" -> [0456]

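If you want to reproduce the two result lists yourself, something like the following should do it. This is only a sketch: the names `samples`, `noZeroAnywhere`, `noTrailingZero` and `table` are made up for the example, and the second pattern needs an engine with lookbehind support.

```javascript
// Run both alternative patterns over the sample strings from the result lists.
const samples = ["123 456 7890", "123 456 789", "1023 456 789", "000 1230 0456"];
const noZeroAnywhere = /\b[1-9]+(?!0)\b/g; // first pattern
const noTrailingZero = /\b\d+(?<!0)\b/g;   // second pattern (uses lookbehind)

// String.prototype.match returns null when nothing matches, hence the `?? []`.
const table = samples.map((s) => ({
  input: s,
  first: s.match(noZeroAnywhere) ?? [],
  second: s.match(noTrailingZero) ?? [],
}));

for (const row of table) {
  console.log(JSON.stringify(row));
}
```

The two patterns only disagree on groups that contain a 0 somewhere other than the end, such as 1023 and 0456.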
Note that consistent browser/engine support for negative lookbehind is only relatively new (at time of writing).

It's been available in Chrome since 2017-10, Node.js since 2018-03, Firefox since 2020-06, and Safari since 2023-03.
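If you have to support older engines, one option is to feature-detect lookbehind at run time. The sketch below relies on the fact that an engine without lookbehind support throws a SyntaxError when it compiles the pattern. `supportsLookbehind` and `groupsNotEndingInZero` are hypothetical helper names, and the fallback simply filters in plain JavaScript.

```javascript
// Feature-detect lookbehind: engines without support throw a SyntaxError
// when the pattern is compiled, so build it from a string inside try/catch.
function supportsLookbehind() {
  try {
    new RegExp("(?<!0)");
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: prefer the lookbehind pattern, otherwise fall back
// to matching every digit group and filtering the results afterwards.
function groupsNotEndingInZero(text) {
  if (supportsLookbehind()) {
    return text.match(new RegExp("\\b\\d+(?<!0)\\b", "g")) ?? [];
  }
  return (text.match(/\b\d+\b/g) ?? []).filter((s) => !s.endsWith("0"));
}

console.log(groupsNotEndingInZero("123 456 7890")); // ["123", "456"]
```

The pattern is built with `new RegExp` rather than a literal on purpose: a literal with unsupported syntax would fail when the script is parsed, before the try/catch could run.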

Answer by The fourth bird

The reason is that your pattern \d+(?!0) could be written as just \d+, as it matches one or more digits.

The lookahead at the end will always succeed, because \d+ has already consumed every digit, so the character that follows can never be a 0.
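A quick way to convince yourself of this is to compare both patterns on a handful of strings. This is just a sketch with sample inputs I picked myself, not a proof:

```javascript
// Compare the original pattern with plain \d+ on a few inputs.
const inputs = ["123 456 7890", "123 456 789 0", "100200", "a0b10c"];
const sameEverywhere = inputs.every((s) => {
  const withLookahead = s.match(/\d+(?!0)/g) ?? [];
  const plain = s.match(/\d+/g) ?? [];
  return JSON.stringify(withLookahead) === JSON.stringify(plain);
});
console.log(sameEverywhere); // true
```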


For your example data, you don't need any lookarounds.

\b[1-9](?:\d*[1-9])?\b

The pattern matches:

  • \b A word boundary to prevent a partial word match
  • [1-9] Match a digit 1-9
  • (?:\d*[1-9])? Optionally match any further digits, ending in a digit 1-9
  • \b A word boundary


const text14 = "123 456 7890";
const pattern12 = /\b[1-9](?:\d*[1-9])?\b/g;
console.log(text14.match(pattern12));
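A few extra inputs show what each part of the pattern does at the edges. The helper name `matchAllGroups` is made up for this sketch. Note that, as written, the pattern also rejects groups with a leading zero, which may or may not be what you want.

```javascript
const pattern = /\b[1-9](?:\d*[1-9])?\b/g;
// String.prototype.match returns null when nothing matches, hence the `?? []`.
const matchAllGroups = (text) => text.match(pattern) ?? [];

console.log(matchAllGroups("123 456 7890")); // ["123", "456"]
console.log(matchAllGroups("5 10 105"));     // ["5", "105"]: a single digit skips the optional group
console.log(matchAllGroups("0456"));         // []: the first digit must be 1-9
```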

You could make use of a lookbehind assertion if that is supported, but I would suggest placing the word boundary before the assertion, to avoid unnecessarily firing the lookbehind while backtracking when there is no match:

\b\d+\b(?<!0)
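Used from JavaScript, that could look like the sketch below (assuming an engine with lookbehind support; `boundaryFirst` is just a name chosen for the example):

```javascript
// Word boundary first, then the lookbehind checks the last matched digit.
const boundaryFirst = /\b\d+\b(?<!0)/g;

console.log("123 456 7890".match(boundaryFirst));  // ["123", "456"]
console.log("000 1230 0456".match(boundaryFirst)); // ["0456"]
```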
