Problems with assertions, specifically the negative lookahead assertion 'x(?!y)'

Question

76 Views Asked by yzkael At 18 December 2023 at 23:19

So I have been practicing and reading about assertions, and ran into this problem:

const text14 = "123 456 7890";
const pattern12 = /\d+(?!0)/g;

let matches12 = text14.match(pattern12);
console.log(matches12);

The output is supposed to be ['123', '456'], yet it isn't. It's ['123', '456', '7890'].

After tinkering with it a bit, I realized that when I put a space in the assertion as well as in the string itself, it did remove something, but only the 9.

const text14 = "123 456 789 0";
const pattern12 = /\d+(?! 0)/g;

let matches12 = text14.match(pattern12);
console.log(matches12);

Output:

['123', '456', '78', '0']

This made me believe that assertions work differently with numbers. The desired outcome I've been trying to get is to turn the original "123 456 7890" into ['123', '456'] using the negative lookahead assertion: 'x(?!y)'.

There are 3 best solutions below

**Domino** · Answer 1 · 2023-12-19T00:13:35.473000

The regular expression /\d+(?!0)/g will match all substrings that:

begin with a sequence of digits (as many as possible, at least one)
are not followed by a 0

The problem is that 0 is itself a digit. The regex keeps accepting digits, including zeroes, until it encounters a character that isn't a digit, and only then does it check that this next character is not a zero. Since a non-digit can never be a zero, the negative lookahead never comes into play.
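
A quick way to confirm this is to compare the pattern with and without the lookahead. The sketch below is only illustrative (the variable names are made up for the example); it shows that both patterns return the same matches for the string from the question.

```javascript
const text = "123 456 7890";

// The pattern from the question: greedy digits, then "not followed by 0".
const withLookahead = text.match(/\d+(?!0)/g);

// The same pattern with the lookahead removed.
const plainDigits = text.match(/\d+/g);

console.log(withLookahead); // ['123', '456', '7890']
console.log(plainDigits);   // ['123', '456', '7890']
```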

You might be tempted to simply use negative lookbehind instead, like so:

"123 456 7890".match(/\d+(?<!0)/g); // ["123", "456", "789"]

But in such a case the regular expression will simply stop before the zero instead, and not discard the entire sequence as you wished. Instead, you should first match a sequence of digits that ends in a nonzero digit, then make sure there isn't another digit after that.

"123 456 7890".match(/\d*[1-9](?!\d)/g); // ["123", "456"]

Keep in mind that the way you write a regular expression can affect its performance. I would not expect this one to be very efficient. A more naïve approach would be to simply accept any sequence of digits and then filter the results with JavaScript:

"123 456 7890"
    .match(/\d+/g)                  // ["123", "456", "7890"]
    .filter(s => s.at(-1) !== "0"); // ["123", "456"]
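
One caveat with the filter approach: with the g flag, match returns null rather than an empty array when nothing matches, so chaining .filter directly would throw on input without digits. A minimal sketch of a guarded version follows; the function name numbersNotEndingInZero is hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: collect digit groups, then drop those ending in "0".
function numbersNotEndingInZero(text) {
  // match() returns null when there are no matches, so fall back to [].
  const groups = text.match(/\d+/g) || [];
  return groups.filter((s) => !s.endsWith("0"));
}

console.log(numbersNotEndingInZero("123 456 7890"));   // ['123', '456']
console.log(numbersNotEndingInZero("no digits here")); // []
```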

**Sly_cardinal** · Answer 2 · 2023-12-19T00:14:33.243000

No, the regular expression engine's assertions do not treat digits any differently from other characters.

Your digit match is too "greedy": the \d+ is matching all of the digits (including 0) before it checks the negative lookahead (?!0).

So it does something like this:

Does 7890 match \d+? Yes.
Is 7890 followed by a 0? No (nothing is left after it), so the negative lookahead (?!0) succeeds.
Therefore 7890 is successfully matched.
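
The same greedy-then-check behaviour also seems to account for the '78' in the question's second snippet: when the lookahead fails after 789, the engine backtracks and \d+ gives up the 9, after which the lookahead passes. The sketch below reproduces that, and adds a variant with an extra (?!\d) check that stops \d+ from settling on a partial group. The second pattern is an assumption added for illustration, not something from the question.

```javascript
const text = "123 456 789 0";

// Pattern from the question: "789" is followed by " 0", so the lookahead
// fails; the engine backtracks to "78", which is followed by "9", and accepts.
const fromQuestion = text.match(/\d+(?! 0)/g);
console.log(fromQuestion); // ['123', '456', '78', '0']

// Illustrative variant: (?!\d) rejects any match that stops in the middle
// of a digit group, so backtracking to "78" no longer helps.
const wholeGroupsOnly = text.match(/\d+(?!\d)(?! 0)/g);
console.log(wholeGroupsOnly); // ['123', '456', '0']
```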

You can try this out by going to: https://regex101.com/
Enter your regular expression and test strings, then choose the "Regex Debugger" from the left-hand sidebar menu (under "TOOLS").

regular-expressions.info is another great resource with a really good explanation of lookahead and lookbehind assertions.

There are a couple of alternative patterns that might do what you want.

\b[1-9]+(?!0)\b

excluding 0 from the digit match allows the negative lookahead to come into play
adding word boundary (\b) checks at the start and end ensures it matches whole groups (avoiding partial matches like 78)
however, this will never match any group that contains 0 (which may not be what you want)

Results:

"123 456 7890" -> [123, 456]
"123 456 789" -> [123, 456, 789]
"1023 456 789" -> [456, 789]
"000 1230 0456" -> []

\b\d+(?<!0)\b

this uses a negative lookbehind assertion
"match every group of digits that doesn't end with 0"
this allows 0 at the start or middle of the group

Results:

"123 456 7890" -> [123, 456]
"123 456 789" -> [123, 456, 789]
"1023 456 789" -> [1023, 456, 789]
"000 1230 0456" -> [0456]

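The two result tables above can be reproduced with a short script. This is a sketch rather than part of the original answer: the object keys, the runPattern helper and the loop are made up for the example, and the second pattern needs an engine with lookbehind support.

```javascript
// The two alternative patterns from this answer.
const patterns = {
  noZeroAnywhere: /\b[1-9]+(?!0)\b/g,
  notEndingInZero: /\b\d+(?<!0)\b/g,
};

const samples = ["123 456 7890", "123 456 789", "1023 456 789", "000 1230 0456"];

// Hypothetical helper: match() returns null when nothing matches, so use [].
function runPattern(pattern, text) {
  return text.match(pattern) || [];
}

for (const [name, pattern] of Object.entries(patterns)) {
  for (const sample of samples) {
    console.log(name, JSON.stringify(sample), "->", runPattern(pattern, sample));
  }
}
```
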
Note that consistent browser/engine support for negative lookbehind is relatively recent (at time of writing).

It's been available in Chrome since 2017-10, nodejs since 2018-03, Firefox since 2020-06, Safari since 2023-03.
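
If older engines matter, one possible approach is to detect lookbehind support at runtime and fall back to filtering. The sketch below assumes that an unsupported lookbehind makes the RegExp constructor throw a SyntaxError; all function names are hypothetical. The pattern is built from a string because a regex literal containing a lookbehind would fail at parse time on engines that lack it.

```javascript
// Hypothetical feature check: constructing the pattern throws if the
// engine does not understand lookbehind syntax.
function supportsLookbehind() {
  try {
    new RegExp("(?<!0)");
    return true;
  } catch (e) {
    return false;
  }
}

// Fallback without lookbehind: match whole digit groups, then filter.
function filterFallback(text) {
  return (text.match(/\b\d+\b/g) || []).filter((s) => !s.endsWith("0"));
}

// Uses the lookbehind pattern from this answer when available.
function groupsNotEndingInZero(text) {
  if (supportsLookbehind()) {
    return text.match(new RegExp("\\b\\d+(?<!0)\\b", "g")) || [];
  }
  return filterFallback(text);
}

console.log(groupsNotEndingInZero("000 1230 0456")); // ['0456']
```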

**The fourth bird** · Answer 3 · 2023-12-19T09:17:53.890000

The reason is that your pattern \d+(?!0) could be written as just \d+ as it matches any digits 1 or more times.

The lookahead at the end will always succeed, as \d+ has just matched all the digits and a 0 can no longer follow.

For your example data, you don't need any lookarounds.

\b[1-9](?:\d*[1-9])?\b

The pattern matches:

\b A word boundary to prevent a partial word match
[1-9] Match a digit 1-9
(?:\d*[1-9])? Optionally match any number of digits followed by a single digit 1-9
\b A word boundary

const text14 = "123 456 7890";
const pattern12 = /\b[1-9](?:\d*[1-9])?\b/g;
console.log(text14.match(pattern12));

You could make use of a lookbehind assertion if that is supported, but I would suggest placing the word boundary before the assertion, so that the lookbehind does not fire unnecessarily while backtracking when there is no match:

\b\d+\b(?<!0)
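
The two patterns in this answer are not interchangeable, which a few extra inputs make visible. The sample string and variable names below are made up for illustration: the first pattern requires the group to start and end with 1-9, so it also rejects leading zeros, while the lookbehind version only checks the last digit.

```javascript
const startAndEndNonZero = /\b[1-9](?:\d*[1-9])?\b/g;
const endNonZero = /\b\d+\b(?<!0)/g; // needs lookbehind support

const sample = "7 10 1023 0456 7890";

console.log(sample.match(startAndEndNonZero)); // ['7', '1023']
console.log(sample.match(endNonZero));         // ['7', '1023', '0456']
```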
