Regex negative look behind does not work as intended, how do I make it work?

I am not trying to accomplish anything except understand why this specific regex does not work as intended:

/\bty(?!t)\b/i

Intended match:

  • any string starting with 'ty' and not ending with 't'

From what I understood about negative lookbehind, it should match something not followed by something, and here I want to match tyX (X can be any character) as long as X is not 't'.

Should match:

tya

Should not match:

tyt

Using a negated set solves this easily, but I don't understand why a negative lookahead doesn't work.

There are 2 best solutions below

trincot On BEST ANSWER

From what I understood about negative lookbehind, it should match something not followed by something, and here i want to match tyX (X can be any character) as long as X is not 't'.

  • (?!t) is not a negative look behind, but a negative look ahead. You got that right further on in your question.
  • (?!t) asserts that there is no t at this position, without moving the current position.
  • \bty(?!t)\b cannot match tyX, as your regex requires a \b immediately after ty. There is no provision for a third character in this pattern.
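  • The ending \b comes right after the word character "y", so it asserts that the next position is either the end of the input or a non-word character (anything other than a letter, digit, or underscore). That already excludes "t", so your regex behaves exactly like \bty\b and can only match the word ty (case insensitive).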
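
The points above can be checked with a short script. This is a minimal sketch in Python's re module (the question does not name a language, so that choice is an assumption). The names original, simplified, and three_char are illustrative, and the three_char pattern is not from the question: it is one possible way to keep the lookahead while still consuming a third character.

```python
import re

# The pattern from the question, and the simpler pattern it reduces to.
original = re.compile(r"\bty(?!t)\b", re.IGNORECASE)
simplified = re.compile(r"\bty\b", re.IGNORECASE)

# Hypothetical variant (an assumption, not from the question): the lookahead
# only inspects the next character, so \w is added to actually consume it.
three_char = re.compile(r"\bty(?!t)\w\b", re.IGNORECASE)

samples = ["ty", "tya", "tyt", "TY!"]

for s in samples:
    print(
        f"{s!r}: original={bool(original.search(s))} "
        f"simplified={bool(simplified.search(s))} "
        f"three_char={bool(three_char.search(s))}"
    )
```

On these samples the first two patterns agree everywhere, while only the variant with \w matches tya and rejects tyt.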

To allow for longer words, add \w* after ty. To exclude words ending in "t", you can use a negative lookbehind to assert that the last character matched is not a "t":

\bty\w*(?<!t)\b

This will match all of the following:

ty
typ
type
typical
tyranny
typesetting

It will not match:

typeset
tyrant
typist
typescript
typologist
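
The two word lists above can be verified mechanically. The following is a small sketch, again assuming Python's re module and adding the case-insensitive flag from the question; the variable names are illustrative.

```python
import re

# Pattern from this answer, with the case-insensitive flag used in the question.
pattern = re.compile(r"\bty\w*(?<!t)\b", re.IGNORECASE)

should_match = ["ty", "typ", "type", "typical", "tyranny", "typesetting"]
should_not_match = ["typeset", "tyrant", "typist", "typescript", "typologist"]

# Words the pattern matches in full, and words where it finds nothing at all.
matched = [w for w in should_match if pattern.fullmatch(w)]
rejected = [w for w in should_not_match if pattern.search(w) is None]

print("matched: ", matched)
print("rejected:", rejected)
```

For the words ending in "t", backtracking does not help: any shorter match for \w* ends in the middle of the word, where the final \b cannot succeed.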
Andrei Odegov On

Your problem can be solved without using a negative lookbehind.

\bty(?:[a-z]*[a-su-z])?\b

The regex can be broken down as follows.

\b         the boundary between a word char (\w) and
           something that is not a word char
-----------------------------------------------------
ty         'ty'
-----------------------------------------------------
(?:        group, but do not capture (optional,
           matching as much as possible):
-----------------------------------------------------
  [a-z]*     any character from 'a' to 'z' (0 or more
             times, matching as many as
             possible)
-----------------------------------------------------
  [a-su-z]   any character of: 'a' to 's', 'u' to 'z'
-----------------------------------------------------
)?         end of grouping
-----------------------------------------------------
\b         the boundary between a word char (\w) and
           something that is not a word char
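
As a quick check of the breakdown above, this sketch runs the lookbehind-free pattern over a few of the words used earlier on this page. It assumes Python's re module and the case-insensitive flag from the question; the sample word "ty9" is an added illustration, not taken from either answer.

```python
import re

# Lookbehind-free pattern from this answer (case-insensitive, as in the question).
no_lookbehind = re.compile(r"\bty(?:[a-z]*[a-su-z])?\b", re.IGNORECASE)
# Lookbehind-based pattern from the other answer, for comparison.
with_lookbehind = re.compile(r"\bty\w*(?<!t)\b", re.IGNORECASE)

words = ["ty", "type", "typesetting", "typeset", "tyrant"]
for w in words:
    print(w, bool(no_lookbehind.search(w)), bool(with_lookbehind.search(w)))

# The character classes only cover letters, whereas \w also covers digits and
# underscores, so the two patterns differ on a word such as "ty9".
print("ty9", bool(no_lookbehind.search("ty9")), bool(with_lookbehind.search("ty9")))
```

The two patterns agree on purely alphabetic words. They differ only when the word contains digits or underscores, which [a-z] does not cover.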