What is the equivalent regex expression \b written using ^ and $?


How can I rewrite my anchor to be more general and correct in all situations? I understand that using \b as an anchor is not optimal because its behavior is implementation-dependent.

My goal is to match some type of word in a text file. For my question, the word to match is not of importance.

Assume \b is the word boundary anchor and a word character is [a-zA-Z0-9_]. I constructed two anchors, one for the left side of the regex and one for the right. Notice how I handle the underscore: I don't want it to count as a word character when I read my text file.

  • (?<=\b|_) positive lookbehind
  • (?=\b|_) positive lookahead

What would be the equivalent anchor constructs using the more general caret ^ and dollar sign $ to get the same effect?


There are 2 best solutions below

ikegami On BEST ANSWER

[The OP did not specify which regex language they are using. This answer uses Perl's regex language, but the final solution should be easy to translate into other languages. Also, I use whitespace as if the x flag were provided, but that is also easily adjusted.]


With the help of a comment made by the OP, the following is my understanding of the question:

I have something like \b\w+\b, but I want to exclude _ from the definition of a word.

You can use the following:

(?<! [^\W_] ) [^\W_]+ (?! [^\W_] )

An explanation follows.


\b is equivalent to (?: (?<!\w)(?=\w) | (?<=\w)(?!\w) ).

\b \w+ \b is therefore equivalent to (?<!\w) \w+ (?!\w) (after simplification).
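
One way to sanity-check these two equivalences is to compare match results on some sample text. The sketch below uses Python's re module rather than Perl (an assumption, since the question names no language), with the patterns written without the x flag. The sample string and the helper name boundary_positions are made up for illustration.

```python
import re

# \b and its lookaround expansion.
BOUNDARY = re.compile(r"\b")
EXPANDED = re.compile(r"(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))")

# \b\w+\b and its simplified lookaround form.
WORD_B = re.compile(r"\b\w+\b")
WORD_LOOK = re.compile(r"(?<!\w)\w+(?!\w)")


def boundary_positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(text)]


sample = "foo_bar baz-42, qux"  # hypothetical sample text

same_positions = boundary_positions(BOUNDARY, sample) == boundary_positions(EXPANDED, sample)
same_words = WORD_B.findall(sample) == WORD_LOOK.findall(sample)
print(same_positions, same_words)
```

Agreement on one sample is not a proof, but it is a quick way to catch a typo when translating the patterns to another regex engine.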

So now we just need a pattern that matches everything \w matches except _. There are a few approaches that can be taken.

  • Set difference: (?[ \w - [_] ])
  • Look-ahead: (?!_)\w
  • Look-behind: \w(?<!_)
  • Double negation: [^\W_]
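
As a rough check that these approaches accept the same characters, the sketch below tries three of them against every printable ASCII character. It uses Python's re module as a stand-in for Perl (an assumption); the set-difference form is left out because Python's re module does not accept the (?[ ... ]) syntax. The names CANDIDATES, accepted and expected are illustrative only.

```python
import re
import string

CANDIDATES = {
    "look-ahead": re.compile(r"(?!_)\w"),
    "look-behind": re.compile(r"\w(?<!_)"),
    "double negation": re.compile(r"[^\W_]"),
}

# For each variant, collect the printable ASCII characters it accepts.
accepted = {
    name: {ch for ch in string.printable if pattern.fullmatch(ch)}
    for name, pattern in CANDIDATES.items()
}

# Over ASCII, "\w minus underscore" should be exactly letters and digits.
expected = set(string.ascii_letters + string.digits)
print(all(chars == expected for chars in accepted.values()))
```

This only exercises ASCII input; how \w treats non-ASCII characters varies between engines and flags, so the comparison would need repeating for the target engine.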

Even though it's the least readable, I'm going to use the last one since it's the best supported.

We now have

(?<! [^\W_] ) [^\W_]+ (?! [^\W_] )
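
A minimal sketch of how this pattern might be used, translated to Python's re module (an assumption; the answer itself is written for Perl). re.VERBOSE plays the role of the x flag, and the sample string is hypothetical.

```python
import re

WORD = re.compile(
    r"""
    (?<! [^\W_] )   # not preceded by a word character other than _
    [^\W_]+         # one or more word characters other than _
    (?! [^\W_] )    # not followed by a word character other than _
    """,
    re.VERBOSE,
)

sample = "foo_bar baz_42 __init__"  # hypothetical sample text

without_underscore = WORD.findall(sample)
with_underscore = re.findall(r"\b\w+\b", sample)

print(without_underscore)  # ['foo', 'bar', 'baz', '42', 'init']
print(with_underscore)     # ['foo_bar', 'baz_42', '__init__']
```

The second result shows the behaviour of plain \b\w+\b for comparison: the underscore glues the pieces together, which is what the question wants to avoid.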

Barmar On

You can match a non-word character or the beginning/end anchor:

(?:^|\W)(\w+)(?:\W|$)

If you want to select something other than a single word, replace \w+ with the pattern you're looking for. Capture group 1 will contain what you're looking for.
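
A short sketch of this pattern in Python's re module (an assumption, since no language is named). One thing to keep in mind is that the \W on each side is consumed as part of the match, so when words are separated by a single character, that character cannot also serve as the left delimiter of the next word. The LOOKAROUND variant below is not part of this answer; it is a hypothetical alternative that only asserts the delimiters instead of consuming them.

```python
import re

# The pattern from the answer: delimiters are part of the match.
CONSUMING = re.compile(r"(?:^|\W)(\w+)(?:\W|$)")

# Hypothetical variant: the same delimiters, checked with lookarounds.
LOOKAROUND = re.compile(r"(?:^|(?<=\W))(\w+)(?=\W|$)")

sample = "one two three"  # hypothetical sample text

consumed = CONSUMING.findall(sample)
asserted = LOOKAROUND.findall(sample)

print(consumed)  # ['one', 'three']
print(asserted)  # ['one', 'two', 'three']
```

For a single search, or when matches are separated by more than one delimiter character, the two forms should behave the same; the difference only shows up when scanning for every match in a string.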