Sed regexp Match only non-valid c++ identifier characters to rename a variable

Question

127 Views Asked by Ichwerdennauchsonst At 03 August 2023 at 17:08

I want to use sed to rename variables (identifiers). I want to do it for C++, but it should be similar for other languages. Say we have a code sample like this one, example.cpp:

int hi;
int bye;
...//a lot of code with many occurrences of hi

Suppose that, for whatever reason, I want to rename hi to hello. The problem is that hi can also occur as part of other words. In C++, a valid identifier follows this recipe: [[[:alpha:]]_]+[[[:alnum:]]_] (setting aside extended characters like ä or 龍. I do not know whether alnum includes them, but they should not be a problem, except perhaps for extended punctuation characters, and who uses those?)

A valid identifier must be delimited by characters that do not belong to this expression, which is what distinguishes it from other identifiers. So an [[[:alnum:]]_] is not allowed directly before or after hi, while any other character is. Another problem is strings in "". This only works if strings are always one-liners. Then we would have to check for an odd number of unescaped " before the match, and it may be a mathematical question whether regular expressions can do that at all. However, I did not get that far: my first attempt, without any string recognition, was:

sed -i -e 'hi/\([^[[:alnum:]]_]\)hello\([^[[:alnum:]]_]\)/\1r\2/g' example.cpp

It didn't change anything.

**stevesliva** · Answer 1 · 2023-08-08T03:16:12.080000

Your sed is garbled -- there's no s/// substitution.

Anyway, all you need are word boundaries (\b, a GNU extension in sed) in the match side of the substitution:

sed 's/\bhi\b/hello/' example.cpp

Above does almost the same as this:

sed -E 's/([^[:alnum:]_])hi([^[:alnum:]_])/\1hello\2/' example.cpp

... except that the second command needs each match group to consume one character, so it misses hi at the very start or end of a line.
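
To make that difference concrete, here is a small sketch that runs both substitutions on the same three lines. It assumes GNU sed (for \b), and the sample text and variable names are made up for illustration:

```shell
# Sample input: hi at line start, hi mid-line, and hi inside another word.
sample='hi = 1;
x = hi + 1;
int high;'

# \b is zero-width, so hi is matched even at the very start of a line.
with_boundary=$(printf '%s\n' "$sample" | sed 's/\bhi\b/hello/')

# The grouped form must consume one non-identifier character on each side,
# so the hi that starts line 1 has nothing in front of it to match.
with_groups=$(printf '%s\n' "$sample" | sed -E 's/([^[:alnum:]_])hi([^[:alnum:]_])/\1hello\2/')

printf '%s\n---\n%s\n' "$with_boundary" "$with_groups"
```

Only the \b version should rename the hi on the first line; neither version touches high.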

More discussion of word boundary here.

Note also that your character classes have more square brackets than needed. The negation of [[:alnum:]] is [^[:alnum:]], so your non-word character class should be [^[:alnum:]_]. That is equivalent to \W, which GNU sed accepts as an extension (it is not part of POSIX ERE), so you can also do this with sed -E:

sed -E 's/(\W)hi(\W)/\1hello\2/' example.cpp

... again with the caveat that hi has to have a nonword character both before and after it (which is maybe a safe assumption for a C variable).

To fix that, you can add the line beginning ^ and end $ cases to this, too, which allows a zero-size match in those cases:

sed -E 's/(^|\W)hi(\W|$)/\1hello\2/' example.cpp

(Above likely works perfectly well, same as sed 's/\bhi\b/hello/')
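
One case where the two can still differ, assuming you add the g flag to rename every occurrence on a line: the consuming groups swallow the separator, so a second hi that follows immediately after a single separator character is skipped. A minimal sketch with GNU sed, using a made-up input line:

```shell
line='x = hi+hi;'

# The first match consumes the "+", so the second hi no longer has
# a nonword character (or line start) available in front of it.
consuming=$(printf '%s\n' "$line" | sed -E 's/(^|\W)hi(\W|$)/\1hello\2/g')

# \b consumes nothing, so both occurrences are replaced.
zero_width=$(printf '%s\n' "$line" | sed 's/\bhi\b/hello/g')

printf '%s\n%s\n' "$consuming" "$zero_width"
```

Without g the two commands behave the same on this line, since only the first hi is considered.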

Or you can use perl regex (PCRE) to make the match groups nonconsuming lookbehind (?<=) and lookahead (?=):

perl -pe 's/(?<=\W)hi(?=\W)/hello/' example.cpp

Nearly the same as this, which inverts the char groups and negates the lookbehind and lookahead, and therefore also matches hi at the very start or end of the input:

perl -pe 's/(?<!\w)hi(?!\w)/hello/' example.cpp

Moving up through the regex flavors that GNU grep supports, you can test each of these patterns with grep:

$ grep --color '\bhi\b' example.cpp
$ grep -E --color '(^|\W)hi(\W|$)' example.cpp
$ grep -P --color '(?<!\w)hi(?!\w)' example.cpp

... so you will see hi highlighted in color using basic, extended (ERE), and perl (PCRE) regex, all supported by grep. (The ERE above also highlights the nonword chars, if any, before or after)

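But all three regex flavors shown here support the always-convenient zero-width \b for word boundaries. So use it, and add the g flag (s/\bhi\b/hello/g) if you want every occurrence on a line renamed rather than only the first.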
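
Putting it together, a rename helper might look like the sketch below. The function name rename_identifier is hypothetical, it assumes GNU sed (for \b and for -i without a suffix argument), and it expects plain identifiers as arguments, with no regex metacharacters or slashes. It makes no attempt to skip string literals or comments, which the question raised as a separate problem.

```shell
# rename_identifier OLD NEW FILE  (hypothetical helper, not from the answer)
rename_identifier() {
    old=$1
    new=$2
    file=$3
    # -i edits the file in place; g replaces every occurrence on each line.
    sed -i "s/\b${old}\b/${new}/g" "$file"
}

# Try it on a throwaway file modelled on the question's example.
tmp=$(mktemp)
printf '%s\n' 'int hi;' 'int bye;' 'int high = hi + this_hi;' > "$tmp"
rename_identifier hi hello "$tmp"
renamed=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$renamed"
```

Because _ counts as a word character, this_hi and high are left untouched while the standalone hi is renamed.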
