I want to use sed to rename variables (identifiers). I want to do it for C++, though it will be similar for other languages. Say we have a code sample like this, example.cpp:
int hi;
int bye;
... // a lot of code with many occurrences of hi
Suppose that, for whatever reason, I want to rename hi to hello. The problem is that hi can occur as part of other words. In C++ a valid identifier follows this recipe: [[[:alpha:]]_]+[[[:alnum:]]_]
(Putting extended characters like ä or 龍 aside. I do not know whether alnum includes these, but they should be no problem, except maybe extended punctuation characters, but who uses those?)
There must be a character not belonging to this expression next to a valid identifier to distinguish it from other identifiers. So directly before and after hi, an [[[:alnum:]]_] is not allowed, while any other character is. Another problem is strings in "". All of this only works if strings are always one-liners. Then we would have to check for an odd number of occurrences of unescaped ", and it may be a mathematical question whether regular expressions can do this at all. However, I did not get that far. This was my first attempt, without string recognition:
sed -i -e 'hi/\([^[[:alnum:]]_]\)hello\([^[[:alnum:]]_]\)/\1r\2/g' example.cpp
It didn't change anything.
Your sed is garbled: there's no s/// substitution. Anyway, all that you need are word boundaries (\b) on the match side of the substitution, as in sed 's/\bhi\b/hello/'.
That does almost the same as your capture-group approach with a non-word character required on each side of hi, except that the capture-group version depends upon the match groups being nonzero size.
More discussion of word boundary here.
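To make the difference between the two forms concrete, here is a minimal sketch, assuming GNU sed (where \b is available as an extension). The sample text and the variable names sample, with_boundaries, and with_groups are made up for illustration, and the capture-group command is only one plausible corrected form of the attempt in the question.

```shell
# Hypothetical sample: "hi" as a whole word, inside other words, and alone on a line.
sample='int hi;
int high = hi + this_hi;
hi'

# Word boundaries: \b is zero-width, so it also matches at line start and end.
with_boundaries=$(printf '%s\n' "$sample" | sed 's/\bhi\b/hello/g')

# Capture groups: each group must consume a real non-word character,
# so the lone "hi" on the last line is left unchanged.
with_groups=$(printf '%s\n' "$sample" | sed 's/\([^[:alnum:]_]\)hi\([^[:alnum:]_]\)/\1hello\2/g')

printf '%s\n---\n%s\n' "$with_boundaries" "$with_groups"
```

Neither command touches high or this_hi. Once the output looks right, adding -i and the file name (as in the question) should apply the change in place.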
Note also that your character classes have more square brackets than needed. The negation of [[:alnum:]] is [^[:alnum:]], so your non-word character class should be [^[:alnum:]_]. That is equivalent to \W in extended regexp (ERE), so you can also do this with sed -E, again with the caveat that hi has to have a nonword character both before and after it (which is maybe a safe assumption for a C variable).
To fix that, you can add the line-beginning ^ and line-end $ cases to the groups as alternatives, which allows a zero-size match in those cases. (That likely works perfectly well, same as sed 's/\bhi\b/hello/'.)
Or you can use perl regex (PCRE) to make the match groups nonconsuming, with lookbehind (?<=) and lookahead (?=). The same can be written by inverting the character groups and negating the lookbehind and lookahead, giving (?<!) and (?!).
As you climb the scale of the GNU regex feature set, you can test the matching for all of them with grep, so you will see hi highlighted in color using basic, extended (ERE), and perl (PCRE) regex, all supported by grep. (The ERE version also highlights the nonword chars, if any, before or after.)
But all three of these regexp styles, as implemented by GNU grep, support the always-convenient zero-size match of \b for word boundaries. So, use it.
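As a rough way to compare the three styles, the sketch below uses grep -o to print exactly what each pattern matches, which shows the same thing color highlighting would. It assumes GNU grep built with PCRE support (for -P). The sample text and variable names are made up for illustration.

```shell
# Hypothetical sample: a real "hi", a longer word containing "hi", and a lone "hi".
sample='int hi;
int high;
hi'

# Basic regex with \b: only "hi" itself is matched.
bre_matches=$(printf '%s\n' "$sample" | grep -o '\bhi\b')

# ERE with consuming groups: the neighbouring non-word characters are part of the match.
ere_matches=$(printf '%s\n' "$sample" | grep -Eo '(^|\W)hi(\W|$)')

# PCRE with negated lookarounds: again only "hi" itself is matched.
pcre_matches=$(printf '%s\n' "$sample" | grep -Po '(?<!\w)hi(?!\w)')

printf '[%s]\n' "$bre_matches" "$ere_matches" "$pcre_matches"
```

For interactive use, replacing -o with --color=auto should highlight the same spans in the terminal.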