Substituting multiple occurrences of a character inside a grep match

Asked by danieleghisi on 04 October 2018 at 23:04

I am trying to use TextWrangler to take a bunch of text files, match everything within some angle-bracket tags (so far so good), and for every match, substitute all occurrences of a specific character with another.

For instance, I'd like to take something like

xx+xx <f>bar+bar+fo+bar+fe</f> yy+y <f>fee+bar</f> zz

match everything between <f> and </f>, and then replace every + with, say, a * (but ONLY inside the "f" tags), giving:

xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz

I think I can easily match "f" tags containing +'s with an expression like

<f>[^<]*\+[^<]*</f>

but I have no idea how to replace only one particular character within each match. I don't know in advance how many +'s there are in each tag. I think I need to run a second regular expression over each match of the first one, but I am not sure how to do that.

(In other words, I would like to match all +'s but only inside specific angle-bracket tags).

Does anyone have a hint?

Thanks a lot, Daniele

Answered by Ed Morton on 05 October 2018 at 00:05 (best answer)

In case you're OK with an awk solution:

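$ awk '{
    # Find the next <f>...</f> that still contains a "+"
    while ( match($0,/<f>[^<]*\+[^<]*<\/f>/) ) {
        tgt = substr($0,RSTART,RLENGTH)    # the matched tag
        gsub(/\+/,"*",tgt)                 # change "+" to "*" in that tag only
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
    }
    print
}' file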
xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz

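The above should work with any POSIX awk in any shell on any UNIX box. It relies on there being no < within each <f>...</f>, as your sample regexp suggests. If a < can appear there, include that in your example and we can tweak the script to handle it. For instance, this version temporarily replaces each </f> with a newline (RS), which cannot occur within a record, and matches up to that instead: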

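$ awk '{
    gsub("</f>",RS)                    # mark each </f> with a newline
    # Find the next <f>...newline span that still contains a "+"
    while ( match($0,/<f>[^\n]*\+[^\n]*\n/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(/\+/,"*",tgt)             # change "+" to "*" in that span only
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
    }
    gsub(RS,"</f>")                    # restore the closing tags
    print
}' file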
xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz
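
As a quick check of the second script on a case the sample does not cover, here is a minimal sketch that wraps it in a shell function and feeds it a made-up line. The function name star_in_f and the test input are assumptions for illustration only; they are not from the question. The input has a nested tag (and therefore a <) inside one <f>...</f>, plus an "f" tag with no + at all:

```shell
# Hypothetical helper (the name is made up): the second awk script, reusable.
# With no arguments it reads standard input; otherwise it reads the named files.
star_in_f() {
    awk '{
        gsub("</f>", RS)                   # mark each </f> with a newline
        while ( match($0, /<f>[^\n]*\+[^\n]*\n/) ) {
            tgt = substr($0, RSTART, RLENGTH)
            gsub(/\+/, "*", tgt)           # change "+" to "*" in that span only
            $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
        }
        gsub(RS, "</f>")                   # restore the closing tags
        print
    }' "$@"
}

# Made-up input: a "<" inside the first tag, and a tag without any "+".
printf '%s\n' 'x+y <f>a+<b>c+d</b></f> z+z <f>no plus</f> <f>e+f</f>' | star_in_f
# prints: x+y <f>a*<b>c*d</b></f> z+z <f>no plus</f> <f>e*f</f>
```

The +'s outside the tags are left alone, and the tag without a + is skipped because [^\n]* cannot run past the newline standing in for its </f>.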
