Want to match a string exactly, despite variants, and remove only that string


I need to remove a specific, exact string from a file as part of a clean-up process that I'm implementing. The problem is that the file also contains variants that are similar to, but not exactly the same as, the string I want to remove.

For example, here is a sample of the file "sample":

tmp2
tmp3
tmp0
tmp1
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

I want to remove only "tmp1", not "tmp1.1" or "tmp1.2".

I am using a Perl one-liner:

perl -i -nle 'print if !/tmp1/' ./sample

Obviously, the one-liner isn't cutting it. Sure, it removes "tmp1", but it also removes "tmp1.1" and "tmp1.2".

Any ideas?


There are 5 best solutions below

5
Bork On BEST ANSWER

Use anchors. ^ for beginning of line, and $ for end of line.

$ perl -i -nle 'print if !/^tmp1$/' ./sample
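
To see why the anchors matter, here is a minimal Python sketch (not part of the original answer) that applies the unanchored and the anchored pattern to a few lines from the question. The list `lines` and the variable names are made up for illustration.

```python
import re

# A few lines taken from the sample file in the question.
lines = ["tmp0", "tmp1", "tmp1.1", "tmp1.2", "tmp4"]

# Unanchored: drops every line that merely contains "tmp1".
unanchored = [s for s in lines if not re.search(r"tmp1", s)]

# Anchored: drops only the line that is exactly "tmp1".
anchored = [s for s in lines if not re.search(r"^tmp1$", s)]

print(unanchored)  # ['tmp0', 'tmp4']
print(anchored)    # ['tmp0', 'tmp1.1', 'tmp1.2', 'tmp4']
```

The unanchored pattern reproduces the problem from the question; the anchored one keeps "tmp1.1" and "tmp1.2".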
7
Ed Morton On

Using any awk in any shell on every Unix box, here's a full-line string comparison that'll remove the line that matches that string:

$ awk '$0 != "tmp1"' sample
tmp2
tmp3
tmp0
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

or using a variable:

$ awk -v str='tmp1' '$0 != str' sample
tmp2
tmp3
tmp0
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

See How do I use shell variables in an awk script? for more info on that.

Note that the above is doing a literal string comparison so it'll work even if your target string contains regexp metacharacters, e.g.:

$ cat file
foo.bar1
foo.bar
foo bar

$ awk '$0 != "foo.bar"' file
foo.bar1
foo bar
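
The same distinction can be sketched in Python, as an illustration only (the variable names are hypothetical): a literal comparison, a regex built from the raw target, and a regex built from the escaped target.

```python
import re

lines = ["foo.bar1", "foo.bar", "foo bar"]
target = "foo.bar"

# Literal comparison: only the exact line "foo.bar" is removed.
literal = [s for s in lines if s != target]

# Regex built from the raw target: "." matches any character,
# so "foo bar" is removed as well.
naive_regex = [s for s in lines if not re.fullmatch(target, s)]

# Escaping the target first makes the regex behave like the literal test.
escaped_regex = [s for s in lines if not re.fullmatch(re.escape(target), s)]

print(literal)        # ['foo.bar1', 'foo bar']
print(naive_regex)    # ['foo.bar1']
print(escaped_regex)  # ['foo.bar1', 'foo bar']
```

If the target comes from user input or a variable, a literal comparison (or escaping) avoids surprises from characters such as "." or "*".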
0
Naval On

You can also do this in Python, as an alternative to the shell- and Perl-based commands.

data = ["temp1", "temp1.1", "temp1.2", "temp2", "another_temp1.1"]
# Compare whole strings with !=; a substring test ("temp1" not in item) would also drop "temp1.1" and the like.
filtered_data = [item for item in data if item != "temp1"]
print(filtered_data)

Output will be: ['temp1.1', 'temp1.2', 'temp2', 'another_temp1.1']
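
Since the question is about a file rather than a list, the following is a rough sketch of how a whole-line comparison might be applied to a file in Python. The helper name `remove_exact_line` and the throwaway demo file are assumptions for illustration; unlike `perl -i`, this simply reads the whole file and overwrites it.

```python
import os
import tempfile


def remove_exact_line(path, target):
    """Rewrite the file at `path` without the lines that equal `target`."""
    with open(path) as f:
        lines = f.readlines()
    # Strip the trailing newline before comparing, so a final line
    # without a newline is handled too.
    kept = [line for line in lines if line.rstrip("\n") != target]
    with open(path, "w") as f:
        f.writelines(kept)


# Demo on a temporary file; note the last line has no trailing newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("tmp0\ntmp1\ntmp1.1\ntmp1.2\ntmp1")
    demo_path = tmp.name

remove_exact_line(demo_path, "tmp1")
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)

print(repr(result))  # 'tmp0\ntmp1.1\ntmp1.2\n'
```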

0
jubilatious1 On

Using Raku (formerly known as Perl_6)

via Raku's m/…/ match operator:

~$ raku -ne '.put unless m/^ tmp1 $/;' sample.txt > tmp

#OR:

~$ raku -e 'for lines() {.put unless m/^ tmp1 $/};' sample.txt > tmp

via Raku's grep:

~$ raku -e 'given lines() {.grep(none /^ tmp1 $/).join("\n").put };' sample.txt > tmp

via Raku's S/// "big-S" substitution operator:

~$ raku -e 'for lines.join("\n") {S:g/ [[^ | ^^] tmp1 $$ \n] | [\n ^^ tmp1 $] //.put};'   sample.txt > tmp

Raku is a programming language in the Perl family that provides built-in, high-level support for Unicode. The four approaches above show that, as in Perl itself, TMTOWTDI applies.

As mentioned in other answers, the key here is using zero-width anchors such as: ^ start-of-string, $ end-of-string, ^^ start-of-line, $$ end-of-line. More regex advice is available at the links at the bottom.

Sample Input:

tmp2
tmp3
tmp0
tmp1
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

Sample Output:

tmp2
tmp3
tmp0
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

Note: the code examples above work even if the target string occupies a final line not terminated by a \n newline. Since POSIX defines a line as \n newline terminated, you can correct the final line by using slurp() instead of lines.join("\n") in the final answer above.

https://docs.raku.org/language/regexes
https://docs.raku.org/language/regexes-best-practices
https://raku.org

0
Daweo On

If possible, I would use GNU sed for this in the following way. Let the content of file.txt be

tmp2
tmp3
tmp0
tmp1
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

then

sed '/^tmp1$/d' file.txt

gives output

tmp2
tmp3
tmp0
tmp3
tmp3
tmp3
tmp1.1
tmp3
tmp2
tmp3
tmp1.2
tmp4

Explanation: if the line consists of start-of-line followed by tmp1 followed by end-of-line, delete it (and go to the next line); otherwise apply the default action of printing.

(tested in GNU sed 4.8)

If your rules of engagement coerce you to use perl AT ANY PRICE then you might do

perl -p -0777 -e 's/(?<![^\n])tmp1\n//g' file.txt

The output is the same.

(tested in perl 5, version 34, subversion 0)

Explanation: -p -e engage sed mode and -0777 engages slurp mode (treat the whole file as one line). The substitution removes tmp1\n when it is not preceded by a non-newline character. This double negation is not equivalent to "preceded by a newline": the double-negated form also correctly deletes tmp1 on the 1st line, where nothing precedes it. DISCLAIMER: the perl solution assumes the last character of your file is a newline; if this does not hold, do not use it.
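
The behaviour of the negative lookbehind can be checked with a small Python sketch (Python's `re` module accepts the same `(?<![^\n])` construct). This is only an illustration with made-up sample text, not the answer's Perl code.

```python
import re

# "tmp1\n" that is NOT preceded by a non-newline character.
double_negation = re.compile(r"(?<![^\n])tmp1\n")
# "tmp1\n" that IS preceded by a newline.
after_newline = re.compile(r"(?<=\n)tmp1\n")

text = "tmp1\ntmp0\nxtmp1\ntmp1.1\ntmp1\n"

# The double negation also removes "tmp1" on the first line.
out_double = double_negation.sub("", text)
# The positive lookbehind leaves the first line in place.
out_after = after_newline.sub("", text)

# Without a final newline, the last "tmp1" is not removed at all.
out_no_newline = double_negation.sub("", "tmp0\ntmp1")

print(repr(out_double))      # 'tmp0\nxtmp1\ntmp1.1\n'
print(repr(out_after))       # 'tmp1\ntmp0\nxtmp1\ntmp1.1\n'
print(repr(out_no_newline))  # 'tmp0\ntmp1'
```

The last case shows the situation the disclaimer warns about: a file whose final line lacks a trailing newline.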