Regexp to remove small numbers and leave large ones


In my text files, which contain many paragraphs, the sentences or phrases are numbered.  I use regexps in Perl to remove those numbers.  They are always followed by the first letter of the sentence/phrase or by a space.  But my patterns also match numbers that are legitimately part of the text.  If I could limit it to a string of one or two digits, not more, which does not contain a comma, I could manually delete the rare instances of an unwanted three-digit number, or reinsert the rare two-digit number that shouldn't have been deleted.

I haven't been able to figure out a regexp with those limitations.  How can that be done?

Example:

perl -p -i -e 's:(\D)\d{1,2}(\w):\1\2:g;
               s:\d+-\d+::g;
               s:^\d{1,2} ?::g;' {filenames}

This removed the markers, but it also removed the digits from "337,000", leaving the comma.

Answer by ikegami:

If I could limit it to a string of one or two digits, not more, which does not contain a comma

The following matches a sequence of one or two digits that's neither preceded nor followed by digits.

(?<!\d)\d{1,2}(?!\d)

(A sequence of one or two digits never contains a comma.)

Demo:

$ perl -pe's/(?<!\d)\d{1,2}(?!\d)/[$&]/g' <<'.'
xxxxxx
xxx1xxx
xxx12xxx
xxx123xxx
xxx1234xxx
xxx37,000xxx
xxx337,000xxx
.
xxxxxx
xxx[1]xxx
xxx[12]xxx
xxx123xxx
xxx1234xxx
xxx[37],000xxx
xxx337,000xxx