I am trying to search with a regex that uses a lookahead, but it's not working in pcregrep or grep.
I want to search for sections of text
- which may span multiple lines,
- which start with PQXY at the beginning of a line,
- which end with OFEJ at the end of a line, and
- which do not contain either PQXY or OFEJ in between.
Generally I use the following in Sublime Text's find, and it works well:
(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)
Now I want to find the count of such occurrences, so I am trying to use grep or pcregrep, but neither is working.
pcregrep -c "(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)" file.txt
zsh: event not found: PQXY|OFEJ).)
And with grep:
$ grep -c -zoP "(?s)(^PQXY(?:(?!PQXY|OFEJTRANS).)*OFEJTRANS\n)" CB_raw_testing_21_feb_CORRECTIONS_0002.txt
zsh: event not found: PQXY|OFEJTRANS).)
How can I do this?
Answer based on @paxdiablo and @anubha.
The main error was the quoting: the regex needs single quotes rather than double quotes, as addressed by @paxdiablo.
$ pcregrep -c -M '(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)' file.txt
0
The regex fix is to add (?s), based on @anubha's suggestion. Of course, \n also works instead of (\R|\z).
$ pcregrep -c -M '(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)' file.txt
11726
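
The "event not found" errors come from history expansion: in an interactive zsh (or bash), an unescaped ! inside double quotes is still treated as a history reference, while single quotes pass every character through literally. A minimal sketch of the safe form follows; the variable name pattern is only illustrative and not something from the original commands.

```shell
#!/bin/sh
# Single quotes keep the whole regex literal, including the "!" of the
# negative lookahead and the backslash in "\n".
pattern='(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)'

# Expanding the variable inside double quotes is safe: history expansion only
# looks at what was typed on the command line, not at variable contents.
printf '%s\n' "$pattern"

# The pattern could then be passed on unchanged, for example:
#   pcregrep -c -M "$pattern" file.txt
```

Storing the regex once in a single-quoted variable also avoids retyping it for each tool being tried.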
Since this is zsh raising the error, it's almost certainly because it's trying to process the stuff within the double quotes. To protect it from that, you should use single quotes around the regex instead.

I don't have pcregrep installed, but the same problem can be shown with just echo.

In terms of solving the problem rather than using a specific tool, I would actually opt for awk (a) in this case.

An awk script for this works by using a string and a flag to control the lines collected and the state; initially they are an empty string and zero.
Then, for each line:
- If the line ends with OFEJ and you're collecting, output the collected section and stop collecting, then go to the next input line.

I've tested this with some limited test data and it seems to work okay. I used a bash script (b) for testing; you can add as many test cases as you need to be comfortable it solves your problem.
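
As a rough sketch of the collect-and-flag idea in awk (this is not the script from the answer, and the function name count_sections is made up for illustration), the following variant keeps only the flag and counts completed sections instead of printing them, since the question asks for a count. It assumes the markers are the literal strings PQXY and OFEJ, and that a section is abandoned if another PQXY appears inside it or if the first OFEJ is not at the end of its line.

```shell
#!/bin/sh
# count_sections FILE  (hypothetical helper name)
# Prints how many sections start with PQXY at the start of a line and end with
# the first following OFEJ at the end of a line, with no other PQXY in between.
count_sections() {
    awk '
        {
            rest = $0
            if (index($0, "PQXY") == 1) {    # start marker at beginning of line:
                collecting = 1               # (re)start collecting from here
                rest = substr($0, 5)         # text after the 4-character marker
            }
            if (!collecting) next            # outside any section: ignore line

            if (index(rest, "PQXY") > 0) {   # stray start marker inside section
                collecting = 0
                next
            }
            pos = index(rest, "OFEJ")
            if (pos == 0) next               # no end marker yet: keep collecting

            # The first OFEJ closes the section only if it ends the line.
            if (pos + 3 == length(rest)) count++
            collecting = 0
        }
        END { print count + 0 }              # "+ 0" prints 0 when nothing matched
    ' "$1"
}
```

It would be called as count_sections file.txt. One difference from the regex is worth checking against real data: the regex requires a newline after the closing OFEJ, whereas this sketch also counts a section that ends on a final line with no trailing newline.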

(a) In my experience, if you've tried three things with a grep-style regex without success, it's usually faster to move to a more advanced tool :-)

(b) Yes, I know it's written in bash rather than zsh, but that's because: awk works, hence the language used is irrelevant; and [...] bash than zsh :-)