Reading text between multiple newline and whitespace characters using regex


I'm trying to extract the underlined headings from my text using regex.

Each heading is preceded by two or more newline characters followed by two or more whitespace characters, and it is followed by exactly one whitespace character and two newline characters. The heading itself is in all capital letters.
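The screenshot from the original post is not available, so the sketch below assumes a hypothetical sample string laid out as described (the names sample and heading are made up for illustration). It spells the description out one piece per line using re.VERBOSE. Verbose mode ignores unescaped spaces outside character classes, which is why literal spaces are written as a backslash followed by a space.

```python
import re

# Hypothetical sample text; the screenshot from the question is not available,
# so this only mimics the layout described above.
sample = (
    "Some intro text.\n\n\n   ROBOT \n\n"
    "Robots are machines.\n\n\n   TRAFFIC LIGHT \n\n"
    "Signals control traffic.\n"
)

# Each line of the verbose pattern encodes one part of the description.
heading = re.compile(
    r"""
    (?<=\n\n)                 # at least two newlines right before the heading line
    [ \t]{2,}                 # two or more spaces/tabs of indentation
    ([A-Z]+(?:\ [A-Z]+)*)     # the heading: capitalized words separated by single spaces
    \ \n\n                    # exactly one space, then two newlines, after it
    """,
    re.VERBOSE,
)

print(heading.findall(sample))  # ['ROBOT', 'TRAFFIC LIGHT']
```

The lookbehind (?<=\n\n) only checks that the two characters before the indentation are newlines; if the real text uses a different layout (for example, Windows \r\n line endings), that part would need adjusting.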

I tried the pattern r"(\n{2,}\s{2,})(?:([A-Z]+)\s([A-Z]*))", but it did not work.
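One possible reason the attempt seemed not to work (the original data is not available to confirm this): when a pattern contains more than one capturing group, re.findall returns a tuple of all group values for each match rather than just the heading text. The sketch below runs the question's pattern on a hypothetical sample string and rebuilds the headings from the two word groups; the variable names are illustrative only.

```python
import re

# Hypothetical sample text built from the question's description
sample = (
    "Some intro text.\n\n\n   ROBOT \n\n"
    "Robots are machines.\n\n\n   TRAFFIC LIGHT \n\n"
    "Signals control traffic.\n"
)

# The pattern from the question: three capturing groups
question_pattern = r"(\n{2,}\s{2,})(?:([A-Z]+)\s([A-Z]*))"

# With several groups, findall yields one tuple of group values per match
raw = re.findall(question_pattern, sample)
print(raw)  # [('\n\n\n   ', 'ROBOT', ''), ('\n\n\n   ', 'TRAFFIC', 'LIGHT')]

# Rebuild each heading from the two word groups, skipping an empty second word
headings = [" ".join(w for w in (first, second) if w) for _, first, second in raw]
print(headings)  # ['ROBOT', 'TRAFFIC LIGHT']
```

Because the pattern only has room for two words ([A-Z]+ followed by [A-Z]*), a heading of three or more words would be cut off after the second word.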


Any help is greatly appreciated! Thanks in advance.

Answer by ChrisFreeman (accepted answer):

This appears to work, where data holds the text to search:

import re
print(re.findall(r'\n{2,}\s{2,}([A-Z\s]+)\s\n', data))

Based on the sample text in the question, it returns:

['ROBOT ', 'TRAFFIC LIGHT ', 'TRAFFIC LIGHT ']
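The results above keep a trailing space because the character class [A-Z\s] also matches whitespace, so the capture runs up to the last space before the closing newlines. The same class can also cross line breaks, so an all-caps line right after a heading could end up in the same match. Two possible tweaks are sketched below on a hypothetical sample string (the data from the question is not available): strip each result, or restrict the capture to capitalized words separated by single spaces. The tighter pattern is an editorial assumption, not part of the original answer.

```python
import re

# Hypothetical sample text mimicking the layout described in the question
sample = (
    "Some intro text.\n\n\n   ROBOT \n\n"
    "Robots are machines.\n\n\n   TRAFFIC LIGHT \n\n"
    "Signals control traffic.\n"
)

# The answer's pattern: [A-Z\s]+ lets the captured heading keep its trailing space
loose = re.findall(r"\n{2,}\s{2,}([A-Z\s]+)\s\n", sample)
print(loose)  # ['ROBOT ', 'TRAFFIC LIGHT ']

# Option 1: keep the pattern and strip surrounding whitespace from each result
print([h.strip() for h in loose])  # ['ROBOT', 'TRAFFIC LIGHT']

# Option 2 (assumed variant): capitalized words separated by single spaces,
# followed by exactly one space and two newlines
tight = re.findall(r"\n{2,}\s{2,}([A-Z]+(?: [A-Z]+)*) \n\n", sample)
print(tight)  # ['ROBOT', 'TRAFFIC LIGHT']
```

Option 1 is the smallest change to the answer; option 2 also stops the capture at the end of the heading line, because a single literal space cannot match a newline.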