How to match only the titles between <h></h> tags, without returning the tags themselves, using regular expressions?

43 Views Asked by hengxin At 10 March 2023 at 02:35

I want to match the titles of h1 to h6 in an HTML file, without returning the h tags themselves, using regular expressions.

Consider the following piece of an HTML file. I want to match "Welcome to my Homepage", "SQL", "RegEx", but not "This is not a valid HTML" (which is surrounded by a pair of unmatched tags).

<body>
  <H1>Welcome to my Homepage</H1>
  Content is divided into two sections:<br/>
  <h2>SQL</h2>
  Information about SQL.
  <h2>RegEx</h2>
  Information about Regular Expressions.
  <h3>This is not a valid HTML</h4>
</body>

I use (?<=<[hH]([1-6])>).*?(?=<\/[hH]\1>) at regex101.com. However, it also matches the numbers 1 and 2 in the tags <H1> and <h2>.

How to fix it?

There is 1 best solution below

Bergi On 10 March 2023 at 02:54 BEST ANSWER

"it also matches the numbers 1, 2 in the tags <H1> and <h2>."

Not really. The match itself captures only the content. The number comes from the capturing group in your lookbehind. You can just ignore that.
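
To see the difference between the overall match and the capturing group, here is a minimal sketch in Python. The language is my own choice (the question only mentions regex101.com), and the variable names are purely illustrative. It runs the pattern from the question against the sample HTML and prints the full match and group 1 separately.

```python
import re

html = """<body>
  <H1>Welcome to my Homepage</H1>
  Content is divided into two sections:<br/>
  <h2>SQL</h2>
  Information about SQL.
  <h2>RegEx</h2>
  Information about Regular Expressions.
  <h3>This is not a valid HTML</h4>
</body>"""

# Same pattern as in the question; the backslash before "/" is not needed in Python.
pattern = re.compile(r"(?<=<[hH]([1-6])>).*?(?=</[hH]\1>)")

matches = list(pattern.finditer(html))

# group(0) is the overall match: only the title text, without tags or digit.
titles = [m.group(0) for m in matches]

# group(1) is the capturing group inside the lookbehind: the heading level.
levels = [m.group(1) for m in matches]

# findall() returns the group instead of the full match when the pattern
# contains exactly one capturing group, which can look like "matching the digit".
groups_only = pattern.findall(html)

print(titles)       # ['Welcome to my Homepage', 'SQL', 'RegEx']
print(levels)       # ['1', '2', '2']
print(groups_only)  # ['1', '2', '2']
```

The mismatched <h3>...</h4> line produces no match at all, because the backreference \1 requires the closing tag to carry the same digit. If you only want the titles, read the full match (group 0) and ignore group 1.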
