Given HTML like this:
...<div class="..."><a class="..." href="...">I need this String only</a></div>...
How do I write a regular expression (for Rainmeter, which uses Perl-compatible regular expressions) such that:
- the required string "I need this String only" is captured in a group so it can be extracted, and
- the HTML link tag <a>...</a> may be absent or present, may wrap the required string, and may appear multiple times.
My attempt:
(?siU) <div class="...">.*[>]{0,1}(.*)[</a>]{0,1}</div>
where:
.* = matches every character (including newlines, because of the s flag) {<a class ... "}
[>]{0,1} = accepts 0 or 1 occurrences of > {up to >}
(.*) = captures my string
[</a>]{0,1} = meant to accept 0 or 1 occurrences of </a>
This, of course, doesn't work as I want: the output includes the HTML link markup that precedes my string. So my question is:
How do I write a better (and working) regex?
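One detail of the attempt worth checking: square brackets create a character class, which matches exactly one character from the set. So [</a>]{0,1} matches at most one of <, /, a or >, not the whole tag </a>; a group such as (?:</a>)? expresses "optional </a>". The quick check below uses Python's re module only for illustration (it is not what Rainmeter runs, but character classes and non-capturing groups behave the same way in PCRE):

```python
import re

# A character class matches ONE character from the set {<, /, a, >}.
char_class = re.compile(r"[</a>]{0,1}")
# A non-capturing group matches the whole sequence "</a>" zero or one time.
group = re.compile(r"(?:</a>)?")

print(bool(char_class.fullmatch("a")))     # True: a single char from the set
print(bool(char_class.fullmatch("</a>")))  # False: four chars, the class allows one
print(bool(group.fullmatch("</a>")))       # True: the whole tag
print(bool(group.fullmatch("")))           # True: the group is optional
```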
Even though I agree with the advice to use a real parser for this problem, this regular expression should solve your problem:
Logic:
- <div ...> at the beginning and </div> at the end.
- <a ...> before the matched text arbitrarily many times.
- </a> after the matched text arbitrarily many times.
- <a ...> with [^<>]* in front of it. Using .* would also work, but then it would skip all text arbitrarily up to the last instance of <a ...> in your string.
- [^<>]* instead of .* to match non-tag text in a protected way, since literal < and > are not allowed.
- (?:...) to group without capturing. If that is not supported in your programming language, just use (...) instead, and adjust which match you use.

Caveat: this won't be fully general but should work for your problem as described.
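The logic above can be assembled into a pattern along these lines. This is a hypothetical reconstruction from the listed steps, not necessarily the exact regex from the answer, and it is tested here with Python's re module (with IGNORECASE standing in for the i flag) rather than in Rainmeter itself; extract is a hypothetical helper name:

```python
import re

# Hypothetical reconstruction of the pattern described in the list
# (not necessarily the answer's exact regex):
#   <div[^<>]*>               opening div; attributes matched by [^<>]*
#   (?:[^<>]*<a[^<>]*>)*      any number of <a ...> tags, each possibly
#                             preceded by non-tag text
#   ([^<>]*)                  the wanted text: no < or > allowed
#   (?:</a>)*                 any number of closing </a> tags
#   </div>                    closing div
PATTERN = re.compile(
    r"<div[^<>]*>(?:[^<>]*<a[^<>]*>)*([^<>]*)(?:</a>)*</div>",
    re.IGNORECASE,
)


def extract(html: str) -> list[str]:
    """Return the captured text of every match in html."""
    return PATTERN.findall(html)


if __name__ == "__main__":
    samples = [
        '<div class="x"><a class="y" href="z">I need this String only</a></div>',
        '<div class="x">I need this String only</div>',
        '<div class="x"><a href="1"><a href="2">I need this String only</a></a></div>',
    ]
    for s in samples:
        print(extract(s))
```

Because [^<>]* can never run past a < or >, each part of the match is pinned to the tag boundaries, so Rainmeter's U (ungreedy) flag should not change what gets captured; that is worth confirming in Rainmeter directly.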