XPath to find URLs that don't start with http:// or https://

I am trying to write an XPath expression that extracts the relative URLs used in @href or @src attributes (that is, URLs that don't start with http:// or https://).

I tried the expression below, but it doesn't work:

//*[not(starts-with(@src, 'https:')) and not(starts-with(@href, 'https:'))]

Example node:

<script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>

I want to extract the src URL. Can someone please help? Thanks.

zx485 On BEST ANSWER

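Your expression selects elements rather than attribute values, tests only the https: prefix, and also matches elements that have neither attribute. You can try the following XPath 1.0 expression instead. It checks each attribute against both prefixes, selects the attribute node itself (/@src or /@href), and merges the two results with the | (union) operator.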

//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src | //*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href
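As a quick check, the expression can be run against a small document. This is only a sketch: it assumes Python with the third-party lxml library, and apart from the script node taken from the question, the sample markup and the variable names are made up for illustration.

```python
from lxml import etree  # third-party: pip install lxml (assumed available)

# Hypothetical sample document; only the first <script> node comes from the question.
SAMPLE = """
<html>
  <head>
    <script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>
    <script src="https://example.com/app.js"></script>
    <link href="/css/site.css"/>
  </head>
  <body>
    <a href="http://example.com/">absolute</a>
    <a href="about.html">relative</a>
    <p>no URL attributes here</p>
  </body>
</html>
"""

# The XPath 1.0 expression from this answer, split over lines for readability.
XPATH = (
    "//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src"
    " | "
    "//*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href"
)

root = etree.fromstring(SAMPLE)
# lxml returns attribute nodes as string-like objects; convert them to plain str.
relative_urls = [str(value) for value in root.xpath(XPATH)]
print(relative_urls)
```

Elements without a src or href attribute pass the predicate, but the trailing /@src or /@href step yields nothing for them, so only real attribute values end up in the result. Note that protocol-relative URLs such as the //d.impactradius-event.com/... one are included, because they don't start with http: or https:.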

This expression could be shortened with regular expressions, but XPath 1.0 has no regex support; the matches() function was only introduced in XPath 2.0.
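If your XPath engine offers a regex extension, the two starts-with() tests can collapse into one pattern. A minimal sketch, assuming Python's third-party lxml library, which exposes the EXSLT regular-expressions functions under the namespace shown below. This is an extension, not standard XPath 1.0, and the sample markup and names are hypothetical.

```python
from lxml import etree  # third-party: pip install lxml (assumed available)

# EXSLT regular-expressions namespace, exposed by lxml as an XPath extension.
EXSLT_RE = "http://exslt.org/regular-expressions"

# Hypothetical sample markup for illustration.
SAMPLE = """
<div>
  <img src="//cdn.example.com/logo.png"/>
  <img src="https://example.com/banner.png"/>
  <a href="http://example.com/">absolute</a>
  <a href="docs/index.html">relative</a>
</div>
"""

# One regex replaces the two starts-with() tests per attribute.
# The 'i' flag makes the scheme match case-insensitive, which is slightly
# broader than the starts-with() version (it also rejects e.g. "HTTP:").
REGEX_XPATH = (
    "//@*[(name() = 'src' or name() = 'href')"
    " and not(re:test(string(.), '^https?:', 'i'))]"
)

root = etree.fromstring(SAMPLE)
matches = root.xpath(REGEX_XPATH, namespaces={"re": EXSLT_RE})
regex_relative_urls = [str(value) for value in matches]
print(regex_relative_urls)
```

Here //@* visits every attribute, the name() test keeps only src and href, and re:test() rejects values that begin with http: or https:. Other engines may offer no regex extension or a different one, so treat this as specific to lxml.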