I'm saving an offline copy of a website using WinHTTrack. On each HTML page, some of the images, PDFs, etc. are hosted on CDNs, and those files load through URLs whose query strings carry a token, a key ID, and, most annoyingly, a Unix timestamp expiration. Imagine something like this:
<img src="https://my.image.cdn.com/wp-content/uploads/a-really-cool-image.jpg?Expires=1643140853&Signature={token}&Key-Pair-Id={id}" />
This URL is only valid for a few minutes after the page renders. Requesting the image, say, 5 minutes later results in "Access Denied".
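To make the expiry concrete, here is a minimal sketch that reads the `Expires` query parameter from a URL like the one above and reports how many seconds remain. It assumes the expiry is a Unix timestamp in a parameter named exactly `Expires`, as in the example; the function name `seconds_until_expiry` is hypothetical.

```python
import time
from urllib.parse import urlparse, parse_qs


def seconds_until_expiry(url, now=None):
    """Return seconds left before the URL's Expires timestamp (negative once expired).

    Assumes the expiry is a Unix timestamp in a query parameter named "Expires";
    returns None if the URL has no such parameter.
    """
    values = parse_qs(urlparse(url).query).get("Expires")
    if not values:
        return None
    if now is None:
        now = time.time()  # current Unix time, as a float
    return int(values[0]) - now
```

For example, with the sample URL and `now=1643140553`, the function returns `300`, i.e. five minutes of validity left; any crawler that queues the URL for longer than that will get the error response instead of the image.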
I assume HTTrack captures the URLs of these images early, when the page is first crawled, and by the time it actually downloads each image file the URL has expired, so the result is an "Access Denied" XML response.
My first attempt was to find all of these Access Denied files, change the Expires timestamp, and download them again. However, the expiration timestamp appears to be validated together with the other query string values (presumably the Signature covers it), so changing it on its own doesn't work.
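The "find all these Access Denied files" step could be sketched as below: walk the mirror folder and flag files whose first few kilobytes look like an XML error body. The markers are assumptions (S3/CloudFront-style `<Error><Code>AccessDenied</Code>` bodies, or the literal phrase "Access Denied"); check them against one of the saved files before relying on this. `find_access_denied` is a hypothetical helper, not part of HTTrack.

```python
import os

# Assumption: the CDN's error body is a small XML document containing one of these
# markers. Adjust them to match what the saved error files actually contain.
MARKERS = (b"AccessDenied", b"Access Denied")


def find_access_denied(root, max_bytes=4096):
    """Walk a mirror directory and return sorted paths of files that look like XML error bodies."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                head = f.read(max_bytes)
            # Only consider files that start like XML, so real HTML pages and
            # binary images that merely mention the phrase are skipped.
            if head.lstrip().startswith((b"<?xml", b"<Error")):
                if any(marker in head for marker in MARKERS):
                    hits.append(path)
    return sorted(hits)
```

The resulting list only tells you which files need re-fetching; because the signature can't be forged, the fresh URLs still have to come from re-requesting the HTML pages that reference them.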
Is there a way to force HTTrack to finish one HTML page, downloading ALL images/PDFs on that page, before moving on to the next one? Alternatively, is there a way to make HTTrack download those images immediately after downloading the HTML, so they are captured before the URLs expire? Or could I run a separate mirror that downloads JUST the images/PDFs, in a way that lets me merge it into the full mirror?
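For reference, the "page first, then all of its assets immediately" ordering described above looks roughly like this outside HTTrack. This is a minimal standard-library sketch, not an HTTrack feature: `AssetCollector`, `page_assets`, `mirror_page`, and the `save` callback are hypothetical names. It only collects `<img src>` values and `<a href>` links ending in `.pdf`, and it does not rewrite links for offline browsing or match HTTrack's folder layout.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images and PDF links found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a":
            href = attrs.get("href") or ""
            # Ignore the signed query string when checking the file extension.
            if href.split("?")[0].lower().endswith(".pdf"):
                self.assets.append(urljoin(self.base_url, href))


def page_assets(html, base_url):
    parser = AssetCollector(base_url)
    parser.feed(html)
    return parser.assets


def mirror_page(url, save):
    """Fetch one page, then fetch every asset it references right away,
    while the freshly issued signed URLs are still valid.

    `save(url, data)` is a hypothetical callback that stores the bytes.
    Returns the asset URLs that failed (e.g. 403 Access Denied).
    """
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        charset = resp.headers.get_content_charset() or "utf-8"
    save(url, raw)
    failed = []
    for asset in page_assets(raw.decode(charset, errors="replace"), url):
        try:
            with urllib.request.urlopen(asset) as resp:
                save(asset, resp.read())
        except urllib.error.HTTPError:
            failed.append(asset)
    return failed
```

Note that `HTMLParser` unescapes attribute values, so an `&amp;` in the page's `src` becomes the `&` the CDN expects.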