Prevent DOMDocument from closing tags


I need to process a large number of very old SHTML files that were written with malformed HTML.

As an example, a given page follows this structure:

<!--#include virtual="../includes/header.shtml"-->

<title>Welcome</title>
<div class="fudgeLeft">
    <div class="mainContent">
        <link rel="stylesheet" href="../css/style.css">
        <img src="hockeyflag.jpg" alt="">
        <p>text
        <p>text
        <p>more text
    </div>

<!--#include virtual="../includes/footer.shtml"-->
  • The header.shtml includes the opening tags of an HTML document up to and including the <body> tag.
  • The footer.shtml includes the closing </div>s, </body>, and </html>.
  • Notice that each tag between the header and footer appears on its own line and that some tags are not closed properly.

[I honestly don't know what the original developer was thinking (or smoking) when he structured these pages.]

Anyway, I have written a script that loads these pages with DOMDocument, converts one specific tag, and saves the updated document as a new file.

The problem I am having is that the newly created file has changed more than it should:

<!--#include virtual="../includes/header.shtml"--><title>Welcome</title><div class="fudgeLeft">

<div class="mainContent">
    <link rel="stylesheet" href="../css/style.css" />
    <img src="hockeyflag.jpg" alt="" />
    <p>text</p>
    <p>text</p>
    <p>more text</p>
</div>

<!--#include virtual="../includes/footer.shtml"--></div>
  • Notice that some lines have been joined (not a big deal), but the unclosed tags have now been closed. Also, a closing </div> has been added after the footer include.

So my question is: is there a way to configure DOMDocument to leave the malformed HTML as-is? My goal is to change only the one tag and otherwise keep the ugly document exactly as it currently is.

My script is quite long, but in short:

$doc = new DOMDocument();
@$doc->loadHTMLFile('path-to-shtml-file', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// convert one tag

$doc->saveHTMLFile('path-to-new-shtml-file');

And I am running PHP 7.
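
A minimal reproduction of the behaviour, as a sketch only: the inline fragment and the variable names below are made up for illustration and stand in for one of the real files, and it uses loadHTML() on a string rather than loadHTMLFile() so that it is self-contained.

```php
<?php
// Hypothetical fragment standing in for one of the real SHTML files.
// Like the real pages, it leaves its <p> tags unclosed.
$html = '<div class="mainContent"><p>text<p>more text</div>';

$doc = new DOMDocument();

// Collect any parser warnings internally instead of printing them
// (an alternative to the @ operator used in my script).
libxml_use_internal_errors(true);
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Serialize without modifying anything in between.
$out = $doc->saveHTML();

// Compare the input and the output for the presence of a closing </p>.
var_dump(strpos($html, '</p>') !== false); // input
var_dump(strpos($out, '</p>') !== false);  // output
echo $out;
```

Even with no modification between loading and saving, the serialized markup differs from the input, so the change seems to come from the parse/serialize round trip itself rather than from my tag conversion.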
