I am trying to extract all the link text and hrefs from an HTML string, but the source string is Unicode, and nodeValue doesn't seem to cope with it.
$links = array();
$titles = array();
$dom = new DOMDocument();
$dom->loadHTML( $str );
$hrefs = $dom->getElementsByTagName("a");
foreach ($hrefs as $href) {
    $links[] = $href->getAttribute("href");
    $titles[] = $href->nodeValue;
}
My source string looks like this:
<p><a href='uploads/root/tr_62.pdf'>Türkiye</a></p>
But my output for $titles[0] looks like this:
TÃ¼rkiye
How can I make nodeValue respect the Unicode characters?
Thanks for looking!
You must use mb_convert_encoding on the string before passing it to loadHTML, which otherwise assumes the input is ISO-8859-1.
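A minimal sketch of that suggestion, assuming $str is UTF-8 encoded and the mbstring and DOM extensions are available. The 'HTML-ENTITIES' target is my assumption about how the conversion is meant to be done. It turns "ü" into an entity so that loadHTML no longer has to guess the byte encoding:

```php
<?php
// Sample input from the question; assumed to be a UTF-8 encoded string.
$str = "<p><a href='uploads/root/tr_62.pdf'>Türkiye</a></p>";

// Replace non-ASCII characters with HTML entities so loadHTML() does not
// misread the UTF-8 bytes as ISO-8859-1.
// Note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2.
$html = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8');

$links  = array();
$titles = array();

$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName("a") as $a) {
    $links[]  = $a->getAttribute("href");
    $titles[] = $a->nodeValue; // DOM returns node values as UTF-8
}

echo $titles[0], "\n";
```

On PHP 8.2 and later this call emits a deprecation notice. Two alternatives that should avoid it are mb_encode_numericentity($str, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8') and prepending '<?xml encoding="UTF-8">' to the string before loadHTML.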