I'm trying to parse a JSON file with BaseX 10.7 using the json:parse function.
Some values in my file contain HTML markup, as in the "text" value below:
"order": 2,
"page_id": 27,
"text": "<p><strong>Présentation générale</strong></p>\r\n<p>L’ambon également nommé <em>pulpitium</em> (estrade) est une sorte de tribune élevée d’où sont proclamés les textes saints. Il est placé dans le chœur de l’église, généralement, du côté gauche.</p>\r\n<p>Dès la fin du IV<sup>e</sup> siècle, ce type de tribune, appelé <em>analogium</em>...<em>Bernard Berthod</em></h4>"
But before I even try to parse my file, when I open it in BaseX, I can see in the output window that some characters (e.g. <) have been replaced by their entity references (< becomes &lt;).
<order type="number">2</order><page__id type="number">27</page__id><text>&lt;p&gt;&lt;strong&gt;Présentation générale&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;L’ambon également nommé &lt;em&gt;pulpitium&lt;/em&gt; (estrade) est une sorte de tribune élevée d’où sont proclamés les textes saints. Il est placé dans le chœur de l’église, généralement, du côté gauche.&lt;/p&gt;
..>
I suppose I have to tell BaseX to accept HTML characters?
I tried playing with the parser options (JSON and HTML), but nothing changed...
When you use json:parse, strings in the JSON structure will be adopted as string values in the converted XML.
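A minimal sketch of this behaviour, assuming BaseX's default JSON conversion format; the JSON input is a made-up example, not the data from the question:

```xquery
(: Hypothetical input: a JSON string value that happens to contain markup. :)
let $json := '{ "content": "<p>Hello <em>world</em></p>" }'
let $doc := json:parse($json)
return (
  (: The content element holds plain text only, so it has no child elements. :)
  count($doc//content/*),
  (: Its string value is the original JSON string, with literal < and >. :)
  string($doc//content)
)
```

The first item should be 0: the markup is stored as text, not as nodes. It is only when the whole document is serialized that < and > are written as &lt; and &gt;.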
The reason why the returned string representation of the XML document contains &lt; and &gt; is that the characters < and > are returned as “entity references”. Otherwise, strings containing < or > could no longer be distinguished from elements.
What (I assume) you want is for XML strings in the JSON to be converted to XML nodes.
This can be done by performing an update on the generated XML document: the string value of the content element is replaced with the parsed XML structure.
Please note that this requires that the string is well-formed XML (which is not the case for the snippet that you presented in your question).
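The update described above could be sketched as follows. This is one possible formulation using the standard copy/modify/return expression of XQuery Update, not necessarily the exact query from the original answer. The JSON input and the element name content are illustrative assumptions, and parse-xml-fragment is used here so that strings with several top-level elements are accepted as well:

```xquery
(: Hypothetical input; the embedded markup is well-formed. :)
let $json := '{ "content": "<p>Hello <em>world</em></p>" }'
return
  copy $doc := json:parse($json)
  modify (
    for $c in $doc//content
    (: Skip empty strings, which have no text node to replace. :)
    where $c/text()
    (: Replace the text node with the nodes parsed from its string value. :)
    return replace node $c/text() with parse-xml-fragment($c)
  )
  return $doc
```

With this input, the content element of the returned document contains a p element instead of escaped text. If a string is not well-formed, parse-xml-fragment raises an error (err:FODC0006), so real-world HTML would need to be cleaned up first.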