Correct DTD for XML: analyzing the IntelliJ-generated DTD


There is an XML containing elements like:

<localHeight>
<localFeet>5</localFeet>
<localInches>10</localInches>
</localHeight> 

When I generated a DTD using IntelliJ, it came out like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE localHeight[
<!ELEMENT localHeight (localFeet |  localInches)*>  
<!ELEMENT localFeet (#PCDATA)>  
<!ELEMENT localInches (#PCDATA)>  

]> 

This DTD even accepts the following XML:

<localHeight>
</localHeight> 

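It contains neither of the child elements.
The | means a choice between a localFeet and a localInches tag, and the trailing * lets that choice repeat zero or more times, which is why the empty element is accepted.
I came up with the following DTD: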

<!DOCTYPE localHeight[
<!ELEMENT localHeight (localFeet, localInches)>  
<!ELEMENT localFeet (#PCDATA)>  
<!ELEMENT localInches (#PCDATA)>  

]>  

This is also correct, and I think it describes the structure more precisely than the IntelliJ-generated DTD: exactly one localFeet followed by exactly one localInches. I would appreciate some input, as I am not sure whether I am thinking along the right lines here.
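To make the difference between the two content models concrete, here is a minimal Python sketch. It does not run a real DTD validator; it assumes each content model can be hand-translated into a regular expression over the sequence of child element names, and it ignores text content entirely. The helper names (child_sequence, accepts) and the two regexes are illustrative assumptions, not anything IntelliJ produces.

```python
import re
import xml.etree.ElementTree as ET

def child_sequence(xml_text):
    """Return the root's child element names, each followed by one space."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)

# Hand-translated content models (an assumption made for illustration):
# (localFeet | localInches)*  -> any number of either child, in any order
# (localFeet, localInches)    -> exactly one of each, in this order
INTELLIJ_MODEL = re.compile(r"(?:localFeet |localInches )*")
STRICT_MODEL = re.compile(r"localFeet localInches ")

def accepts(model, xml_text):
    """True if the root's children match the whole content model."""
    return model.fullmatch(child_sequence(xml_text)) is not None

full = ("<localHeight><localFeet>5</localFeet>"
        "<localInches>10</localInches></localHeight>")
empty = "<localHeight>\n</localHeight>"
repeated = ("<localHeight><localInches>1</localInches>"
            "<localInches>2</localInches></localHeight>")

for name, doc in [("full", full), ("empty", empty), ("repeated", repeated)]:
    print(name, accepts(INTELLIJ_MODEL, doc), accepts(STRICT_MODEL, doc))
```

Under these assumptions, both models accept the original document, but only the IntelliJ-style model also accepts the empty element and the one with two localInches children.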

Michael Kay On

A schema (or DTD) describes the common characteristics of a class of documents. Inferring a schema from a single document -- especially one containing only three elements -- therefore involves a large amount of guesswork. Imagine trying to work out the specification of HTML from one very simple web page! The idea in your question that there is a "correct DTD" for an XML document is completely mistaken; if you only have one document, there are any number of correct DTDs.

In particular, when you infer a schema or DTD from one document, or even from a set of documents, the result will generally be either too precise (it excludes things that are valid; for example, it assumes that because every sample document has an even number of paragraphs, this must be a required characteristic) or insufficiently precise (for example, it fails to spot that a numeric attribute must always be less than 200). Working out the general rules from a limited set of examples is, in general, impossible.
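As a rough illustration of why a single document cannot pin down one DTD, the Python sketch below checks several candidate content models for localHeight against the sample from the question. The candidate list and the regex translations are assumptions chosen for illustration, and the check only looks at the order of child elements rather than performing real DTD validation.

```python
import re
import xml.etree.ElementTree as ET

# Candidate content models for localHeight, each paired with a
# hand-written regex over space-terminated child names (illustrative only).
CANDIDATES = {
    "(localFeet | localInches)*": r"(?:localFeet |localInches )*",
    "(localFeet, localInches)": r"localFeet localInches ",
    "(localFeet?, localInches?)": r"(?:localFeet )?(?:localInches )?",
    "(localFeet, localInches+)": r"localFeet (?:localInches )+",
}

def consistent_models(xml_text):
    """List the candidate content models that the root's children satisfy."""
    root = ET.fromstring(xml_text)
    children = "".join(child.tag + " " for child in root)
    return [model for model, pattern in CANDIDATES.items()
            if re.fullmatch(pattern, children)]

sample = ("<localHeight><localFeet>5</localFeet>"
          "<localInches>10</localInches></localHeight>")
feet_only = "<localHeight><localFeet>5</localFeet></localHeight>"

print(consistent_models(sample))     # every candidate fits the one sample
print(consistent_models(feet_only))  # only the models with optional children
```

All four candidates fit the sample, and they only start to disagree once a second, different document (such as one with localFeet alone) is considered. Deciding which of them is right requires knowing what the documents are supposed to mean, not just what one of them looks like.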