Very Strange XML Schema Issue


I'm trying to parse custom XML file formats with PyXB. So, I first wrote the following XML schema:

<?xml version="1.0"?>                                                           
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">                         
    <xs:element name="outertag" minOccurs="0" maxOccurs="1">                    
        <xs:complexType>                                                        
            <xs:all>                                                            
                <xs:element name="innertag0"                                    
                            minOccurs="0"                                       
                            maxOccurs="unbounded"/>                             
                <xs:element name="innertag1"                                    
                            minOccurs="0"                                       
                            maxOccurs="unbounded"/>                             
            </xs:all>                                                           
        </xs:complexType>                                                       
    </xs:element>                                                               
</xs:schema>

I used the following pyxbgen command to generate the Python module's source, py_schema_module.py:

pyxbgen -m py_schema_module -u schema.xsd

I then wrote the following script for parsing an XML file I call example.xml:

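#!/usr/bin/env python2.7

# Module generated by pyxbgen from schema.xsd
import py_schema_module

if __name__ == "__main__":
    with open("example.xml", "r") as f:
        # Parsing raises an exception if example.xml does not match the schema
        py_schema_module.CreateFromDocument(f.read())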

I use that script to determine the legality of example.xml's syntax. For instance, the following example.xml file has legal syntax per the schema:

<outertag>                                                                      
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
</outertag>

So does this:

<outertag>                                                                      
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
</outertag>

However, the following syntax is illegal:

<outertag>                                                                      
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
</outertag>

So is this:

<outertag>                                                                      
    <innertag0></innertag0>                                                     
    <innertag1></innertag1>                                                     
    <innertag0></innertag0>                                                     
</outertag>

I am able to write innertag0 and then innertag1. I am also able to write innertag1 and then innertag0. I can also repeat innertag0 and innertag1 arbitrarily, as long as all instances of one tag come before all instances of the other (examples not shown for the sake of brevity). However, what I cannot do is alternate between innertag0 and innertag1, as in the last two examples.
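To make the accepted and rejected cases concrete, here is a rough sketch that models them as regular expressions over the sequence of child elements. The one-character encoding ("0" for innertag0, "1" for innertag1) and the names OBSERVED, DESIRED and accepts are illustrative assumptions, and the OBSERVED pattern only summarizes the behaviour described above; it is not taken from PyXB itself.

```python
import re

# Each child of <outertag> is abbreviated to one character:
# "0" stands for innertag0, "1" stands for innertag1.

# Behaviour described in the question: one run of each tag, in either order.
OBSERVED = re.compile(r"(?:0*1*|1*0*)\Z")

# Behaviour wanted: any number of either tag, freely interleaved.
DESIRED = re.compile(r"[01]*\Z")


def accepts(pattern, children):
    """Return True if the whole child sequence matches the content model."""
    return pattern.match(children) is not None


if __name__ == "__main__":
    for children in ("01", "10", "0011", "101", "010"):
        print("%-5s observed=%s desired=%s" % (
            children, accepts(OBSERVED, children), accepts(DESIRED, children)))
```

The sequences "101" and "010" correspond to the two rejected documents; they fail the OBSERVED pattern but match DESIRED.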

Let's assume I want the format to support this functionality. How should I alter my XML schema file?

Yitzhak Khabinsky On

The following XML Schema (XSD) 1.0 accepts the innertag(0|1) elements in either order. The default value for both minOccurs and maxOccurs is 1, so each element must appear exactly once; this schema does not allow repeated occurrences.

Useful link: XML schema, why xs:group can't be child of xs:all?

XML

<outertag>
    <innertag1></innertag1>
    <innertag0></innertag0>
</outertag>

XSD

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="outertag">
        <xs:complexType>
            <xs:all>
                <xs:element name="innertag0" type="xs:string"/>
                <xs:element name="innertag1" type="xs:string"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>
Michael Kay On

Your schema processor doesn't seem to be doing very careful checking against the spec.

If I try to process your schema as an XSD 1.0 schema with Saxon, it tells me there are four errors:

Error at xs:element on line 3 column 59 of test.xsd:
  Attribute @minOccurs is not allowed on element <xs:element>
Error at xs:element on line 3 column 59 of test.xsd:
  Attribute @maxOccurs is not allowed on element <xs:element>
Error at xs:all on line 5 column 15 of test.xsd:
  Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Error at xs:all on line 5 column 15 of test.xsd:
  Within <xs:all>, an <xs:element> must have @maxOccurs equal to 0 or 1
Schema processing failed: 4 errors were found while processing the schema

The first two say that minOccurs and maxOccurs are not allowed on a global element declaration.

The last two say that maxOccurs must be 0 or 1 within xs:all: XSD 1.0 doesn't allow an element to repeat when the content model is xs:all. Your processor told you it was an error in the XML instance, but it's actually an error in your schema.
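The two rules can also be checked mechanically. The following is a minimal sketch using only Python's standard xml.etree.ElementTree; the function name lint_xsd10 and the message wording are my own, and it checks these two rules only, so it is no substitute for a real schema processor such as Saxon.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from the question, unchanged apart from line wrapping.
ORIGINAL_SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="outertag" minOccurs="0" maxOccurs="1">
        <xs:complexType>
            <xs:all>
                <xs:element name="innertag0" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="innertag1" minOccurs="0" maxOccurs="unbounded"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>
""".strip()


def lint_xsd10(schema_text):
    """Return messages for the two XSD 1.0 rule violations discussed above."""
    root = ET.fromstring(schema_text)
    problems = []
    # Rule 1: global (top-level) element declarations take no occurrence attributes.
    for elem in root.findall(XS + "element"):
        for attr in ("minOccurs", "maxOccurs"):
            if attr in elem.attrib:
                problems.append("global element %s: @%s is not allowed"
                                % (elem.get("name"), attr))
    # Rule 2: in XSD 1.0, an element inside xs:all may occur at most once.
    for group in root.iter(XS + "all"):
        for elem in group.findall(XS + "element"):
            if elem.get("maxOccurs", "1") not in ("0", "1"):
                problems.append("element %s inside xs:all: @maxOccurs must be 0 or 1"
                                % elem.get("name"))
    return problems


if __name__ == "__main__":
    for message in lint_xsd10(ORIGINAL_SCHEMA):
        print(message)
```

Run against the schema from the question, this reports four problems, the same count as the Saxon output.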

XSD 1.1 does allow multiple occurrences within xs:all. If I correct the global element declaration by deleting @minOccurs and @maxOccurs, the schema is now valid under XSD 1.1, and allows the interleaved instance examples that you were having trouble with.
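For reference, here is a sketch of what the corrected schema could look like, held as Python strings so it can be written to schema.xsd. SCHEMA_XSD11 follows the correction described above. SCHEMA_XSD10_CHOICE is not from this answer: it is a commonly used XSD 1.0 alternative, offered as an assumption for processors that only implement XSD 1.0, in which a repeatable xs:choice permits any interleaving. The constant names and the helper global_occurrence_attrs are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# XSD 1.1 variant: the original schema with @minOccurs/@maxOccurs removed
# from the global element declaration.
SCHEMA_XSD11 = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="outertag">
        <xs:complexType>
            <xs:all>
                <xs:element name="innertag0" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element name="innertag1" minOccurs="0" maxOccurs="unbounded"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>
""".strip()

# XSD 1.0 alternative (assumption, not from the answer): a repeatable
# xs:choice lets innertag0 and innertag1 appear in any order, any number of times.
SCHEMA_XSD10_CHOICE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="outertag">
        <xs:complexType>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="innertag0"/>
                <xs:element name="innertag1"/>
            </xs:choice>
        </xs:complexType>
    </xs:element>
</xs:schema>
""".strip()


def global_occurrence_attrs(schema_text):
    """List (element name, attribute) pairs for occurrence attributes
    that appear on global element declarations."""
    root = ET.fromstring(schema_text)
    return [(elem.get("name"), attr)
            for elem in root.findall(XS + "element")
            for attr in ("minOccurs", "maxOccurs")
            if attr in elem.attrib]


if __name__ == "__main__":
    for label, text in (("XSD 1.1", SCHEMA_XSD11),
                        ("XSD 1.0 choice", SCHEMA_XSD10_CHOICE)):
        print("%s: %s" % (label, global_occurrence_attrs(text)))
```

The helper only confirms that both strings are well-formed XML and that the global declaration carries no occurrence attributes. Whether a document validates still has to be checked with a schema processor that supports the corresponding XSD version.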