I have to parse some xml, i decided to use xml-conduit for that task and use the streaming part of it.
The structure of the xml is given by an xsd-file that contains elements and how often they may occur. But not in which order they are expected.
How do I parse all possible reorderings of children of an xml-structure using Text.XML.Stream.Parse ?
The problem
Say we have an xml description like
Root
/ \
A B
then both <Root><A>atext</A><B>btext</B></Root> and <Root><B>btext</B><A>atext</A></Root> are valid instances of this xml-structure.
But parsing in the streaming setup needs an ordering to succeed.
I thought of using something like parseRoot1 <|> parseRoot2 but then I have to implement the Alternative instance and write all the possibilities by hand, which I really don't want to.
Here is a minimal sample haskell program.
Example.hs
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Rank2Types #-}
module Main where
import Control.Exception
import Control.Monad.Trans.Resource (MonadThrow)
import Text.XML.Stream.Parse
import Data.Monoid ((<>))
import Data.Maybe
import Data.Text (Text)
import Data.XML.Types (Event)
import Data.Conduit (ConduitM, Consumer, yield, ($=), ($$))
data Root = Root {a :: A, b :: B} deriving (Show, Eq)
data A = A Text deriving (Show, Eq)
data B = B Text deriving (Show, Eq)
ex1, ex2 :: Text
ex1 = "<Root>"<>
"<A>Atest</A>"<>
"<B>Btest</B>"<>
"</Root>"
ex2 = "<Root>"<>
"<B>Btest</B>"<>
"<A>Atest</A>"<>
"</Root>"
ex :: Root
ex = Root {a = A "Atest", b = B "Btest"}
parseA :: MonadThrow m => ConduitM Event o m (Maybe A)
parseA = tagIgnoreAttrs "A"
$ do result <- content
return (A $ result)
parseB :: MonadThrow m => ConduitM Event o m (Maybe B)
parseB = tagIgnoreAttrs "B"
$ do result <- content
return (B result)
parseRoot1 :: MonadThrow m => ConduitM Event o m (Maybe Root)
parseRoot1 = tagIgnoreAttrs "Root" $ do
a' <- fromMaybe (error "error parsing A") <$> parseA
b' <- fromMaybe (error "error parsing B") <$> parseB
return $ Root{a = a', b = b'}
parseRoot2 :: MonadThrow m => ConduitM Event o m (Maybe Root)
parseRoot2 = tagIgnoreAttrs "Root" $ do
b' <- fromMaybe (error "error parsing B") <$> parseB
a' <- fromMaybe (error "error parsing A") <$> parseA
return $ Root{a = a', b = b'}
parseTxt :: Consumer Event (Either SomeException) (Maybe a)
-> Text
-> Either SomeException (Maybe a)
parseTxt p inTxt = yield inTxt
$= parseText' def
$$ p
main :: IO ()
main = do putStrLn "Poor Mans Test Suite"
putStrLn "===================="
putStrLn "test1 Root -> A - B " -- works
print $ parseTxt parseRoot1 ex1
putStrLn "test1 Root -> B - A " -- fails
print $ parseTxt parseRoot1 ex2
putStrLn "test2 Root -> A - B " -- fails
print $ parseTxt parseRoot2 ex1
putStrLn "test2 Root -> B - A " -- works again
print $ parseTxt parseRoot2 ex2
note
example.cabal
[...]
build-depends: base >=4.8 && <4.9
, conduit
, resourcet
, text
, xml-conduit
, xml-types
[...]
Here's my idea...
First some definitions:
We start with a conduit which accepts either an A or B tag, ignores the attributes and returns the name and content:
Then we use that to write a conduit which parses two tags, makes sure that one is an A and the other is a B:
Update
You can reduce the boilerplate in
parseABby using theMaybeTtransformer:And if you have several constructors, I'd consider doing something like this:
The
manycombinator comes from Tet.XML.Stream.Parse.