I'm trying to get performance increases in a program I have that parses XML. The program can parse multiple XML files so I thought that I could make this run in parallel, but all my attempts have resulted in lower performance!
For XML parsing, I am using HXT.
I have a run function defined like this:
run printTasks xs = pExec xs >>= return . concat >>= doPrint printTasks 1
'pExec' is given a list of file names and is defined as:
pExec xs = do
ex <- mapM exec xs
as <- ex `usingIO` parList rdeepseq
return as
where 'exec' is defined as:
exec = runX . process
threadscope shows only one thread e ver being used (until the very end).
Can anyone explain why I have failed so miserably to parallelise this code?
In case it helps:
exec :: FilePath -> [CV_scene]
pExec :: [FilePath] -> IO [[CV_scene]]
data CV_scene = Scene [CV_layer] Time deriving (Show)
data CV_layer = Layer [DirtyRects] SourceCrop deriving (Show)
data Rect = Rect Int Int Int Int deriving (Show)-- Left Top Width Height
instance NFData CV_scene where
rnf = foldScene reduceScene
where reduceScene l t = rnf (seq t l)
instance NFData CV_layer where
rnf = foldLayer reduceLayer
where reduceLayer d s = rnf (seq s d)
instance NFData Rect where
rnf = foldRect reduceRect
where reduceRect l t w h = rnf [l,t,w,h]
type SourceCrop = Rect
type DirtyRect = Rect
type Time = Int64
Thanks in advance for your help!
First, it looks like you mislabeled the signature of
exec, which should probably be:Now for the important part. I've commented inline on what I think you think is going on.
Note that line A does not happen in paralell, which you might think is okay since it will just set up the parsing thunks which are forced in parallel in B. This is a fair assumption, and a clever use of laziness, but the results pull that into question for me.
I suspect that the implementation of
execforces most of the parsing before line B is even reached so that the deep seq doesn't do much. That fits pretty well with my experince parsing and the profiling supports that explanation.Without the ability to test your code, I can only make the following suggestions. First try separating the parsing of the file from the IO and put the parsing in the parallel execution strategy. In that case lines A and B become something like:
with
exec'the portion ofexecafter the file is read from disk.Also, you may not even need
rdeepSeqafter this change.As an alternative, you can do the IO and parsing in parallel using Software Transactional Memory. STM approaches are normally used for separate IO threads which act more like services, rather than pure computations. But if for some reason you cant get the strategies based approach to work, this might be worth a try.
Good luck!