How can I run `boost::iostreams::gzip_decompressor` in a background thread?

I'm doing streaming parses of very large (terabyte-scale) JSON files with RapidJSON, in a context where performance matters. The input files are gzipped on disk. I could gunzip them before processing, but that is slow enough to add a significant delay.

I'm experimenting with gunzipping the files as a stream while RapidJSON parses them. I've written basic streaming gunzip code using `boost::iostreams::gzip_decompressor` with `boost::iostreams::filtering_istream`:

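    #include <fstream>
    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/filter/gzip.hpp>
    #include <rapidjson/istreamwrapper.h>
    #include <rapidjson/reader.h>

    // `filename` and `handler` are defined elsewhere.
    std::ifstream jsonFile(filename, std::ios::binary);
    boost::iostreams::filtering_istream gzipStream;
    gzipStream.push(boost::iostreams::gzip_decompressor());  // filter: inflate
    gzipStream.push(jsonFile);                               // source: gzipped file

    char readBuffer[1048576];  // 1 MiB parse buffer
    rapidjson::IStreamWrapper iStreamWrapper(gzipStream, readBuffer, sizeof(readBuffer));

    rapidjson::Reader reader;
    rapidjson::ParseResult ok = reader.Parse(iStreamWrapper, handler);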

This works functionally, but my JSON parse is single-threaded and CPU-bound, and gunzip is also CPU-bound, so doing the gunzip in the main thread slows the parse down by ~30%. Ideally, I'd gunzip in another thread.

In principle, one thread could do the gunzip and feed a stream that a second thread consumes for the JSON parse; streams seem like a construct that is amenable to this. Is there a sane way to do this with boost `filtering_istream`?
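To make the idea concrete, here is a rough sketch of the shape I have in mind, using only the standard library. `ThreadedSourceBuf` and `drainThroughThread` are hypothetical names of my own, not part of Boost or RapidJSON. The sketch assumes the source stream is used only by the background thread while the buffer is alive, and that the source outlives the buffer. A background thread reads fixed-size chunks from any `std::istream` into a bounded queue, and a custom `std::streambuf` hands those chunks to the consuming thread. I have not measured whether this recovers the ~30%.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <istream>
#include <mutex>
#include <sstream>
#include <streambuf>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not a Boost or RapidJSON class): a read-only streambuf
// whose bytes are pulled from `source` by a background thread.
class ThreadedSourceBuf : public std::streambuf {
public:
    explicit ThreadedSourceBuf(std::istream& source,
                               std::size_t chunkSize = 1 << 20,
                               std::size_t maxChunks = 4)
        : chunkSize_(chunkSize), maxChunks_(maxChunks),
          producer_(&ThreadedSourceBuf::produce, this, std::ref(source)) {}

    ~ThreadedSourceBuf() override {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;  // lets a producer blocked on a full queue exit
        }
        notFull_.notify_all();
        producer_.join();
    }

protected:
    // Called on the consumer thread when the current chunk is exhausted.
    int_type underflow() override {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty() || done_; });
        if (queue_.empty()) return traits_type::eof();
        current_ = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        notFull_.notify_one();
        setg(current_.data(), current_.data(), current_.data() + current_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    // Runs on the background thread: read chunks until EOF, error, or stop.
    void produce(std::istream& source) {
        for (;;) {
            std::vector<char> chunk(chunkSize_);
            source.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            chunk.resize(static_cast<std::size_t>(source.gcount()));

            std::unique_lock<std::mutex> lock(mutex_);
            if (!chunk.empty()) {
                // Backpressure: wait until the consumer frees a queue slot.
                notFull_.wait(lock, [this] { return queue_.size() < maxChunks_ || stop_; });
            }
            if (chunk.empty() || stop_) {
                done_ = true;
                break;
            }
            queue_.push_back(std::move(chunk));
            lock.unlock();
            notEmpty_.notify_one();
        }
        notEmpty_.notify_all();
    }

    const std::size_t chunkSize_;
    const std::size_t maxChunks_;
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::condition_variable notFull_;
    std::deque<std::vector<char>> queue_;
    std::vector<char> current_;  // chunk currently exposed to the consumer
    bool done_ = false;
    bool stop_ = false;
    std::thread producer_;  // declared last so it starts after the other members
};

// Hypothetical demo helper: read everything from `source` via the threaded buffer.
std::string drainThroughThread(std::istream& source, std::size_t chunkSize) {
    ThreadedSourceBuf buf(source, chunkSize);
    std::istream in(&buf);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

With the code above, I would presumably construct `ThreadedSourceBuf buf(gzipStream);`, wrap it as `std::istream decompressed(&buf);`, and pass `decompressed` to `rapidjson::IStreamWrapper` in place of `gzipStream`, so that the decompression happens inside the background thread's `read` calls. The sketch does not report read errors to the consumer; a failed read simply looks like end of input.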
