Running Dask distributed with tens of nodes, I found that the round-trip time of a computation dropped from around 10s to about 5s when I compressed my data with zlib before sending it to Dask. Compression reduced the data size per task from 14,536 bytes to 8,981 bytes, with around 30 tasks sent per computation.
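For context, my workaround was along these lines (an illustrative sketch with made-up payload data, not my actual tasks): pickle the task data, compress it with zlib before handing it over, and decompress on the receiving side.

```python
import pickle
import zlib

# Hypothetical task payload, just for illustration.
payload = pickle.dumps({"values": list(range(2000))})

# Compress before sending; the compressed blob is what travels.
compressed = zlib.compress(payload)

# The worker side reverses the transformation.
restored = pickle.loads(zlib.decompress(compressed))

print(len(payload), len(compressed))  # compressed is noticeably smaller
```

This is essentially doing by hand what I would expect the communication layer to do for me.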
I used the default Client settings:
client = dd.Client(master)
So I suspect that the default Dask serialization does not apply any compression.
However, as discussed in issue#30, it seems Dask already supports compression. How can I enable it?
Answering my own question after some searching: Dask's configuration does support tuning compression via the key
distributed.comm.compression

However, compression is not used out of the box: you either have to install lz4 or explicitly set this key to the already-available zlib to turn compression on.