Config Dask Distributed serialization to enable compression?

19 Views Asked by At

Running Dask distributed with tens of nodes, I found the computation round-trip time reduced from around 10s to 5s if I compress my data with zlib before sending to Dask. With compression, the data size per task could be reduced from 14,536 bytes to 8,981, where around 30 tasks were sent per computation.

I used default Client setting:

client = dd.Client(master)

So I wonder the default Dask serialization does not incur any compression.

However, as discussed in issue#30, it seems Dask already supports compression. So how can I enable it?

1

There are 1 best solutions below

0
Jimmy Chen On

Answering my question after some search... Dask Configuration does support tuning of compression:

distributed.comm.compression

The compression algorithm to use. 'auto' defaults to lz4 if installed, otherwise to snappy if installed, otherwise to false. zlib and zstd are only used if explicitly requested here.

However, as stated above, one has to install lz4 or manually set compression setting to already available zlib to turn on compression.