I have larger-than-memory uniform (regularly gridded) 2D binary data which I am trying to plot interactively using any combination of Dask, Datashader and HoloViews. I am open to using other Python-based tools, but the internet has led me to these for now.
The data files are ~11 GB and consist of a (600000, 4800) array of float32s.
I want to plot them at a different aspect ratio (1000x1000 px), and have a callback handle the data loading/shading on zoom/pan. I am serving to a browser instead of using notebooks.
Within a 1000x1000px datashader canvas I have plotted:
- 4800x4800 points (which filled the canvas)
- 600000x4800 points (which filled only the bottom few pixels of the canvas, since the colored pixels had an aspect ratio of 600000/4800)
Neither was interactive.
What I have so far, using Python 3.10, is:
import numpy as np
import datashader as ds
from datashader import transfer_functions as tf
import xarray as xr
import holoviews as hv
import panel as pn
hv.extension('bokeh', logo=False)
hv.output(backend="bokeh")
filename = 'path/to/binary/datafile'
arr = np.memmap(filename, shape=(4800,600000), offset=0, dtype=np.dtype("f4"), mode='r')
arr = xr.DataArray(arr, dims=("x", "y"), coords={'x': np.arange(4800), "y": np.arange(600000)})
cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 4800))
# the following line works too but does not fill the canvas
# cvs = ds.Canvas(plot_width=1000, plot_height=1000, x_range=(0, 4800), y_range=(0, 600000))
agg = cvs.raster(arr)
sh = tf.shade(agg)
pn.Row(sh).show()
Any advice is appreciated!
I'm not sure precisely what the ask is here, but the HoloViz way of approaching this problem would be to use dask without .persist() or .compute(). The np.memmap approach may also work. And then you'd use holoviews as described at https://examples.pyviz.org/census/census.html, or hvplot as described at https://hvplot.holoviz.org. Without having the actual data or a synthesized version of it, it's hard to be more specific than that.
BTW, I think you have x and y switched in your x_range and y_range above, since a NumPy shape of 4800,600000 corresponds to a y_range of 0,4800 and an x_range of 0,600000 (NumPy shapes are row, column, while rows are on y and columns are on x).
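To make the axis convention concrete, here is a small sketch using a tiny stand-in array rather than the real file. The 3x5 shape and the `ranges` dictionary are made up for illustration; the point is only that the first NumPy axis should be labelled y and the second x, and that the canvas ranges follow from that:

```python
import numpy as np
import xarray as xr

# Small stand-in for the real data: 3 rows, 5 columns.
arr = np.zeros((3, 5), dtype="f4")
n_rows, n_cols = arr.shape

# Rows (axis 0) go on y, columns (axis 1) go on x.
grid = xr.DataArray(
    arr,
    dims=("y", "x"),
    coords={"y": np.arange(n_rows), "x": np.arange(n_cols)},
)

# Ranges one could pass to ds.Canvas to cover the full extent of the data.
ranges = {"x_range": (0, n_cols), "y_range": (0, n_rows)}

print(dict(grid.sizes))
print(ranges)
```

With the shape used in the question, the same pattern would give `x_range=(0, 600000)` and `y_range=(0, 4800)`.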