Trouble setting up a dask PBSCluster whose workers can find my own modules


I am running into errors when trying to set up my own client with a dask-jobqueue PBSCluster instead of the default local cluster (i.e., client = Client()).

When using the default, my own modules were recognized, but I realized the workers in the PBS cluster could not find them. This page and other research were helpful in understanding what I might be able to do.

I organized my modules into a package and used pip install -e . since I'll still be developing it. I confirmed that my Python environment's site-packages directory contains my package (via an .egg-link file).
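As a quick local sanity check (mypackage is just a placeholder for my actual package name), importing it on the login node resolves back to my source tree:

import mypackage                # placeholder for my actual package name
print(mypackage.__file__)       # points at my source checkout thanks to the editable install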

I hoped installing the package would make my modules available, but I received the same error when I ran my code after setting up a basic PBSCluster:

from dask.distributed import Client
from dask_jobqueue import PBSCluster

cluster = PBSCluster(cores=x, memory=y)  # x = cores per job, y = memory string, e.g. "4GB"
cluster.scale(n)                         # request n workers
client = Client(cluster)
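For what it's worth, this is roughly how I have been checking whether the workers can see the package (again, mypackage is a placeholder):

import importlib.util

def has_package():
    # True if this worker's Python can locate the package
    return importlib.util.find_spec("mypackage") is not None

print(client.run(has_package))  # maps each worker address to True/False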

Is my basic idea of installing the modules as a package not enough?

I looked into client.upload_file based on this answer as another way to make the reference to my module file explicit. Will I still need something like this to push the modules directly to the workers?
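In other words, would I need something along these lines (the path is a placeholder for one of my module files)?

# sends a single module file (or a zipped package / egg) to every current worker
client.upload_file("mypackage/analysis.py")   # placeholder path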

Apologies for the length; I am very new to both dask and working on an HPC.

Thanks for any help.


There is 1 answer below.

Answer from Stuart Berg

First, just a sanity check: on an HPC cluster there is typically a shared filesystem that both the workers and your client machine can access. Is that the case for your cluster? If so, make sure your conda environment lives in a shared location that all workers can reach.
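A quick way to check is to ask each worker which Python executable it launched with, and to inspect the job script dask-jobqueue generates (a rough sketch, assuming cluster and client are the objects from your snippet):

import sys

# each worker reports its interpreter; it should live on the shared filesystem,
# in the same environment your client is using
print(client.run(lambda: sys.executable))

# the generated PBS submission script shows exactly what the workers will run
print(cluster.job_script())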

I organized my modules into a package and used pip install -e .

That should work, as long as your source code is also on the shared filesystem. The directory pointed to by the .egg-link file should be accessible from the worker machines. Is it?
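You can verify that directly from the workers, for example (the path below is a placeholder for wherever your .egg-link actually points):

import os

src_dir = "/path/to/your/source/checkout"            # placeholder: the directory your .egg-link points to
print(client.run(lambda: os.path.exists(src_dir)))   # should report True for every worker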