JupyterHub notebook image: shared Python libraries


I manage a JupyterHub server with about 50 users. Each user gets a base notebook image that starts up when they log in.

We are having a difficult time managing this image because the users insist that we preinstall all the Python libraries they need. As a result, this one image has 100+ libraries installed, and most of them are not designed to work together. It also makes pip run very slowly, because it has to resolve dependencies across all 100+ libraries. We finally ran into an issue where two libraries can't run together: library A requires library Z at v1, but library B requires Z at v2.

I feel like the best course of action is to make each user install their own libraries in their own container, even if it takes them a bit more time.

How do other systems handle this? I want to use an image management process that is similar to how the community generally manages JupyterHub systems.

thanks. myles.

Dom On

When things get complex, it is not feasible to have only one image for all users. JupyterHub allows you to specify profiles with different images that a user can choose from when logging in. Check the Customizing User Environment section in the JupyterHub documentation for that.
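As a rough sketch of what such profiles might look like, assuming the hub uses KubeSpawner (whose `profile_list` option takes a list of dictionaries), the snippet below builds one profile per image. The image names and the `build_profile_list` helper are made up for illustration, and other spawners expose image selection differently.

```python
# Hypothetical image names; replace them with images you actually publish.
IMAGES = {
    "Data science": "registry.example.org/hub/datascience:latest",
    "Geospatial": "registry.example.org/hub/geospatial:latest",
}


def build_profile_list(images, default=None):
    """Return a list shaped like KubeSpawner's profile_list option."""
    profiles = []
    for display_name, image in images.items():
        profiles.append(
            {
                "display_name": display_name,
                "slug": display_name.lower().replace(" ", "-"),
                "default": display_name == default,
                # Each profile only overrides the image the user's pod runs.
                "kubespawner_override": {"image": image},
            }
        )
    return profiles


profiles = build_profile_list(IMAGES, default="Data science")

# In jupyterhub_config.py (assuming KubeSpawner) this would be assigned as:
# c.KubeSpawner.profile_list = profiles
```

Keeping the image list in one dictionary means adding a new environment is a one-line change, and each image can carry its own, smaller set of libraries.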

One approach to building these images is in layers, starting with a base layer that contains only a base OS image and some configurations and installations that apply to all other images, for example, ca-certificates or tools like wget.

So if you already have an image that contains everything, it might be a good idea to reverse engineer it into several layers of images that you can then stack on top of each other to create specific user images without dependency conflicts.
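To make the layering idea concrete, here is a minimal sketch that describes each image as a parent plus the packages it adds, then renders a Dockerfile per layer and works out a build order. Everything named here is an assumption for illustration: the layer names, the `registry.example.org/hub` registry, the `python:3.11-slim` starting image, and the placeholder packages `library-a` and `library-b`.

```python
# Hypothetical layer definitions: each image names its parent and what it adds.
LAYERS = {
    "base": {"parent": "python:3.11-slim", "apt": ["ca-certificates", "wget"], "pip": []},
    "notebook": {"parent": "base", "apt": [], "pip": ["notebook"]},
    "team-a": {"parent": "notebook", "apt": [], "pip": ["library-a"]},
    "team-b": {"parent": "notebook", "apt": [], "pip": ["library-b"]},
}


def render_dockerfile(name, layers, registry="registry.example.org/hub"):
    """Render the Dockerfile text for one layer."""
    layer = layers[name]
    parent = layer["parent"]
    if parent in layers:
        # Internal parents are referenced by their published image name.
        parent = f"{registry}/{parent}:latest"
    lines = [f"FROM {parent}"]
    if layer["apt"]:
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(layer["apt"])
            + " && rm -rf /var/lib/apt/lists/*"
        )
    if layer["pip"]:
        lines.append("RUN python3 -m pip install --no-cache-dir " + " ".join(layer["pip"]))
    return "\n".join(lines) + "\n"


def build_order(layers):
    """Return layer names so that every parent comes before its children."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        parent = layers[name]["parent"]
        if parent in layers:
            visit(parent)
        ordered.append(name)

    for name in layers:
        visit(name)
    return ordered


for name in build_order(LAYERS):
    print(f"# --- {name} ---")
    print(render_dockerfile(name, LAYERS))
```

With this split, `team-a` and `team-b` share the base and notebook layers but never install each other's libraries, so their dependencies cannot conflict.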

Depending on how complex you want to get with this process, you can also work with variables, where you specify different versions or packages when building the images.
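One way to read "variables" here is Docker build arguments. The sketch below assembles `docker build` commands that pass a different `--build-arg` per variant, which would cover the "Z at v1 versus Z at v2" case from the question. It assumes the Dockerfile declares `ARG Z_VERSION` and uses it in its install step; the tag names, the `VARIANTS` table, and the `docker_build_command` helper are hypothetical.

```python
import shlex


def docker_build_command(tag, dockerfile, build_args, context="."):
    """Assemble a docker build command as an argument list."""
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile]
    for key, value in sorted(build_args.items()):
        cmd += ["--build-arg", f"{key}={value}"]
    cmd.append(context)
    return cmd


# Hypothetical variants: the same Dockerfile built with different versions of Z.
VARIANTS = {
    "z1": {"Z_VERSION": "1.*"},
    "z2": {"Z_VERSION": "2.*"},
}

commands = [
    docker_build_command(f"registry.example.org/hub/notebook:{name}", "Dockerfile", args)
    for name, args in VARIANTS.items()
]

# Print the commands for review; a CI job could pass each list to subprocess.run.
for cmd in commands:
    print(shlex.join(cmd))
```

Returning argument lists rather than shell strings avoids quoting problems if the commands are later executed from a build script.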

I don't know how you currently build your images, but I suggest you take a look at how the Jupyter community itself manages its images, or this project I've been personally involved in. Both use GitHub Actions to build and publish their images and in the second project I linked, we also involved users on GitHub to suggest and implement features in the images. Jinja is also useful in this context as a templating engine.