I have an existing Dockerfile that runs a Python program involving netCDF4. Here's a simplified version:
ARG BASE_IMG=python:3.11-slim
ARG VENV="/opt/venv"
# ------------------------------ #
FROM $BASE_IMG
ARG VENV
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y python3-dev libhdf5-dev libnetcdf-dev
RUN python -m venv $VENV
ENV PATH="$VENV/bin:$PATH"
RUN pip install numpy~=1.23.5 netcdf4~=1.6.4 h5py~=3.9.0
COPY test.py test.py
ENTRYPOINT ["python", "-m", "test"]
My full Dockerfile involves some C++ compilation as well, and I want to convert it into a multistage build so the compilation tools don't end up in my final image. While I'm at it, I figured I could also pip install my Python packages in the compile stage and copy the whole venv over to the final stage, like so:
ARG BASE_IMG=python:3.11-slim
ARG VENV="/opt/venv"
FROM $BASE_IMG AS compile-image
ARG VENV
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y python3-dev libhdf5-dev libnetcdf-dev
RUN python -m venv $VENV
ENV PATH="$VENV/bin:$PATH"
RUN pip install numpy~=1.23.5 netcdf4~=1.6.4 h5py~=3.9.0
# ------------------------------ #
FROM $BASE_IMG
ARG VENV
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y libhdf5-dev libnetcdf-dev
COPY --from=compile-image $VENV $VENV
ENV PATH="$VENV/bin:$PATH"
COPY test.py test.py
ENTRYPOINT ["python", "-m", "test"]
This works great, except that copying the netCDF4 package over this way seems to cause a large slowdown in netCDF read/write operations. If I make a Dockerfile identical to the one above but install netCDF4 directly in the final stage, I don't see this slowdown, so I suspect the netCDF4 package uses some external C library that I also need to copy over. Does anyone know how to determine whether netCDF4 has linked to all its libraries correctly, or what specifically I need to copy over to make this work?
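One way to check the linkage is to ask the dynamic loader which shared libraries the compiled netCDF4 extension resolves to, then compare the output between the two images. The following is a minimal sketch, not a definitive diagnostic: it assumes `ldd` is on the PATH and that the extension module is importable as `netCDF4._netCDF4`. The helper names `parse_ldd` and `report` are made up for illustration.

```python
import importlib
import shutil
import subprocess


def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path (None if unresolved)."""
    libs = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # lines without "=>" (vDSO, dynamic loader) carry no name-to-path mapping
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            libs[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. "(0x00007f...)".
            libs[name.strip()] = target.split("(")[0].strip() or None
    return libs


def report(module_name="netCDF4._netCDF4"):
    """Print library versions and resolved shared-library paths for the extension."""
    try:
        ext = importlib.import_module(module_name)
        pkg = importlib.import_module(module_name.split(".")[0])
    except ImportError as exc:
        print(f"cannot import {module_name}: {exc}")
        return
    print("netCDF4 package:", pkg.__version__)
    print("libnetcdf      :", pkg.__netcdf4libversion__)
    print("libhdf5        :", pkg.__hdf5libversion__)
    if shutil.which("ldd") is None:
        print("ldd not available; cannot list linked libraries")
        return
    result = subprocess.run(["ldd", ext.__file__], capture_output=True, text=True)
    for name, path in sorted(parse_ldd(result.stdout).items()):
        print(f"{name} -> {path or 'NOT FOUND'}")


if __name__ == "__main__":
    report()
```

Running this in both images and diffing the output should show whether the extension resolves to the same `libnetcdf` and `libhdf5` builds in each. If pip installed a prebuilt wheel, the libraries may resolve to a directory bundled inside `site-packages` rather than to the system packages, in which case the apt-installed libraries would not be the ones in use.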
I used test_echam_spectral-deflated.nc as a test file. I'm not sure what you are doing with the data, but my test script, test.py, loads all variables from the .nc file. The data are shared with the container via a volume mount.
I don't see a significant difference in load times between your first Dockerfile and the second, multistage Dockerfile.
Can you please provide more information that can be used to replicate the issue?
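For a reproducible comparison, a timing script along the following lines could be run unchanged in both images. This is a minimal sketch, not the test.py referred to above: the function names `load_all_variables` and `timed` are illustrative, and it assumes the file path is passed as the first command-line argument.

```python
import sys
import time


def load_all_variables(path):
    """Read every variable in the file fully into memory; return the variable count."""
    import netCDF4  # imported here so the timing helper works without the package

    with netCDF4.Dataset(path) as ds:
        for var in ds.variables.values():
            var[...]  # force the full read (and decompression) of this variable
        return len(ds.variables)


def timed(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    nc_path = sys.argv[1]
    seconds = timed(lambda: load_all_variables(nc_path))
    print(f"best of 3: {seconds:.3f} s for {nc_path}")
```

Taking the best of several runs reduces the influence of cold file caches on the volume mount, so a difference that persists is more likely to come from the libraries than from I/O noise.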