I'm deploying a RAG application on Google Cloud Run using a single Docker container that integrates Flask, Streamlit, and Chainlit services. Although my Dockerfile sets up the environment and starts the services via a bash script, I run into connection issues with Ollama that prevent the RAG app from functioning:
Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3e19f5a96ef0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Questions:
How can I ensure Ollama starts correctly and is accessible to my RAG application on Google Cloud Run? And is a single Docker image viable for this multi-service application, or should I split the services into separate images?
Thanks in advance for your time and help. Truly appreciated.
Dockerfile:
FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
COPY start_services.sh .
RUN chmod +x start_services.sh
ENV OLLAMA_HOST="0.0.0.0:11434"
EXPOSE 5000 11434
CMD ["./start_services.sh"]
start_services.sh:
#!/bin/bash
ollama &
ollama_pid=$!
chainlit run rag.py
trap "kill $ollama_pid" EXIT
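For what it's worth, here's the direction I was experimenting with for the start script. This is an untested sketch: it assumes the ollama server binary actually exists in the image, that curl is installed, and that Chainlit should bind to the PORT variable Cloud Run injects:

#!/bin/bash
set -e

# Plain `ollama` just prints help and exits; the server needs `ollama serve`.
ollama serve &
ollama_pid=$!

# Register the cleanup trap before the blocking command, not after it.
trap "kill $ollama_pid" EXIT

# Block until the Ollama API responds (requires curl in the image).
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 1
done

# Cloud Run routes traffic to $PORT, so Chainlit has to listen on it.
chainlit run rag.py --host 0.0.0.0 --port "${PORT:-8080}" --headless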
Attempts to install and start Ollama, both via pip and directly in the Dockerfile, have been unsuccessful. I'm considering separating the services into different Docker images, but I'd prefer a single-image solution if feasible. The image also fails to run properly locally. The alternative would be dropping Ollama and using closed-source LLMs, which would be sad.
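In case it clarifies what I mean by the single-image approach, this is the Dockerfile variant I've been sketching. Again untested: it assumes the official install script at https://ollama.com/install.sh works inside python:3.10-slim, and it relies on my understanding that the pip ollama package is only a client library, not the server itself:

FROM python:3.10-slim

# curl and certificates are needed for the Ollama install script and the readiness check.
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install the Ollama server binary (the pip package alone doesn't provide it).
RUN curl -fsSL https://ollama.com/install.sh | sh

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt
COPY . .
RUN chmod +x start_services.sh

ENV OLLAMA_HOST="0.0.0.0:11434"
EXPOSE 5000 11434
CMD ["./start_services.sh"]

I realize I'd still need to pull an embedding model somewhere (e.g. ollama pull nomic-embed-text, at build time or on startup), and that Cloud Run only routes external traffic to a single container port, so 11434 would stay internal to the container either way.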