I have a Python workflow that reads local folders, uploads images to Google Cloud for processing, and writes the returned JSON files to other local folders. This workflow worked fine.
I used this script:
#!/bin/bash
# Create and activate a Python virtual environment
python -m venv dg_ocr_env
source dg_ocr_env/bin/activate
# Deactivate any existing Conda environment
conda deactivate
# Upgrade pip
python -m pip install --upgrade pip
# Install packages from requirements.txt
pip install -r /Users/XXXX/Desktop/ocv/00_setup_VENV_dg_ocr/requirements.txt
...and I was then able to run my main.py script without error.
Then, I tried dockerizing and hit this:
ModuleNotFoundError: No module named 'indexer'
I thought this might have had something to do with Docker only. But whether I used an individual build or docker compose, or rebuilt with no cache (or tried every other approach I found scouring the Internet), I got the following error or something like it:
docker run ocr-image
Traceback (most recent call last):
  File "/app/unified_ocr_2_json.py", line 15, in <module>
    from spellchecker import SpellChecker
  File "/usr/local/lib/python3.9/site-packages/spellchecker/__init__.py", line 2, in <module>
    from spellchecker.core import Spellchecker,getInstance
  File "/usr/local/lib/python3.9/site-packages/spellchecker/core.py", line 26, in <module>
    from indexer import DictionaryIndex
ModuleNotFoundError: No module named 'indexer'
What I tried: to get some clarity on the process, I wiped another computer, set up a clean macOS Monterey install, and installed nothing on it except Anaconda, Homebrew, and Sublime Text.
I then tried to recreate, from my directory of .py scripts etc., the workflow that had worked just fine locally only a short time ago.
I ran my .sh (shown above), and here is the most recent traceback from the clean computer:
python /Users/aiki/Desktop/ocv/unified_ocr_2_json.py
Traceback (most recent call last):
  File "/Users/aiki/Desktop/ocv/unified_ocr_2_json.py", line 11, in <module>
    from spellchecker import SpellChecker
  File "/Users/aiki/opt/anaconda3/lib/python3.9/site-packages/spellchecker/__init__.py", line 2, in <module>
    from spellchecker.core import Spellchecker,getInstance
  File "/Users/aiki/opt/anaconda3/lib/python3.9/site-packages/spellchecker/core.py", line 26, in <module>
    from indexer import DictionaryIndex
ModuleNotFoundError: No module named 'indexer'
My script.py calls:
from spellchecker import SpellChecker
Here are the contents of the installed spellchecker package directory:
(dg_ocr_env) aware@awares-MacBook-Pro spellchecker % ls
__init__.py core.py indexer.py langdetect.py resources templates
__pycache__ dicts info.py py.typed spellchecker.py utils.py
(dg_ocr_env) aware@awares-MacBook-Pro spellchecker %
And here is the top of core.py (opened with nano):
#
import os
import string
import codecs
import inexactsearch
import urllib
from indexer import DictionaryIndex
from langdetect import _detect_lang
__all__ = ['Spellchecker', 'getInstance']
I have looked inside core.py (indexer is there...), I have checked for dependency conflicts, and I have checked that everything spellchecker says it requires is listed in requirements.txt. And so, so much more.
I have no idea what changed. All I know is that what used to work locally on one computer no longer works locally on either computer, let alone in any kind of Docker container.
Any help at all will be greatly appreciated, thank you!
I found the answer. John's comment above sent me down this particular road again, and the key points are these:
You would think that the pyspellchecker package would require an import statement that reads something like this:
from pyspellchecker import SpellChecker
But it doesn't. Instead, the correct version is:
from spellchecker import SpellChecker
But that doesn't match the name of the package, does it? And further, there really was another package installed, one that is actually called spellchecker. Once I uninstalled both and reinstalled only pyspellchecker into my conda environment, things worked again.
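To make the name clash visible, here is a minimal sketch that asks which installed distributions ship a given top-level import name. It relies only on the standard library's importlib.metadata; the helper name distributions_providing is my own, and it is not part of pip or pyspellchecker.

```python
from importlib import metadata


def distributions_providing(import_name):
    """Return sorted names of installed distributions that ship `import_name`
    as a top-level module or package (hypothetical helper, for illustration)."""
    providers = set()
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name", "<unknown>")
        # top_level.txt, when present, lists the importable top-level names.
        names = set((dist.read_text("top_level.txt") or "").split())
        # Fall back to the recorded file list for distributions without it.
        for path in dist.files or []:
            first = path.parts[0]
            if len(path.parts) > 1 and not first.endswith((".dist-info", ".egg-info")):
                names.add(first)
            elif first.endswith(".py"):
                names.add(first[:-3])
        if import_name in names:
            providers.add(dist_name)
    return sorted(providers)


# More than one name in this list means two distributions are competing
# for the same `import spellchecker`.
print(distributions_providing("spellchecker"))
```

If both spellchecker and pyspellchecker appear in the printed list, they are writing into the same site-packages/spellchecker directory.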
But here are the things that led to that final discovery, as hopefully useful to someone else in troubleshooting this type of error:
1 — I started by printing the list of packages where I thought they might be to check for conflicts:
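One way to print that list from inside Python, as a sketch (running pip list or conda list in the terminal gives similar information). The function name installed_packages and the "spell" filter are my own choices for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for every distribution
    visible to the interpreter running this code."""
    pairs = {
        (dist.metadata.get("Name", "<unknown>"), dist.version)
        for dist in metadata.distributions()
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Narrow the list to anything that looks spellchecker-related.
for name, version in installed_packages():
    if "spell" in name.lower():
        print(name, version)
```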
This didn't really help, so I discovered that I could:
2 — Verify the correct Python executable: in my .py script, I added import sys and print(sys.executable) to print the location of the Python executable. It should point to the Python executable inside the myenv environment.
3 — Then I learned that I could troubleshoot better by running a Python session directly in the terminal. But first, to be clean about it:
4 — Started a new Python interactive session by typing python and hitting enter.
The Python interpreter path (from sys.executable) should point to the Python interpreter inside myenv environment, and the Python version (from sys.version) should be the one I expected for this environment.
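A minimal sketch of the check in steps 2 and 4, using only the sys module. The function name interpreter_report is hypothetical; the in_venv flag is an extra I added, and it only detects venv/virtualenv environments (a conda environment reports False because it is a full install).

```python
import sys


def interpreter_report():
    """Collect the facts that tell you which Python is actually running."""
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": sys.version.split()[0],     # e.g. "3.9.x"
        # True inside a venv/virtualenv; conda envs report False here.
        "in_venv": sys.prefix != sys.base_prefix,
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

Running the same lines both in the script and in the interactive session shows whether the two are using the same interpreter.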
5 — Once I knew the path, and still in the same Python session in the terminal:
This prints the list of directories Python is using to look for site-packages (third-party libraries).
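A minimal sketch of step 5, assuming the standard site and sys modules. The function name search_locations is my own; the hasattr guard is there because some older virtualenv setups ship a site module without getsitepackages.

```python
import site
import sys


def search_locations():
    """Return the site-packages directories and the full import search path."""
    site_dirs = site.getsitepackages() if hasattr(site, "getsitepackages") else []
    return {"site_packages": list(site_dirs), "sys_path": list(sys.path)}


locations = search_locations()
print("site-packages:")
for directory in locations["site_packages"]:
    print("  ", directory)
print("sys.path:")
for directory in locations["sys_path"]:
    print("  ", directory)
```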
6 — Exit Python with exit() and list the contents of the site-packages directory for the myenv environment:
This listed all the packages installed in myenv. I checked if pyspellchecker was in this list. It was. And something else was also…
There was also a spellchecker directory listed there.
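Step 6 can also be done without leaving Python. This is a sketch that filters the site-packages listing for a name fragment; entries_matching is a hypothetical helper, and "spell" is just the fragment that made sense for my case.

```python
import site
from pathlib import Path


def entries_matching(fragment, directories=None):
    """List entries in the given directories (default: site-packages)
    whose names contain `fragment`, case-insensitively."""
    if directories is None:
        directories = site.getsitepackages() if hasattr(site, "getsitepackages") else []
    hits = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for entry in sorted(root.iterdir()):
            if fragment.lower() in entry.name.lower():
                hits.append(str(entry))
    return hits


for hit in entries_matching("spell"):
    print(hit)
```

Seeing a plain spellchecker directory next to metadata folders for more than one distribution is the hint that two packages share the same import name.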
7 — So I tried to import using the spellchecker name. Run a Python terminal session again and:
If the import succeeds, this prints the path to the module that was imported, which shows whether it corresponds to pyspellchecker or to something else. It failed, which suggested that the two packages were conflicting.
The module that Python finds when I import spellchecker is not the one from pyspellchecker, which is why the import breaks. The other package's core.py does from indexer import DictionaryIndex, an implicit relative import that Python 3 does not support, hence the ModuleNotFoundError even though indexer.py sits right next to core.py.
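Because the import itself blows up, it can help to ask Python where it would load the module from without executing it. This sketch uses importlib.util.find_spec from the standard library; the function name locate is my own.

```python
import importlib.util


def locate(import_name):
    """Return the file Python would load for a top-level `import_name`,
    or None if nothing by that name is on the search path."""
    spec = importlib.util.find_spec(import_name)
    if spec is None:
        return None
    return spec.origin


# The printed path shows which site-packages directory wins.
print(locate("spellchecker"))
```

For a top-level name, find_spec only locates the module and does not run its code, so it works even when the package fails during import.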
8 — So maybe the best course of action is to uninstall both packages and then reinstall only pyspellchecker.
9 — Remember to exit the Python interpreter when you're done testing by typing exit() or hitting Ctrl+D. And then:
After (all) this, everything works.
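Steps 8 and 9 can be sketched like this. It builds the two pip commands for whichever interpreter runs the code, so the packages are changed in the environment I actually checked above. The function names and the dry_run flag are my own; by default nothing is executed and the commands are only printed.

```python
import subprocess
import sys


def reinstall_commands(python=sys.executable):
    """Build the pip commands: remove both packages, then install only pyspellchecker."""
    pip = [python, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", "spellchecker", "pyspellchecker"],
        pip + ["install", "pyspellchecker"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Dry run: shows what would be executed. Pass dry_run=False to really do it.
run_all(reinstall_commands())
```

The same two commands can of course be typed directly in the terminal with the environment activated.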
Thanks to all who helped me down this road!