I am writing a Python package that relies on a large data file that is expensive to generate in order to work properly.
This large file only needs to be generated once, so I have configured the package's __init__.py to generate this file upon import. This file could also be generated at any other point (as soon as it's needed). The point is that this file is not part of the package as it would be too large to distribute.
The file is saved into the package directory, using code such as:
import os

data_dir = os.path.join(os.path.dirname(__file__), "data")
# the generated file is saved into data_dir
which would point to a directory such as /Users/me/anaconda3/envs/my-package/lib/python3.11/site-packages/my_package/data. It's assumed the user has write permission to the Python package directory (they should be using a virtual environment).
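To make the "generate once, as soon as it's needed" part concrete, here is a minimal sketch of the logic. The names `ensure_data_file` and `generate` are hypothetical placeholders for illustration, not my actual package code; `generate` stands in for whatever expensive routine writes the file.

```python
import os
from pathlib import Path


def ensure_data_file(data_dir, generate):
    """Return the path of the data file, creating it on first use.

    `generate` is a hypothetical callable that writes the expensive
    data to the path it is given.
    """
    path = Path(data_dir) / "file"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted run
        # does not leave a half-written file behind.
        tmp = path.with_suffix(".tmp")
        generate(tmp)
        os.replace(tmp, path)
    return path
```

Calling this from `__init__.py` or from the first function that needs the data gives the same result: the file is generated only if it is missing.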
I want users to be able to install and uninstall the package using pip. The data files should be deleted on uninstall since they take too much space.
Even though the data files are inside the package directory (this directory gets removed in a regular uninstall), pip raises a warning that some files inside the package directory do not belong to the package, and it will not remove them.
❯ pip uninstall my-package
Found existing installation: my_package 0.0.1
Uninstalling my_package-0.0.1:
Would remove:
/Users/me/anaconda3/envs/my-package/lib/python3.11/site-packages/my_package-0.0.1.dist-info/*
/Users/me/anaconda3/envs/my-package/lib/python3.11/site-packages/my_package/*
Would not remove (might be manually added):
/Users/me/anaconda3/envs/my-package/lib/python3.11/site-packages/my_package/data/file
Proceed (Y/n)?
After confirming, the uninstall removes all files except these "manually added" files.
I guess this could be solved by marking these files as not manually added, so that pip knows it can remove them, but I don't know if this is possible. Another solution would be to automatically run some script just before or after uninstall.
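For the second idea, the script itself would be trivial; what I am missing is a way to have it run automatically. A rough sketch of what I have in mind, where `remove_generated_data` is a hypothetical helper name and I am assuming the data lives in a `data` subdirectory of the package as above:

```python
import shutil
from pathlib import Path


def remove_generated_data(package_dir):
    """Delete the generated data directory inside the package.

    Returns True if something was removed, False if there was
    nothing to delete.
    """
    data_dir = Path(package_dir) / "data"
    if data_dir.is_dir():
        shutil.rmtree(data_dir)
        return True
    return False
```

Right now the only way I can see to use this is to ask users to call it by hand before running `pip uninstall`, which is exactly what I would like to avoid.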
I could not find a satisfactory solution to this problem online, so I am probably missing something. What is the established solution for this problem? Is this achievable by modifying the pyproject.toml file?