What causes my conda environment to prefer .local packages *before* the conda packages, causing errors?


I'm trying to run automatic machine learning model benchmarking code from the pycaret Python package, which in turn uses scikit-learn, among others.

However, my conda environment seems to run the scikit-learn dependency installed in the .local folder in my own home directory, rather than from my conda environment's packages.

That is not what I expected, and it causes the pycaret code to crash because the loaded scikit-learn does not have the interface pycaret expects.

I've found a way to change that behavior by editing sys.path (see below), but I don't understand why my conda environment apparently prefers the .local folder's packages over the packages installed in the conda environment.

I have checked the sys.path of a conda environment created at a different organization. There the order is different, and the conda environment's packages clearly come first.

This weird behavior clearly causes runtime errors, and I don't want to edit sys.path in every Jupyter notebook that I start here at this organization. Can someone tell me which configuration sets this behavior, so that I understand it and can avoid editing sys.path every time?

A minimal code example of what I'm running is:

import pandas as pd
from pycaret.regression import *

df_basetable = pd.read_csv('df_basetable.csv')
random_seed = 14
regr_exp1 = setup(
    data=df_basetable[df_basetable["split_level1"]=="full train"], 
    target="my_prediction_target", 
    ignore_features=["customer_id"],
    numeric_features=[col for col in df_basetable.columns if col not in ["my_prediction_target", "customer_id"]],
    test_data=df_basetable[df_basetable["split_level1"]=="validation"],
    fold_strategy = 'kfold',
    fold=5,
    fold_shuffle=True,
    n_jobs=5,
    session_id=random_seed, # for reproducibility
)

That results in the following error:

File ~/.local/lib/python3.9/site-packages/sklearn/base.py:211, in BaseEstimator.get_params(self, deep)
    209 out = dict()
    210 for key in self._get_param_names():
--> 211     value = getattr(self, key)
    212     if deep and hasattr(value, "get_params") and not isinstance(value, type):
    213         deep_items = value.get_params().items()

AttributeError: 'Simple_Imputer' object has no attribute 'fill_value_categorical'

What strikes me is that the traceback points to ~/.local/lib/python3.9/site-packages/sklearn/base.py:211. I expected to see the path of the sklearn package installed in the conda environment that I'm using.

I am running the above python code within a Jupyter Lab notebook with the conda environment activated that I want to use (it's called model_dashboard). Proof of that is:

import sys
print(sys.executable)

That prints '/applis/xyz/.envs/model_dashboard/bin/python'.

The Python paths involved, however, are the following:

import sys
sys.path

===>

['/applis/abc/notebooks',
 '/applis/xyz/.envs/model_dashboard/lib/python39.zip',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9/lib-dynload',
 '',
 '/home/users/a12345/.local/lib/python3.9/site-packages',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9/site-packages']

That seems weird to me. I would at least expect the packages from my conda environment ('/applis/xyz/.envs/model_dashboard/lib/python3.9/site-packages') to appear before the packages in my .local folder ('/home/users/a12345/.local/lib/python3.9/site-packages'), not after. I even wonder why the .local packages are on the Python path at all, since I don't think I need them.

So I restarted the kernel and then moved the .local entry to the end of sys.path instead:

['/applis/abc/notebooks',
 '/applis/xyz/.envs/model_dashboard/lib/python39.zip',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9/lib-dynload',
 '',
 '/applis/xyz/.envs/model_dashboard/lib/python3.9/site-packages',
 '/home/users/a12345/.local/lib/python3.9/site-packages'
]

When printing sys.path, I see that the modification has worked: .local now provides packages only as a last resort.

With the kernel restarted and the above sys.path changes, when I run the pycaret code again, I no longer have the sklearn error.
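
For reference, a minimal sketch of one way to do such a reordering at the top of a notebook. It assumes the .local entry is the directory reported by the standard library's site.getusersitepackages(); the helper name move_user_site_last is made up for illustration.

```python
import site
import sys


def move_user_site_last(path_list, user_site):
    """Return a copy of path_list with user_site moved to the end, if present."""
    without_user = [p for p in path_list if p != user_site]
    if len(without_user) == len(path_list):
        # The user site directory is not on the path; nothing to move.
        return list(path_list)
    return without_user + [user_site]


# site.getusersitepackages() returns the per-user site-packages directory,
# which on Linux is typically ~/.local/lib/pythonX.Y/site-packages.
sys.path[:] = move_user_site_last(sys.path, site.getusersitepackages())
```

This only has an effect on imports that happen afterwards, so it has to run before sklearn or pycaret is imported in the kernel.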

So, good news, but a question remains: what is the .local folder meant for (I suppose it just belongs to the default Python installation), and what configuration causes a system to prefer .local packages over a conda environment's packages in every Jupyter notebook that I start? It must be some configuration that I don't know about, since at a different organization the .local path was not even in sys.path when I printed it in a Jupyter notebook there.

I'd like to modify this configuration so I don't need to modify sys.path in every notebook.
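
To compare the two organizations, the settings that could plausibly be involved can be printed from inside the notebook. The sketch below uses only the standard library; which of these values actually differs between the two setups is an assumption to verify, not something established above. The function name user_site_report is made up for illustration.

```python
import os
import site
import sys


def user_site_report(environ=os.environ):
    """Collect settings related to the per-user site-packages directory."""
    user_site = site.getusersitepackages()
    return {
        # True if the user site directory is enabled for this interpreter.
        "enable_user_site": site.ENABLE_USER_SITE,
        "user_site": user_site,
        "user_site_on_path": user_site in sys.path,
        # If set to a non-empty value, Python skips the user site directory.
        "PYTHONNOUSERSITE": environ.get("PYTHONNOUSERSITE"),
        # Extra directories that are put on sys.path ahead of site-packages.
        "PYTHONPATH": environ.get("PYTHONPATH"),
    }


for key, value in user_site_report().items():
    print(f"{key}: {value}")
```

Running the same cell in a notebook at each organization would show whether the environment variables or the user site flag differ between them.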
