Inconsistent MLPRegressor results across environments


I have trained an MLPRegressor with scikit-learn in Python. After training, the model is exported to ONNX format. Training happens locally on an ARM (Apple M1) processor; in production the model is deployed in a container on x86 and executed with a CPU-only ONNX runtime. In some cases the model gives wildly different results between environments, far beyond what round-off errors or small differences in floating-point implementations could explain. Some observations:
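
For reference, the export and inference steps look roughly like this. This is only a minimal sketch with placeholder data, hyperparameters, and file names, and it assumes the standard skl2onnx converter; the real pipeline differs only in those details.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.neural_network import MLPRegressor
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType
    import onnxruntime as ort

    # Placeholder data; the real training set is loaded elsewhere.
    X_train, y_train = make_regression(n_samples=500, n_features=10, random_state=0)

    # --- local training and export (ARM M1) ---
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    model.fit(X_train, y_train)

    onnx_model = convert_sklearn(
        model,
        initial_types=[("input", FloatTensorType([None, X_train.shape[1]]))],
    )
    with open("model.onnx", "wb") as f:
        f.write(onnx_model.SerializeToString())

    # --- inference in the production container (x86, CPU-only onnxruntime) ---
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    pred = sess.run(None, {"input": X_train[:5].astype(np.float32)})[0]
    print(pred)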

  • When the model is trained from scratch at runtime in production using the same training data, it produces the same results as the local model. Transferring the locally trained model to production as an ONNX file, however, produces wildly different results.
  • I have trained multiple instances of the model with different training data sets, and in a small number of cases (2 out of approx. 50) the model always gives the same results in all environments, whether training again at runtime in production or transferring the locally trained ONNX files.
  • There is no correlation between identical model files and identical results. For some instances, the model produces identical results in both environments even though the ONNX files differ (as determined by a simple SHA checksum). In other cases the model produces different results even though the ONNX files are identical between environments.
  • There is no correlation between the architecture on which the model is trained and run, and producing consistent results. In some cases, training on ARM and running on ARM produces the same result as training on ARM and running on x86. In other cases, training on ARM and running on ARM produces different results than training on ARM and running on x86.
  • I have replaced ONNX with plain Python pickle for serialization, but it makes no difference (see the comparison sketch after this list).
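
To pin down where the divergence enters, I run the following check on both machines with an identical, fixed batch of probe inputs. Again a sketch only: the file names, the input tensor name "input", and the tolerances are placeholders for whatever the exported graph actually uses.

    import pickle
    import numpy as np
    import onnxruntime as ort

    # A fixed batch of inputs saved once and shipped to both environments,
    # so both machines score exactly the same float32 values.
    X_probe = np.load("probe_inputs.npy").astype(np.float32)

    # Predictions from the pickled scikit-learn model.
    with open("model.pkl", "rb") as f:
        sk_model = pickle.load(f)
    sk_pred = np.asarray(sk_model.predict(X_probe), dtype=np.float64)

    # Predictions from the exported ONNX model on the same inputs.
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    onnx_pred = sess.run(None, {"input": X_probe})[0].ravel().astype(np.float64)

    # Print both so the output can be diffed across machines; differences
    # much larger than ~1e-5 are more than float32 round-off.
    print("sklearn:", sk_pred[:5])
    print("onnx:   ", onnx_pred[:5])
    print("max |diff|:", np.max(np.abs(sk_pred - onnx_pred)))
    print("allclose:", np.allclose(sk_pred, onnx_pred, rtol=1e-5, atol=1e-6))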

What is going on here? Why can I not get consistent results with my models across environments? The only way to get consistent results is to train the model again at runtime, which is clearly not a practical or scalable solution. I have confirmed that the files being deployed in the container are indeed correct, i.e. they are the output of the local training process.
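
The file check mentioned above is nothing more than a streamed SHA digest computed on both sides (SHA-256 here; the path is a placeholder):

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Stream the file and return its hex SHA-256 digest."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)
        return h.hexdigest()

    # Run locally and inside the container, then compare the two digests.
    print(sha256_of("model.onnx"))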
