Context
Images are uploaded to a folder in Blob storage. An AzureML Data asset is created pointing to that folder, and it is used as data input when invoking a batch inference endpoint in AzureML. The minimal code for this looks like this:
import os

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# ml_client, cfg, DATA_ASSET_PATH and temp_uuid are defined earlier
my_data = Data(
    path=DATA_ASSET_PATH,
    type=AssetTypes.URI_FOLDER,
    name=f"notebook_{temp_uuid}",
)
ml_client.data.create_or_update(my_data)

job = ml_client.batch_endpoints.invoke(
    endpoint_name=cfg["batch_endpoint_name"],
    inputs={
        "input": Input(path=DATA_ASSET_PATH, type=AssetTypes.URI_FOLDER),
    },
    output_path={
        "score": Input(path=os.path.join(DATA_ASSET_PATH, "output"), type=AssetTypes.URI_FOLDER)
    },
    output_file_name="predictions.csv",
    params_override=[
        {"mini_batch_size": "20"},
        {"compute.instance_count": "1"},
    ],
)
The endpoint is configured to use the project's Docker image as its environment; the image's Python dependencies are managed with Poetry.
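For context, this is roughly how such a custom image can be registered as an AzureML environment with the v2 SDK (a minimal sketch; the registry and image name below are placeholders, not the actual project image):

from azure.ai.ml.entities import Environment

# Hypothetical illustration: register the project's Docker image
# (placeholder registry/tag) as an AzureML environment that the
# batch deployment can then reference.
project_env = Environment(
    name="project-batch-env",
    image="myregistry.azurecr.io/my-project:latest",  # placeholder
)
ml_client.environments.create_or_update(project_env)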
Problem
It seems that depending on how/where the endpoint invocation takes place, the endpoint is unable to process the data.
The error is:
UserErrorException:
    Message: Failed to load dataset definition with azureml-dataprep==4.12.9. Please install the latest version with "pip install -U azureml-dataprep".
    InnerException None
    ErrorResponse
    {
        "error": {
            "code": "UserError",
            "message": "Failed to load dataset definition with azureml-dataprep==4.12.9. Please install the latest version with \"pip install -U azureml-dataprep\"."
        }
    }
What is interesting is that the error occurs in some scenarios but not in others:
- If I execute the above code from a local notebook, it WORKS (i.e. no error, the job runs successfully).
- If the code is executed from a FastAPI app running in App Service, or from an Azure Function (whether run locally or in the cloud), it FAILS with the error above.
- If the code is limited to just creating the Data asset (skipping the endpoint invocation) and I then trigger the job manually in AzureML studio, it WORKS.
So it looks like the environment from which the endpoint is invoked has an impact, but I have tried switching environments in both the notebook and the Azure Function without moving the needle.
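For what it's worth, a snippet like the following can be run in each calling environment (notebook, App Service, Function) to compare the installed SDK versions; the package list here is my assumption about which packages matter:

import importlib.metadata

# Print the versions of the AzureML-related packages installed in
# whichever environment is doing the invoking (assumed-relevant list).
for pkg in ("azure-ai-ml", "azureml-core", "azureml-dataprep"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")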
I updated the Docker environment to use a more recent version of azureml-dataprep (now pinned to 5.1.4 in the poetry.lock file). After that, the error message changed from mentioning azureml-dataprep==4.12.9 to mentioning azureml-dataprep==4.12.10 (I was expecting it to become 5.1.4). The error also now includes a stack trace:
Traceback (most recent call last):
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml_sys/parallel_run/azureml_core_helper.py", line 64, in input_datasets_info
    input_datasets_lineage = self._run_context.run.get_details().get("inputDatasets")
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/core/run.py", line 1280, in get_details
    update_output_lineage(self.experiment.workspace, output_datasets)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_lineage.py", line 74, in update_output_lineage
    value['dataset'] = _Dataset._get_by_id(workspace, value['identifier']['savedId'])
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 938, in _get_by_id
    dataset = _saved_dataset_dto_to_dataset(workspace, result)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_rest_helper.py", line 115, in _saved_dataset_dto_to_dataset
    return _init_dataset(workspace=workspace,
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_rest_helper.py", line 147, in _init_dataset
    dataset._dataflow._rs_dataflow_yaml = None  # still want an attribute even if not fetched successfully
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 228, in _dataflow
    raise UserErrorException('{}. Please install the latest version with "pip install -U '
UserErrorException: UserErrorException:
    Message: Failed to load dataset definition with azureml-dataprep==4.12.10. Please install the latest version with "pip install -U azureml-dataprep".
    InnerException None
    ErrorResponse
    {
        "error": {
            "code": "UserError",
            "message": "Failed to load dataset definition with azureml-dataprep==4.12.10. Please install the latest version with \"pip install -U azureml-dataprep\"."
        }
    }
What worries me is that this trace mentions python3.8, whereas the Python version used in the Docker image is 3.10.
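One way to confirm which interpreter actually executes the scoring code (a suggestion of mine, not part of the original setup) would be to log it from the batch deployment's scoring script:

import sys

def init():
    # Hypothetical addition to the deployment's scoring script:
    # log the interpreter version the parallel-run host actually uses.
    print(f"Scoring script running under Python {sys.version}")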
TL;DR: something fishy seems to be going on with the environment and dependencies, and I have no clue how to debug it further or fix it!
Answer
Below is the Docker image I used:
mcr.microsoft.com/azureml/curated/azureml-automl:156
with a conda.yml pinning azureml-dataprep==4.12.8. With this setup it works successfully.
In your code for the output path, you need to give an Output in the outputs parameter, not an Input in output_path. Use the code below:
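A minimal sketch of the corrected invocation (the datastore output path below is a placeholder; adjust it to your workspace):

from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes

job = ml_client.batch_endpoints.invoke(
    endpoint_name=cfg["batch_endpoint_name"],
    inputs={
        "input": Input(path=DATA_ASSET_PATH, type=AssetTypes.URI_FOLDER),
    },
    outputs={
        # Outputs must be Output objects passed via the `outputs`
        # parameter; the datastore path here is a placeholder.
        "score": Output(
            type=AssetTypes.URI_FILE,
            path="azureml://datastores/workspaceblobstore/paths/batch-output/predictions.csv",
        ),
    },
)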
Refer to the documentation below on how to provide outputs when invoking batch endpoints:
Create jobs and input data for batch endpoints - Azure Machine Learning | Microsoft Learn