AzureML Batch endpoint invocation: Failed to load dataset definition with azureml-dataprep==4.12.9


Context

Images are uploaded to a folder in Blob storage. An AzureML Data asset is created pointing to that folder, and it is used as the data input when invoking a batch inference endpoint in AzureML. The minimal code looks like this:

import os

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

my_data = Data(
    path=DATA_ASSET_PATH,
    type=AssetTypes.URI_FOLDER,
    name=f"notebook_{temp_uuid}",
)
ml_client.data.create_or_update(my_data)

job = ml_client.batch_endpoints.invoke(
    endpoint_name=cfg["batch_endpoint_name"],
    inputs={
        "input": Input(path=DATA_ASSET_PATH, type=AssetTypes.URI_FOLDER),
    },
    output_path={
        "score": Input(path=os.path.join(DATA_ASSET_PATH, "output"), type=AssetTypes.URI_FOLDER)
    },
    output_file_name="predictions.csv",
    params_override=[
        {"mini_batch_size": "20"},
        {"compute.instance_count": "1"},
    ],
)

The endpoint is configured to use the project's Docker image as its environment; inside that image the dependencies are managed with Poetry.

Problem

It seems that depending on how/where the endpoint invocation takes place, the endpoint is unable to process the data.

The error is

UserErrorException:
    Message: Failed to load dataset definition with azureml-dataprep==4.12.9. Please install the latest version with "pip install -U azureml-dataprep".
    InnerException None
    ErrorResponse
{
    "error": {
        "code": "UserError",
        "message": "Failed to load dataset definition with azureml-dataprep==4.12.9. Please install the latest version with \"pip install -U azureml-dataprep\"."
    }
}

What is interesting to consider is the different scenarios where the error occurs or not:

- If I execute the above code from a local notebook, it WORKS (no error; the job runs successfully).

- If the code is executed from a FastAPI API running in App Service, or from an Azure Function (running locally or in the cloud), it FAILS with the error above.

- If the code is limited to just creating the Data asset (skipping the endpoint invocation) and I then trigger the job manually in AzureML studio, it WORKS.

So it looks like the environment from which the endpoint is invoked has an impact, but switching environments in the notebook and in the Azure Function has not moved the needle.
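One way to chase the environment hypothesis further is to capture a dependency snapshot in each invoking environment (notebook, App Service, Azure Function) and diff them. This is a generic diagnostic sketch using only the standard library, not anything AzureML-specific:

```python
from importlib import metadata


def snapshot():
    """Map installed package name -> version for the current environment."""
    return {
        (dist.metadata.get("Name") or "").lower(): dist.version
        for dist in metadata.distributions()
    }


def diff_snapshots(a, b):
    """Packages whose presence or version differs between two snapshots."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Run snapshot() in each environment, persist the result (e.g. as JSON),
    # then diff the two dicts to see what actually differs between the
    # working and failing invocation environments.
    import json
    print(json.dumps(snapshot(), indent=2))
```

If the working and failing environments really do carry different azure-ai-ml or azureml-* versions, the diff will surface it immediately.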

I updated the Docker environment to use a more recent version of azureml-dataprep (now pinned to 5.1.4 in the poetry.lock file). After that, the error message changed from mentioning azureml-dataprep==4.12.9 to mentioning azureml-dataprep==4.12.10 (I was expecting it to become 5.1.4). The error also now includes a stack trace:

Traceback (most recent call last):

  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml_sys/parallel_run/azureml_core_helper.py", line 64, in input_datasets_info
    input_datasets_lineage = self._run_context.run.get_details().get("inputDatasets")
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/core/run.py", line 1280, in get_details
    update_output_lineage(self.experiment.workspace, output_datasets)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_lineage.py", line 74, in update_output_lineage
    value['dataset'] = _Dataset._get_by_id(workspace, value['identifier']['savedId'])
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 938, in _get_by_id
    dataset = _saved_dataset_dto_to_dataset(workspace, result)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_rest_helper.py", line 115, in _saved_dataset_dto_to_dataset
    return _init_dataset(workspace=workspace,
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_dataset_rest_helper.py", line 147, in _init_dataset
    dataset._dataflow._rs_dataflow_yaml = None  # still want an attribute even if not fetched successfully
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/tmp/ceddbe60-62dc-42d1-910e-519169532113/prs_prod/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 228, in _dataflow
    raise UserErrorException('{}. Please install the latest version with "pip install -U '
UserErrorException: UserErrorException:
    Message: Failed to load dataset definition with azureml-dataprep==4.12.10. Please install the latest version with "pip install -U azureml-dataprep".
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Failed to load dataset definition with azureml-dataprep==4.12.10. Please install the latest version with \"pip install -U azureml-dataprep\"."
    }
}

Worryingly, this trace mentions python3.8, whereas the Python version used in the Docker image is 3.10.

TL;DR: something fishy is going on with the environment and dependencies, and I have no clue how to debug it further or fix it!

1 Answer

JayashankarGS:

I reproduced this with the Docker image below:

mcr.microsoft.com/azureml/curated/azureml-automl:156

together with a conda.yml pinning azureml-dataprep==4.12.8, and it works successfully.

In your code, the output location should be passed as an Output in the outputs parameter, not as an Input in output_path.

Use the code below:

import os

from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes

job = ml_client.batch_endpoints.invoke(
    endpoint_name="endpoint_name",
    inputs={
        "input": Input(path="DATA_ASSET_PATH", type=AssetTypes.URI_FOLDER),
    },
    outputs={
        "score": Output(path=os.path.join("DATA_ASSET_PATH", "output"), type=AssetTypes.URI_FOLDER)
    },
)

Refer to the documentation below to provide outputs in batch endpoint invocation.

Create jobs and input data for batch endpoints - Azure Machine Learning | Microsoft Learn