I am currently trying to set up a self-managed MLflow instance to evaluate Azure OpenAI models. I put together the following code from the demos and starter examples I found in the documentation:
import mlflow
import openai
import pandas as pd

system_prompt = (
    "The following is a conversation with an AI assistant."
    + " The assistant is helpful and very friendly."
)

example_questions = pd.DataFrame(
    {
        "question": [
            "How do you create a run with MLflow?",
            "How do you log a model with MLflow?",
            "What is the capital of France?",
        ]
    }
)

# Start a run
with mlflow.start_run() as run:
    mlflow.autolog()
    mlflow.log_param("system_prompt", system_prompt)

    # Create a question-answering model using prompt engineering
    # with OpenAI, and log it to MLflow Tracking
    logged_model = mlflow.openai.log_model(
        model="gpt-3.5-turbo",
        task=openai.ChatCompletion,
        artifact_path="model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )

    # Evaluate the logged model on the example questions
    mlflow.evaluate(
        model=logged_model.model_uri,
        model_type="question-answering",
        data=example_questions,
    )
Whenever I run this, I get the following exception: MlflowException: 3 tasks failed. See logs for details. The logs say, "Consider running at a lower rate."
I am confused about how to lower the rate, since I can't find any documentation for it, and I'm not sure whether I'm missing something else entirely.
If you have a look at the MLflow GitHub project for this error, you will find it here: https://github.com/mlflow/mlflow/blob/8a723062c79d1f6382cf2c1139487df903d14c67/mlflow/openai/api_request_parallel_processor.py#L353
If you check how status_tracker.num_rate_limit_errors is set, you will see it is incremented when "rate limit" is part of the error message, as stated in the comment at the beginning of that block.
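Paraphrasing the logic at that line (a simplified sketch with hypothetical names, not the exact MLflow source): the parallel processor catches the exception from each request and, if the message mentions a rate limit, counts it separately so it can back off before retrying:

import time
from dataclasses import dataclass

@dataclass
class StatusTracker:
    # Hypothetical, simplified stand-in for MLflow's internal tracker
    num_api_errors: int = 0
    num_rate_limit_errors: int = 0
    time_of_last_rate_limit_error: float = 0.0

def record_error(status_tracker: StatusTracker, error: Exception) -> None:
    # The key idea from the linked file: "rate limit" appearing in the
    # error message is what flips the rate-limit counter (the case check
    # here is a simplification).
    if "rate limit" in str(error).lower():
        status_tracker.time_of_last_rate_limit_error = time.time()
        status_tracker.num_rate_limit_errors += 1
    else:
        status_tracker.num_api_errors += 1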
So basically, you get this kind of error when you make too many requests at the same time to your Azure OpenAI models (the API then responds with a 429 status code). It can be due to:

- the requests-per-minute or tokens-per-minute quota assigned to your Azure OpenAI deployment being too low for the load, or
- mlflow.evaluate sending its evaluation requests in parallel (that's what api_request_parallel_processor.py does), so even a small dataset can briefly exceed the quota.
To fix that:

- request a higher quota (tokens per minute / requests per minute) for your Azure OpenAI deployment in the Azure portal, or
- slow down on the client side: evaluate fewer questions at a time and pause between batches so you stay under the quota, as in the sketch below.
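For example, a minimal client-side workaround is to split the evaluation data into small chunks and sleep between them. This is only a sketch that reuses example_questions and logged_model from your snippet; chunk_size and delay_seconds are assumptions you would tune against your deployment's quota, and each chunk is logged as its own run so the metrics don't collide:

import time
import mlflow

# Assumptions: tune these to your deployment's requests-per-minute /
# tokens-per-minute quota.
chunk_size = 1
delay_seconds = 20

for i, start in enumerate(range(0, len(example_questions), chunk_size)):
    chunk = example_questions.iloc[start : start + chunk_size]
    with mlflow.start_run(run_name=f"eval_chunk_{i}"):
        mlflow.evaluate(
            model=logged_model.model_uri,
            model_type="question-answering",
            data=chunk,
        )
    time.sleep(delay_seconds)  # pause so the next chunk stays under the rate limit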