Pydantic: How to validate a JSON string that contains an inner JSON string?


I have the following string that my API is receiving:

'{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'

My goal is to build a pydantic model that can validate the outer and inner data fields. So I built the following models:

from pydantic import BaseModel

class InnerData(BaseModel):
    color: str

class Expected(BaseModel):
    data: int
    inner_data: InnerData

But when I run the following:

incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)

I get:

Traceback (most recent call last):
  File ".../site-packages/pydantic/main.py", line 532, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Expected
inner_data
  Input should be an object [type=model_type, input_value='{"color": "RED"}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type

The link in the traceback doesn't help: it only tells me that the input is a string where a model (an object) was expected. But producing that model is exactly what I intended when I declared inner_data: InnerData. What should I try?

There is 1 best solution below

larsks (best answer)

The way you've constructed your models, you can validate a nested JSON object, like this:

{
  "data": 123,
  "inner_data": {
    "color": "RED"
  }
}

Pydantic will happily consume that JSON into your Expected and InnerData classes:

>>> incoming_json_string = '{"data": 123, "inner_data": {"color": "RED"}}'
>>> expected = Expected.model_validate_json(incoming_json_string)
>>> expected
Expected(data=123, inner_data=InnerData(color='RED'))
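
To see why the string from the question is rejected, it can help to decode only the outer layer with the standard-library json module. This is a small illustrative sketch (the variable names outer and inner are mine, not from the question): after one decode, inner_data is still a str, and only a second decode turns it into an object.

```python
import json

incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'

# First decode: only the outer JSON document is parsed.
outer = json.loads(incoming_json_string)
print(type(outer["inner_data"]))  # still a string, not a dict

# Second decode: parse the embedded JSON string itself.
inner = json.loads(outer["inner_data"])
print(inner)
```

Pydantic performs only the first decode on its own, which is why it reports a str where it expected an object.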

But if you want inner_data to receive a JSON string rather than an object, you would need to explicitly handle that situation. You could use a BeforeValidator, like this:

from pydantic import BaseModel, BeforeValidator
from typing import Annotated

class InnerData(BaseModel):
    color: str

class Expected(BaseModel):
    data: int
    # Decode the JSON string into an InnerData instance before field validation runs
    inner_data: Annotated[InnerData, BeforeValidator(InnerData.model_validate_json)]

incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)

Given a JSON object containing a nested JSON string, like this:

{
  "data": 123,
  "inner_data": "{\"color\": \"RED\"}"
}

the validator decodes the inner JSON string before field validation runs, so that the deserialized result matches what Pydantic expects for InnerData:

>>> incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
>>> expected = Expected.model_validate_json(incoming_json_string)
>>> expected
Expected(data=123, inner_data=InnerData(color='RED'))
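
As a possible alternative (not part of the approach above), Pydantic v2 also provides a Json type that can be parametrized with the expected inner type. The sketch below assumes the same models as in the question and swaps the BeforeValidator for Json[InnerData]; as far as I know this field then expects a JSON string and will not accept an already-nested object.

```python
from pydantic import BaseModel, Json


class InnerData(BaseModel):
    color: str


class Expected(BaseModel):
    data: int
    # Json[...] parses the incoming string as JSON first,
    # then validates the parsed value against InnerData.
    inner_data: Json[InnerData]


incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)
print(expected)
```

This keeps the intent visible in the type annotation itself, at the cost of committing the field to the string-encoded form.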

I don't know the details of the problem you're trying to solve, but in most cases it's better to keep the models the way you have them in your question and avoid embedding JSON-encoded strings inside a JSON object: have the sender provide inner_data as a nested object instead.
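
If you can't control the sender and inner_data may arrive either as a nested object or as a JSON-encoded string, one option is a "before" field validator that decodes only when it sees a string. This is a minimal sketch under that assumption; the method name decode_inner_json is hypothetical, and json.loads is used for the decoding step.

```python
import json
from typing import Any

from pydantic import BaseModel, field_validator


class InnerData(BaseModel):
    color: str


class Expected(BaseModel):
    data: int
    inner_data: InnerData

    @field_validator("inner_data", mode="before")
    @classmethod
    def decode_inner_json(cls, value: Any) -> Any:
        # Hypothetical helper: decode string input, pass anything else through
        # so that Pydantic validates it against InnerData as usual.
        if isinstance(value, (str, bytes, bytearray)):
            return json.loads(value)
        return value


as_string = Expected.model_validate_json(
    '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
)
as_object = Expected.model_validate_json(
    '{"data": 123, "inner_data": {"color": "RED"}}'
)
print(as_string == as_object)
```

Both payload shapes end up as the same Expected instance, so the rest of the code doesn't need to care which form was sent.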