Weaviate: pushing records in a batch fails with JSONDecodeError


I’m trying to add records in a batch, but after I add my objects to the batch, I always get a JSONDecodeError, apparently at the point where the batch is sent to my Weaviate class.

client.batch.configure(
    batch_size=100,
    dynamic=False,
    timeout_retries=3,
    callback=weaviate.util.check_batch_result,
    consistency_level=weaviate.data.replication.ConsistencyLevel.ALL,
)
with client.batch as batch:
    for el_idx, el in enumerate(send_to_weaviate):
        batch.add_data_object(el, "MyClass")

Records look like this:

send_to_weaviate[0]
{'my_id': '3c2466b7e7da201c66f42ea362874343',
 'post_timestamp': ['1644883202000', '1644883242000'],
 'dist_metric': [0, 0]}

Schema looks like this:

class_obj = {
    "class": "MyClass",
    "description": "Description",
    "properties": [{
        "dataType": ["text"],
        "description": "ID",
        "name": "my_id"
    }, {
        "dataType": ["text[]"],
        "description": "Timestamps",
        "name": "post_timestamp"
    }, {
        "dataType": ["int[]"],
        "description": "Description",
        "name": "dist_metric"
    }]
}

Error message:

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/weaviate/batch/crud_batch.py:644, in Batch._create_data(self, data_type, batch_request)
    642     connection_count += 1
    643 else:
--> 644     response_json = response.json()
    645     if (
    646         self._weaviate_error_retry is not None
    647         and batch_error_count < self._weaviate_error_retry.number_retries
    648     ):
    649         batch_to_retry, response_json_successful = self._retry_on_error(
    650             response_json, data_type
    651         )

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Everything I have found by searching suggests that a JSONDecodeError occurs when the payload is not JSON serializable. Weaviate supports lists of common data types as properties (e.g., the int[] and text[] data types), and I suspected that JSON might not handle lists of ints or lists of strings as values. However, when I JSON-serialize my send_to_weaviate variable, I have no problems, so this may not be the true cause.

import json
json.loads(json.dumps(send_to_weaviate))  # No errors

Can anyone help me figure out why my batch fails to add to Weaviate?

EDIT: Here is a small reproducible example of my issue. I'm using weaviate-client v3.22.1 in a Python 3.9 conda environment.

import weaviate
client = weaviate.Client(URL_TO_WEAVIATE_ENDPOINT)
class_obj = {
    "class": "MyClass",
    "description": "Description",
    "properties": [{
        "dataType": ["text"],
        "description": "ID",
        "name": "my_id"
    }, {
        "dataType": ["text[]"],
        "description": "Timestamps",
        "name": "post_timestamp"
    }, {
        "dataType": ["int[]"],
        "description": "Description",
        "name": "dist_metric"
    }]
}
client.schema.create_class(class_obj)
client.batch.configure(batch_size=100, dynamic=False, timeout_retries=3)
# Just try to add 1 doc
doc = {
    'my_id': '3c2466b7e7da201c66f42ea362874343',
    'post_timestamp': ['1644883202000', '1644883242000'],
    'dist_metric': [0, 0],
}
with client.batch() as batch:
    batch.add_data_object(doc, "MyClass")

Best answer (lrthistlethwaite):

When I set batch_size=10, things worked again. I think my records were quite character-rich, so batches needed to be smaller than the batch_size=100 setting. You can set the batch size with client.batch.configure(batch_size=10), or in the context manager itself: with client.batch(batch_size=10) as batch:
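If payload size really is the limiting factor, one option is to group records by serialized size rather than by a fixed count. The following is only a sketch: chunk_by_payload_size and the max_bytes and max_items values are hypothetical names and numbers of my own, not part of weaviate-client, and a suitable byte budget for a given deployment would have to be found by experiment.

```python
import json


def chunk_by_payload_size(records, max_bytes=1_000_000, max_items=100):
    """Yield lists of records whose combined JSON size stays within max_bytes.

    A single record larger than max_bytes is still yielded, alone in its chunk.
    """
    chunk, chunk_bytes = [], 0
    for record in records:
        size = len(json.dumps(record).encode("utf-8"))
        # Close the current chunk if this record would push it past a limit.
        if chunk and (chunk_bytes + size > max_bytes or len(chunk) >= max_items):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(record)
        chunk_bytes += size
    if chunk:
        yield chunk


doc = {
    "my_id": "3c2466b7e7da201c66f42ea362874343",
    "post_timestamp": ["1644883202000", "1644883242000"],
    "dist_metric": [0, 0],
}
records = [doc] * 25

# Deliberately tiny limits so the splitting is visible on toy data.
chunks = list(chunk_by_payload_size(records, max_bytes=1200, max_items=10))
print([len(c) for c in chunks])
```

Each chunk could then be sent in its own batch context, so that no single request carries more than roughly max_bytes of object data.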

Also, weirdly, deleting the "client" object and reestablishing the connection seemed to help, but I can't understand why.
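One note on the error itself: the traceback in the question is raised by response.json(), that is, while parsing the server's reply, not while serializing the outgoing objects. "Expecting value: line 1 column 1 (char 0)" is what Python's json module reports when the text it is given is empty or does not start with a JSON value. The snippet below reproduces that message with the standard library only; the two example bodies are made up for illustration, and I have not confirmed what the server actually returned in this case.

```python
import json

# Hypothetical response bodies: an empty reply and an HTML error page.
bodies = ["", "<html>502 Bad Gateway</html>"]

messages = []
for body in bodies:
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        messages.append(str(err))
        print(repr(body), "->", err)
```

Both bodies fail at the very first character, so an error at "char 0" points toward a non-JSON reply (for example from a timeout or an intermediate proxy) rather than toward the objects in the batch.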

Answer (hsm207):

Can anyone help me figure out why my batch fails to add to Weaviate?

You can start by using manual batching and printing el_idx before creating each object in Weaviate, to make sure none of the elements are malformed.
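A local check along the same lines can flag suspicious elements by index before anything is sent. This is a rough sketch, not a Weaviate feature: find_malformed is a hypothetical helper, it only knows the four data types used in the question's schema (text, int, text[], int[]), and it assumes each property lists a single dataType.

```python
def find_malformed(records, class_obj):
    """Return (index, property, problem) tuples for values that don't match the schema."""
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    checks = {
        "text": lambda v: isinstance(v, str),
        "int": is_int,
        "text[]": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
        "int[]": lambda v: isinstance(v, list) and all(is_int(x) for x in v),
    }
    expected = {p["name"]: p["dataType"][0] for p in class_obj["properties"]}

    problems = []
    for el_idx, el in enumerate(records):
        for name, value in el.items():
            data_type = expected.get(name)
            if data_type is None:
                problems.append((el_idx, name, "not in schema"))
            elif data_type in checks and not checks[data_type](value):
                problems.append(
                    (el_idx, name, f"expected {data_type}, got {type(value).__name__}")
                )
    return problems


class_obj = {
    "class": "MyClass",
    "properties": [
        {"dataType": ["text"], "name": "my_id"},
        {"dataType": ["text[]"], "name": "post_timestamp"},
        {"dataType": ["int[]"], "name": "dist_metric"},
    ],
}
good = {
    "my_id": "3c2466b7e7da201c66f42ea362874343",
    "post_timestamp": ["1644883202000", "1644883242000"],
    "dist_metric": [0, 0],
}
bad = {"my_id": 123, "post_timestamp": ["1644883202000", 1644883242000], "extra": 1}

for el_idx, name, problem in find_malformed([good, bad], class_obj):
    print(el_idx, name, problem)
```

An empty result does not prove the batch will succeed, but any index it reports is a good first candidate to inspect.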

If all the elements are what you expected, then you'll need to provide a minimal reproducible example for further assistance.