How to load a folder of JSON files in LangChain?


I am trying to load a folder of JSON files in LangChain with:

loader = DirectoryLoader(r'C:...')
documents = loader.load()

But I got this error message:

ValueError: Json schema does not match the Unstructured schema

Can anyone tell me how to solve this problem?

I tried passing glob='**/*.json', but that did not work either. The documentation on the LangChain website is limited as well.

Answered by BuffK

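The error occurs because DirectoryLoader uses UnstructuredFileLoader by default, which expects JSON files to follow the Unstructured schema. If you want to read each file whole as plain text, pass TextLoader through the loader_cls parameter: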

from langchain.document_loaders import DirectoryLoader, TextLoader

# DRIVE_FOLDER is the path to the folder that contains the JSON files
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=TextLoader)
documents = loader.load()

Alternatively, you can use JSONLoader and pass a jq schema through loader_kwargs:

# JSONLoader needs the jq package (pip install jq)
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.json_loader import JSONLoader

DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"

# jq_schema='.content' extracts the top-level "content" field of each file
loader = DirectoryLoader(
    DRIVE_FOLDER,
    glob='**/*.json',
    show_progress=True,
    loader_cls=JSONLoader,
    loader_kwargs={'jq_schema': '.content'},
)

documents = loader.load()

print(f'document count: {len(documents)}')
print(documents[0] if documents else None)

For the jq_schema parameter, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/json_loader.py#L10

For more DirectoryLoader usage, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/directory.py
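If the loader still returns nothing or raises an error, it can help to check the folder without LangChain first. The sketch below uses only the standard library and is an approximation, not what LangChain runs internally. The helper name preview_json_folder is made up for illustration. It assumes each file holds a JSON object and that the text lives under a top-level "content" key, as implied by jq_schema='.content'.

```python
import json
from pathlib import Path


def preview_json_folder(folder, key="content"):
    """Return (path, value) pairs for every *.json file under folder.

    Rough stand-in for glob='**/*.json' combined with jq_schema='.content':
    value is the top-level `key` of each file, or None when the file is not
    a JSON object or the key is missing.
    """
    results = []
    # rglob("*.json") walks subfolders, like the '**/*.json' glob pattern
    for path in sorted(Path(folder).rglob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        value = data.get(key) if isinstance(data, dict) else None
        results.append((path, value))
    return results
```

Calling preview_json_folder(DRIVE_FOLDER) lists the files the glob would match. Any entry whose value is None points to a file where '.content' would not yield text, so the jq schema likely needs adjusting for that file.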