I am using a LayoutLM-style model to read receipts and extract text from invoices. The model is "philschmid/lilt-en-funsd" from the Hugging Face Hub. The code snippet is given below:
from PIL import Image

# `model`, `processor`, `draw_boxes`, and `extract_text_from_boxes` are defined elsewhere
def run_inference(image_path, model=model, processor=processor, output_image=True):
    # Load the image from the path
    image = Image.open(image_path).convert("RGB")
    # Build the model inputs; LiLT does not take pixel values, so drop them
    encoding = processor(image, return_tensors="pt")
    del encoding["pixel_values"]
    # Get predictions: one label per token
    outputs = model(**encoding)
    predictions = outputs.logits.argmax(-1).squeeze().tolist()
    labels = [model.config.id2label[prediction] for prediction in predictions]
    boxes = encoding["bbox"][0].tolist()
    model_name = model.name_or_path.split('/')[-1]
    if output_image:
        image_with_boxes = draw_boxes(image, encoding["bbox"][0], labels)
        # Keep only the boxes of tokens tagged as the beginning of an answer
        b_answer_boxes = [encoding["bbox"][0][i].detach().numpy() for i, label in enumerate(labels) if label == "B-ANSWER"]
        b_answer_texts = extract_text_from_boxes(image, b_answer_boxes, image_path, model_name)
        return image_with_boxes, b_answer_texts
    else:
        return draw_boxes(image, encoding["bbox"][0], labels), []
The issue is that it does extract the "B-ANSWER" tags correctly, but each answer is split across multiple boxes, as shown in the image below:
I would like to extract only the items, quantities, and prices from the receipt. Any help on this would be much appreciated, thanks!
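One direction that might reduce the splitting, as a minimal sketch rather than a tested fix: merge the boxes of consecutive tokens that belong to the same answer span into one bounding box before extracting text. The helper name `merge_answer_boxes` is hypothetical, and the sketch assumes the model uses FUNSD-style BIO tags ("B-ANSWER" followed by "I-ANSWER") and that subword tokens of one word share an identical box.

```python
def merge_answer_boxes(boxes, labels, entity="ANSWER"):
    """Merge token-level boxes into one [x0, y0, x1, y1] box per entity span.

    Assumes BIO tags ("B-ANSWER", "I-ANSWER") and that subword tokens of the
    same word carry the same box (both are assumptions, not confirmed here).
    """
    begin, inside = f"B-{entity}", f"I-{entity}"
    merged = []
    current = None   # box of the span being built, or None
    prev_box = None  # box of the previous token, to detect repeated subwords
    for box, label in zip(boxes, labels):
        box = [int(v) for v in box]
        same_word = box == prev_box
        prev_box = box
        if label == begin and not (same_word and current is not None):
            # A new span starts; close the previous one first
            if current is not None:
                merged.append(current)
            current = box
        elif label in (begin, inside) and current is not None:
            # Grow the current span to cover this token's box
            current = [min(current[0], box[0]), min(current[1], box[1]),
                       max(current[2], box[2]), max(current[3], box[3])]
        else:
            # Any other label ends the current span
            if current is not None:
                merged.append(current)
                current = None
    if current is not None:
        merged.append(current)
    return merged
```

In the function above this could be called as `merge_answer_boxes(encoding["bbox"][0].tolist(), labels)` in place of the "B-ANSWER"-only list comprehension. If the processor normalizes boxes to a 0-1000 scale, as LayoutLM-style processors typically do, the merged boxes would still need rescaling to pixel coordinates before cropping. This only joins tokens into answer spans; separating items, quantities, and prices would likely need extra logic or a model fine-tuned on receipt-specific labels.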
