Get full text from LayoutLM (LiLT) predictions


I am using a LayoutLM-style model, the LiLT checkpoint "philschmid/lilt-en-funsd" from Hugging Face, to read receipts and invoices and extract their text. Here is the code:

import torch
from PIL import Image

# `model`, `processor`, `draw_boxes` and `extract_text_from_boxes` are defined
# elsewhere in the script (not shown here).
def run_inference(image_path, model=model, processor=processor, output_image=True):
    # Load image from the path
    image = Image.open(image_path).convert("RGB")

    # Run OCR + tokenization; LiLT uses text and layout only, not pixel values
    encoding = processor(image, return_tensors="pt")
    del encoding["pixel_values"]

    # Get one predicted label per token (no gradients needed for inference)
    with torch.no_grad():
        outputs = model(**encoding)
    predictions = outputs.logits.argmax(-1).squeeze().tolist()
    labels = [model.config.id2label[prediction] for prediction in predictions]
    boxes = encoding["bbox"][0].tolist()
    model_name = model.name_or_path.split('/')[-1]

    # Draw the boxes once and reuse the result in both branches
    image_with_boxes = draw_boxes(image, encoding["bbox"][0], labels)
    if output_image:
        # Boxes of the tokens tagged as the beginning of an answer
        b_answer_boxes = [encoding["bbox"][0][i].detach().numpy() for i, label in enumerate(labels) if label == "B-ANSWER"]
        b_answer_texts = extract_text_from_boxes(image, b_answer_boxes, image_path, model_name)
        return image_with_boxes, b_answer_texts
    return image_with_boxes, []
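Two details of this code may contribute to the fragmented output. First, `predictions` holds one label per *token*, including special tokens and the extra subword pieces of long words, so a single word can yield several boxes. Second, LayoutLM-family processors normalize `bbox` to a 0-1000 scale, so if `extract_text_from_boxes` crops `image` with those boxes, they must first be converted back to pixels. The sketch below shows both steps; `first_subword_labels` and `unnormalize_box` are hypothetical helper names, not part of Transformers.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]


def first_subword_labels(
    word_ids: Sequence[Optional[int]],
    labels: Sequence[str],
    boxes: Sequence[Sequence[int]],
) -> List[Tuple[int, str, Box]]:
    """Keep one (word_index, label, box) entry per OCR word.

    Special tokens have a word id of None and are skipped; when a word is
    split into several subword tokens, only the first token's label is kept.
    """
    words = []
    previous = None
    for word_id, label, box in zip(word_ids, labels, boxes):
        if word_id is not None and word_id != previous:
            words.append((word_id, label, tuple(box)))
        previous = word_id
    return words


def unnormalize_box(box: Sequence[int], width: int, height: int) -> Box:
    """Map a box from the 0-1000 scale used by LayoutLM-style models to pixels."""
    x0, y0, x1, y1 = box
    return (
        round(x0 * width / 1000),
        round(y0 * height / 1000),
        round(x1 * width / 1000),
        round(y1 * height / 1000),
    )
```

Inside `run_inference` this could be called as `first_subword_labels(encoding.word_ids(0), labels, boxes)`, assuming the processor wraps a fast tokenizer (`word_ids()` is only available on outputs of fast tokenizers), and each kept box could be passed through `unnormalize_box(box, *image.size)` before cropping.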

The problem is that, although the "B-ANSWER" tags are predicted correctly, each answer is split into multiple boxes, as shown in the image below:

[Image: receipt with the predicted B-ANSWER boxes, each answer split into several boxes]
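The model uses BIO tagging: "B-ANSWER" marks only the *first* word of an answer, and the following words of the same answer are tagged "I-ANSWER". Filtering on "B-ANSWER" alone therefore keeps the first word of each answer and drops the rest. A minimal sketch of merging a run of B-/I- tags into a single span with one enclosing box, assuming word-level labels and boxes (the function names are hypothetical):

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def union_box(boxes: Sequence[Sequence[int]]) -> Box:
    """Smallest box (x0, y0, x1, y1) that contains every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def group_entities(
    labels: Sequence[str],
    boxes: Sequence[Sequence[int]],
    entity: str = "ANSWER",
) -> List[Tuple[List[int], Box]]:
    """Group consecutive B-<entity>/I-<entity> words into spans.

    Returns (word_indices, enclosing_box) for each span. A span starts at
    "B-<entity>" (or at a stray "I-<entity>" with no open span) and continues
    while the following labels are "I-<entity>".
    """
    spans: List[List[int]] = []
    inside = False
    for i, label in enumerate(labels):
        if label == f"B-{entity}" or (label == f"I-{entity}" and not inside):
            spans.append([i])  # start a new entity span
            inside = True
        elif label == f"I-{entity}":
            spans[-1].append(i)  # continue the current span
        else:
            inside = False  # any other label closes the span
    return [(span, union_box([boxes[i] for i in span])) for span in spans]
```

Each span's word indices can be used to join the OCR words into the answer's full text, and the single enclosing box can be cropped once instead of once per token.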

I would like to extract only the items, quantities, and prices from the receipt. Any help would be much appreciated, thanks!
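If this checkpoint was fine-tuned on FUNSD, as its name suggests, its label set only distinguishes HEADER, QUESTION, ANSWER and OTHER, so it has no notion of "item", "quantity" or "price". One option is a heuristic post-processing step: group the extracted words into printed rows by the vertical centre of their boxes, then parse each row with regular expressions. The sketch below assumes one line item per row, the price as the last token of the row, and an optional integer quantity (defaulting to 1); `group_rows`, `parse_line_item` and the `tolerance` value are hypothetical and will need tuning for real receipts.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]
Word = Tuple[str, Box]

PRICE = re.compile(r"^\$?\d+[.,]\d{2}$")  # e.g. "7.00", "$7.00", "7,00"
QUANTITY = re.compile(r"^\d+x?$")  # e.g. "2", "2x"


def _centre(box: Box) -> float:
    return (box[1] + box[3]) / 2


def group_rows(words: Sequence[Word], tolerance: float = 10) -> List[List[Word]]:
    """Cluster words into rows by vertical centre, each row sorted left to right."""
    rows: List[List[Word]] = []
    row_centre: Optional[float] = None
    for text, box in sorted(words, key=lambda w: _centre(w[1])):
        centre = _centre(box)
        if row_centre is None or centre - row_centre > tolerance:
            rows.append([])  # too far below the current row: start a new one
            row_centre = centre
        rows[-1].append((text, box))
    return [sorted(row, key=lambda w: w[1][0]) for row in rows]


def parse_line_item(tokens: Sequence[str]) -> Optional[Dict[str, object]]:
    """Split one row into item name, quantity and price; None if no price ends the row."""
    if not tokens or not PRICE.match(tokens[-1]):
        return None
    price = float(tokens[-1].lstrip("$").replace(",", "."))
    quantity: Optional[int] = None
    name_tokens: List[str] = []
    for token in tokens[:-1]:
        if quantity is None and QUANTITY.match(token):
            quantity = int(token.rstrip("x"))  # first standalone integer is the quantity
        else:
            name_tokens.append(token)
    return {"item": " ".join(name_tokens), "quantity": quantity or 1, "price": price}
```

Rows such as "Total 9.50" also match this pattern, so totals, tax and similar lines would still need filtering, for example with a keyword list. A more robust alternative is fine-tuning on a receipt dataset with item-level labels, such as CORD, whose annotations include menu item name, count and price fields.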

