Azure AI Search: get the bounding box or coordinates of a search result

Asked by Priyanka V on 31 January 2024 at 12:48

I am storing PDF data along with embeddings in Azure Search. Can I also save the bounding box details of the scanned data, and then get those bounding box details back along with the search results when querying?

Answered by Rishabh Meshram on 02 February 2024 at 10:09

A PDF file does not store bounding boxes for its text in a form that Azure Search can index directly (a scanned PDF contains only page images); the bounding boxes are produced by the text-extraction step. In the context of Azure Cognitive Search, you can use the OCR capabilities of Azure Cognitive Services to extract the text of a PDF together with its bounding box information.

To store the bounding box details of scanned PDF data in Azure Search and retrieve them at query time, you can follow these steps:

Extract text and bounding box information: use Azure Cognitive Services (the Computer Vision Read API or the Form Recognizer Layout API) to extract the text and its bounding boxes from your PDF files.
- Computer Vision Read API documentation
- Form Recognizer Layout API documentation
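
As a rough sketch, assuming you have the JSON body of a Form Recognizer v3 Layout REST response as a Python dict, the line-level text and bounding polygons can be flattened into simple records. The shape used here (`analyzeResult` → `pages` → `lines`, each line with `content` and an 8-number `polygon`) is an assumption based on the v3 layout output; the Computer Vision Read API nests lines under `readResults` and calls the coordinates `boundingBox`, so the keys would need adjusting for that API. `extract_lines` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_lines(analyze_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a Layout-style result into one record per text line.

    Assumed input shape (Form Recognizer v3 layout JSON):
    {"analyzeResult": {"pages": [{"pageNumber": 1, "unit": "inch",
      "lines": [{"content": "...", "polygon": [x1, y1, ..., x4, y4]}]}]}}
    """
    records = []
    for page in analyze_result["analyzeResult"]["pages"]:
        for line in page.get("lines", []):
            flat = line["polygon"]
            records.append({
                "page": page["pageNumber"],
                "unit": page.get("unit"),  # e.g. "inch" for PDFs, "pixel" for images
                "text": line["content"],
                # Pair up the flat [x1, y1, x2, y2, ...] list into (x, y) points,
                # keeping the order in which the service returned them.
                "polygon": list(zip(flat[0::2], flat[1::2])),
            })
    return records
```

Each record keeps the page number and unit because polygon coordinates are only meaningful relative to their own page.
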
Create a custom index schema: define an index schema in Azure Cognitive Search that includes a field for the bounding box information. The field must be retrievable (the default) so that it is returned with search results.

{
  "name": "pdf-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },
    { "name": "boundingBox", "type": "Collection(Edm.String)", "searchable": false, "filterable": false, "sortable": false, "facetable": false, "retrievable": true }
  ]
}
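
Because boundingBox is declared as Collection(Edm.String), each box has to be stored as a string. One simple option, assumed here rather than required by the service, is to serialize every polygon (plus its page number) as a compact JSON string and decode it after retrieval. `encode_boxes` and `decode_boxes` are hypothetical helper names.

```python
import json
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def encode_boxes(polygons: Sequence[Sequence[Point]], page: int) -> List[str]:
    """Serialize each polygon (a list of (x, y) points) to a JSON string,
    suitable as one entry of a Collection(Edm.String) field. The page number
    is stored with the points because coordinates are per page."""
    return [
        json.dumps({"page": page, "points": [[x, y] for x, y in poly]},
                   separators=(",", ":"))
        for poly in polygons
    ]


def decode_boxes(values: Sequence[str]) -> List[dict]:
    """Inverse of encode_boxes: parse the stored strings back into dicts."""
    return [json.loads(v) for v in values]
```

An alternative design is a complex field (Collection(Edm.ComplexType)) with numeric coordinate sub-fields, which avoids string parsing at the cost of a larger schema.
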

Index the extracted data: upload each piece of PDF text in the content field and its bounding box details in the boundingBox field of the custom index schema.
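
A sketch of this indexing step, assuming one search document per text line (a real pipeline would more likely chunk by paragraph or page and keep every line box of the chunk in the collection). The `client` is expected to behave like `azure.search.documents.SearchClient`, whose `upload_documents` method takes a list of dicts and returns results with a `succeeded` flag. `build_documents` and `index_lines` are hypothetical names, and the key format is an assumption; document keys may only contain letters, digits, `_`, `-` and `=`.

```python
import json
import re
from typing import Any, Dict, Iterable, List


def build_documents(doc_id: str, lines: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn line records ({"page", "text", "polygon"}) into documents that
    match the pdf-index schema: id, content, boundingBox."""
    # Replace characters that are not allowed in document keys.
    safe_id = re.sub(r"[^A-Za-z0-9_\-=]", "_", doc_id)
    docs = []
    for i, line in enumerate(lines):
        docs.append({
            "id": f"{safe_id}-{i}",
            "content": line["text"],
            # One JSON-encoded polygon per entry of the Collection(Edm.String) field.
            "boundingBox": [json.dumps({"page": line["page"],
                                        "points": [list(p) for p in line["polygon"]]})],
        })
    return docs


def index_lines(client: Any, doc_id: str, lines: Iterable[Dict[str, Any]]) -> int:
    """Upload the documents and return how many the service reported as succeeded."""
    results = client.upload_documents(documents=build_documents(doc_id, lines))
    return sum(1 for r in results if r.succeeded)
```
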

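Because the boundingBox field is retrievable, it is returned with each matching document. At query time you can list it in the select parameter (for example id, content, boundingBox) so that the bounding box details come back along with the search result.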
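
For the retrieval side, a minimal sketch, again assuming `client` is an `azure.search.documents.SearchClient` for pdf-index and that the boxes were stored as JSON strings (an assumption, not a service requirement). `search_with_boxes` is a hypothetical helper; `select` limits the returned fields and `top` caps the number of hits.

```python
import json
from typing import Any, Dict, List


def search_with_boxes(client: Any, query: str, top: int = 5) -> List[Dict[str, Any]]:
    """Run a full-text query and return each hit with its bounding boxes decoded."""
    hits = client.search(search_text=query,
                         select=["id", "content", "boundingBox"],
                         top=top)
    results = []
    for hit in hits:
        results.append({
            "id": hit["id"],
            "content": hit["content"],
            "score": hit.get("@search.score"),
            # Each stored string is assumed to be a JSON-encoded polygon.
            "boxes": [json.loads(b) for b in hit.get("boundingBox") or []],
        })
    return results
```

The same select list applies if you query the embeddings with a vector or hybrid query instead of plain text.
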

You can also look at the Azure Cognitive Search OCR skill, which extracts text from images, along with resources that showcase scenarios with PDFs and show how to visualize bounding boxes.
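
To visualize a stored box on a rendered page image, the polygon has to be converted into the image's pixel space. This sketch assumes Layout-style coordinates, where PDF pages are measured in inches and image pages in pixels, and that the page was rendered at a known DPI; `to_pixel_rect` is a hypothetical helper returning an axis-aligned rectangle.

```python
from typing import Sequence, Tuple


def to_pixel_rect(points: Sequence[Sequence[float]], unit: str,
                  dpi: int = 72) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a polygon.

    Inch coordinates are multiplied by the render DPI; any other unit is
    treated as pixels and used as-is. Rotated text gets the axis-aligned
    rectangle that encloses its polygon.
    """
    scale = dpi if unit == "inch" else 1
    xs = [p[0] * scale for p in points]
    ys = [p[1] * scale for p in points]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))
```

The resulting tuple can be passed to a drawing routine such as Pillow's ImageDraw rectangle method to outline the matched text on the page image.
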