Azure AI search get bounding box or coordinates of search result


I am storing PDF data along with embeddings in Azure Search. Can I also save the bounding box details of the scanned data, and get those bounding box details back along with the search results when querying?

Rishabh Meshram On

PDF files do not inherently contain bounding box information for text. It's the text extraction process that calculates bounding box information. In the context of Azure Cognitive Search, you can use the OCR capabilities of Azure Cognitive Services to extract text and its bounding box information from PDF files.

To store PDF data along with the bounding box details of the scanned data in Azure Search, and to retrieve them when querying, you can follow the steps below:

  1. Extract text and bounding box information: You can use Azure Cognitive Services (Computer Vision Read API or Form Recognizer Layout API) to extract text and bounding box information from your PDF files.

  2. Create a custom index schema: Define an index schema in Azure Cognitive Search that includes fields for the bounding box information.

{  
  "name": "pdf-index",  
  "fields": [  
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },  
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },  
    { "name": "boundingBox", "type": "Collection(Edm.String)", "searchable": false, "filterable": false, "sortable": false, "facetable": false }  
  ]  
}  
  3. Index the extracted data: Index your PDF data along with the bounding box details in the corresponding fields of the custom index schema.
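
As a rough illustration of step 3, the sketch below shapes extracted lines into one document that matches the `pdf-index` schema above. The input shape (`text`, `page`, `polygon`) is a simplified, hypothetical stand-in for OCR output, not the exact response format of the Read or Layout API, and encoding each box as a JSON string is only one possible way to fit the `Collection(Edm.String)` field. The function name `build_index_document` is made up for this example.

```python
import json
from typing import Any


def build_index_document(doc_id: str, lines: list[dict[str, Any]]) -> dict[str, Any]:
    """Build one document for the 'pdf-index' schema.

    `lines` is an assumed, simplified OCR shape (hypothetical):
    [{"text": str, "page": int, "polygon": [x1, y1, ..., x4, y4]}, ...]
    """
    # Searchable text: all extracted lines joined in reading order.
    content = "\n".join(line["text"] for line in lines)
    # One JSON string per line, so each box keeps its page and text.
    bounding_boxes = [
        json.dumps(
            {"page": line["page"], "text": line["text"], "polygon": line["polygon"]}
        )
        for line in lines
    ]
    return {"id": doc_id, "content": content, "boundingBox": bounding_boxes}


# Sample data invented for the example.
ocr_lines = [
    {"text": "Invoice 1001", "page": 1, "polygon": [1.0, 1.0, 3.5, 1.0, 3.5, 1.4, 1.0, 1.4]},
    {"text": "Total: 42.00", "page": 1, "polygon": [1.0, 2.0, 3.2, 2.0, 3.2, 2.4, 1.0, 2.4]},
]
document = build_index_document("doc-1", ocr_lines)
```

The resulting dictionary is what you would upload to the index with whichever client you use. Keeping the line text next to each polygon makes it possible to work out later which box belongs to which piece of text.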

By following these steps, you can store bounding box details of the scanned data in Azure Search and retrieve them along with the search result.
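
On the query side, a stored field such as `boundingBox` comes back as a whole with each matching document, so choosing the boxes that correspond to the matched text is likely something you do in your own code. The sketch below assumes each entry in `boundingBox` is a JSON string with `page`, `text`, and `polygon` keys (an encoding chosen for this example, not something the service requires), and `boxes_matching` is a hypothetical helper. The `search_result` dictionary is hand-written sample data standing in for one returned document.

```python
import json
from typing import Any


def boxes_matching(result: dict[str, Any], term: str) -> list[dict[str, Any]]:
    """Return the decoded boxes whose line text contains `term` (case-insensitive)."""
    # Decode every stored JSON string back into a dictionary.
    boxes = [json.loads(raw) for raw in result.get("boundingBox", [])]
    needle = term.casefold()
    return [box for box in boxes if needle in box["text"].casefold()]


# Hand-written stand-in for one document returned by a search query.
search_result = {
    "id": "doc-1",
    "content": "Invoice 1001\nTotal: 42.00",
    "boundingBox": [
        json.dumps({"page": 1, "text": "Invoice 1001",
                    "polygon": [1.0, 1.0, 3.5, 1.0, 3.5, 1.4, 1.0, 1.4]}),
        json.dumps({"page": 1, "text": "Total: 42.00",
                    "polygon": [1.0, 2.0, 3.2, 2.0, 3.2, 2.4, 1.0, 2.4]}),
    ],
}
hits = boxes_matching(search_result, "total")
```

A plain substring match is the simplest choice here; it will not line up with everything the search engine considers a match (for example stemmed or semantic hits), so treat it as a starting point.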

You can also look at the Azure Cognitive Search OCR skill, which extracts text from images. Its examples cover scenarios with PDFs and show how to visualize bounding boxes.