I'm trying to extract the information in a book's table of contents. Given one or more images of a table of contents, I need to recover the chapter names, the page numbers, and so on: basically, the full structure of the book's table of contents.
This is an example of a possible input:
The output should be a dictionary (stupid but structured example):
chapter1: {
    name: "The Basics of Blender",
    page: 1,
    content: [
        { name: "What ..." }
    ]
}
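To make the target structure concrete, here is a minimal Python sketch of how flat entries could be nested into that kind of dictionary. The function name `build_toc` and the `(level, name, page)` tuple format are assumptions made for illustration; they do not come from any existing library, and the page of the sub-entry is left as `None` because it is not given above.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical flat entry: (nesting level, title, page number or None if unknown).
Entry = Tuple[int, str, Optional[int]]


def build_toc(entries: List[Entry]) -> List[Dict[str, Any]]:
    """Nest flat (level, name, page) entries into a tree of dictionaries."""
    root: List[Dict[str, Any]] = []
    # Stack of (level, node) pairs for the ancestors that are still open.
    stack: List[Tuple[int, Dict[str, Any]]] = []
    for level, name, page in entries:
        node: Dict[str, Any] = {"name": name, "page": page, "content": []}
        # Close every ancestor at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1]["content"].append(node)
        else:
            root.append(node)
        stack.append((level, node))
    return root


toc = build_toc([
    (1, "The Basics of Blender", 1),
    (2, "What ...", None),  # title is truncated in the example above
])
```

The hard part of the task is producing the `(level, name, page)` entries from the image in the first place; this sketch only covers the final assembly step.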
I found only a couple of papers, but I did not understand them. Do you know of any pretrained models for this task?
Simple OCR is not a viable solution on its own, since I need to understand the structure of the page, perhaps through document layout analysis. There isn't much documentation on this specific topic.
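To illustrate what I mean by using layout rather than text alone, here is a rough heuristic sketch, not a pretrained model. It assumes an OCR step has already produced, for each text line, its left x-coordinate and its text; the input format, the names `indent_levels` and `parse_lines`, the tolerance value, and the second sample line are all hypothetical. The indentation is used to guess the nesting level, and a regular expression splits the title from a trailing page number.

```python
import re
from typing import List, Optional, Tuple

# Title, then a run of spaces and/or leader dots, then a trailing arabic page number.
# Known limitation: a line such as "Chapter 1" with no page is misread as title + page.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d+)$")


def indent_levels(xs: List[float], tol: float = 10.0) -> List[int]:
    """Map left x-coordinates to 1-based levels; values within `tol` share a level."""
    anchors: List[float] = []
    for x in sorted(xs):
        if not anchors or x - anchors[-1] > tol:
            anchors.append(x)
    # Assign each x to its nearest anchor.
    return [
        1 + min(range(len(anchors)), key=lambda i: abs(x - anchors[i]))
        for x in xs
    ]


def parse_lines(
    lines: List[Tuple[float, str]]
) -> List[Tuple[int, str, Optional[int]]]:
    """Turn (x_left, text) OCR lines into (level, title, page) entries."""
    levels = indent_levels([x for x, _ in lines])
    entries: List[Tuple[int, str, Optional[int]]] = []
    for level, (_, text) in zip(levels, lines):
        text = text.strip()
        match = TOC_LINE.match(text)
        if match:
            entries.append((level, match.group("title"), int(match.group("page"))))
        else:
            # No trailing page number found: keep the whole line as the title.
            entries.append((level, text, None))
    return entries


# Made-up sample input; the coordinates and the second line are placeholders.
rows = parse_lines([
    (50.0, "The Basics of Blender ........ 1"),
    (80.0, "Example section .... 3"),
])
```

A heuristic like this breaks on multi-column layouts, wrapped titles, and roman-numeral pages, which is why I am looking for a model that understands the layout.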
Papers:
