I have an archive of paper documents at a company, covering different business operations from different sections. I want to scan all of these documents and then classify the scanned documents into categories and sub-categories based on custom criteria (such as name, age, section, etc.).
I want the end result to be digital files categorized according to the preferences that I set.
How can I do this using Python NLP or any other machine learning approach?
I think that this can be a basic pipeline:
OpenCV + text extraction using an OCR library (pytesseract, EasyOCR), then spaCy and pandas.
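
A minimal sketch of how such a pipeline could look, assuming the scans are image files and the documents contain labelled lines such as "Name: ...", "Age: ..." and "Section: ...". The field patterns, the function names (`ocr_image`, `extract_fields`, `target_folder`) and the `archive` folder name are hypothetical placeholders, not part of any library. Plain regular expressions stand in here for the spaCy step; for free-form text, spaCy's named entity recognition would probably be a better fit.

```python
import re
from pathlib import Path


def ocr_image(image_path):
    """Return the text of one scanned page (needs opencv-python and pytesseract)."""
    # Imported lazily so the rest of the sketch runs without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding turns the page into black and white before OCR.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


# Hypothetical patterns: adapt them to how the real documents are laid out.
FIELD_PATTERNS = {
    "name": re.compile(r"Name\s*:\s*(.+)", re.IGNORECASE),
    "age": re.compile(r"Age\s*:\s*(\d+)", re.IGNORECASE),
    "section": re.compile(r"Section\s*:\s*(\w+)", re.IGNORECASE),
}


def extract_fields(text):
    """Pull the classification fields out of OCR text; missing fields become None."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[key] = match.group(1).strip() if match else None
    return fields


def target_folder(fields, root="archive"):
    """Build a category/sub-category path: <root>/<section>/<name>."""
    section = fields.get("section") or "unknown_section"
    name = fields.get("name") or "unknown_name"
    return Path(root) / section / name
```

For each scan, `extract_fields(ocr_image(path))` would give one dictionary of fields. Collecting those dictionaries into a pandas DataFrame makes it possible to review and correct them before moving each file into the folder returned by `target_folder`.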