How to classify and digitize a large volume of paper documents using Python

I have an archive of paper documents in a company, covering different business operations from different sections. I want to scan all of these documents and then classify the scans into categories and sub-categories based on custom criteria (name, age, section, etc.).

I want the end result to be digital files categorized according to the preferences that I set.

How can I do this using Python NLP or any other machine learning approach?

There is 1 solution below

Alessandro Togni On

I think a basic pipeline could look like this:

  • Scanning: preprocess the page images with OpenCV, then extract the text with an OCR library (pytesseract, EasyOCR).
  • Topic extraction: pull out the information you want to classify the documents on, using e.g. spaCy.
  • Categorization: sort the documents with plain Python, maybe with pandas.
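
A minimal sketch of the three steps above, under some loud assumptions: the scans are image files, each form contains labelled lines such as "Name: ...", "Age: ..." and "Section: ...", and plain regular expressions stand in for spaCy in the extraction step so the example has no model to download. The names `ocr_image`, `FIELD_PATTERNS`, `extract_fields`, `category_path` and `file_document` are hypothetical helpers, not part of any library. The OCR function needs `opencv-python`, `pytesseract` and a Tesseract install; the other functions use only the standard library.

```python
import re
import shutil
from pathlib import Path


def ocr_image(image_path):
    """Return the text of one scanned page (needs opencv-python and pytesseract)."""
    # Imported lazily so the rest of the sketch runs without the OCR stack.
    import cv2
    import pytesseract

    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(f"cannot read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding turns the page into clean black and white for OCR.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


# Assumed field labels; adapt the patterns to the wording of the real forms.
FIELD_PATTERNS = {
    "name": re.compile(r"^\s*name\s*[:\-]\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
    "age": re.compile(r"^\s*age\s*[:\-]\s*(\d{1,3})\b", re.IGNORECASE | re.MULTILINE),
    "section": re.compile(r"^\s*section\s*[:\-]\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
}


def extract_fields(text):
    """Pull the labelled fields out of OCR text; missing fields become None."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[key] = match.group(1).strip() if match else None
    return fields


def category_path(fields, keys=("section", "name")):
    """Build a category/sub-category folder path from the chosen fields."""
    parts = []
    for key in keys:
        value = fields.get(key) or "unknown"
        # Keep folder names filesystem-safe.
        safe = re.sub(r"[^\w\-]+", "_", value).strip("_").lower()
        parts.append(safe or "unknown")
    return Path(*parts)


def file_document(image_path, text, archive_root):
    """Copy a scan into archive_root/<category>/<sub-category>/ and return the new path."""
    target_dir = Path(archive_root) / category_path(extract_fields(text))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(image_path).name
    shutil.copy2(image_path, target)
    return target
```

For each scan you would call `file_document(path, ocr_image(path), "archive")`. The order of `keys` in `category_path` decides which field is the category and which is the sub-category. If you also collect the dictionaries returned by `extract_fields` in a list, you can load them into a pandas DataFrame to get a searchable index of the archive. For free-form text without fixed labels, a spaCy pipeline would likely be a better fit than these regular expressions.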