How to separate the rectangular stamp from the image/pdf in python?


I have a specific task where I need to extract stamps from both PDF and image files. The challenge is that the stamps are not always in the same region: their position varies significantly from document to document, which makes the extraction hard to automate.

I have attempted to preprocess the files by applying techniques such as thresholding and contour detection. The idea behind this is to identify the stamps' boundaries by detecting contours in the images, and then create bounding boxes around these regions.

How do I proceed from here? Thank you in advance. These are example images; there are several more, and the stamp location varies between them: (https://i.stack.imgur.com/V2O6e.jpg) (https://i.stack.imgur.com/VU6DQ.jpg)

Answered by K J

The Law Manual is quite specific that these stamps must be controlled, since penalties may be imposed when they are used uncontrolled.

Thresholding methods are problematic with such a black-and-white source.

So your best approach is to tell the computer-vision library exactly which object is to be matched by giving it a clean copy of the stamp. That clean copy is then used as the template in "Template Matching".


OpenCV has many different matching methods; see for example its features2d tutorials: https://docs.opencv.org/2.4/doc/tutorials/features2d/table_of_content_features2d/table_of_content_features2d.html
