I want to transform images of written text (taken with a mobile phone camera at varying 3D rotation, scale, etc.) so that they look "orthogonal" (by this I mean as if you were reading them on an e-book reader). My next step will be to OCR these images, and I don't want the results to be affected by camera rotation.
All input images share the same header and footer. That is, there is an identical logo at the top of each image. Let's say these are a title, a logo, etc. Between the header and footer there is text content whose height (i.e. number of text rows) varies.
Each page looks like this:
HEADER HEADER LOGO
-------------------
content |
content | height of content varies
... |
content |
------------------
FOOTER FOOTER LOGO
I have created two "templates" by taking one such input page, manually transforming it to "orthogonalize" it, and then isolating the header and footer by removing all the text content in between.
So, I now have 2 template images, TH and TF, with no skew and the correct rotation. Each contains only the header or the footer, so they are much smaller than the input images, which contain the header, the footer and the text content in between.
Then, for each input image that needs to be registered, I calculate feature descriptors (AKAZE, SIFT, etc.) using OpenCV and match them against the features of template TH (optionally also against TF, separately, for robustness). A homography matrix (H) is calculated via findHomography() and applied to the input image via warpPerspective().
The features are calculated correctly, and there are matches between the input image and TH (or TF).
The problem arises when I apply the homography to the input image to unskew it: the image shrinks, because the template contains only the header and footer, not the text content, and is therefore tiny compared to the input page.
Ideally, I would like the homography matrix not to contain any scaling or translation, because the template is so tiny. All I want it to contain is the rotation/skew information. For me, it would be enough to rotate/"unskew" the input image to get better results from the next stage, which is OCR.
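As a rough illustration of what "keeping only the rotation" could mean, the sketch below takes the upper-left 2x2 block of H, finds the closest pure rotation to it with an SVD (polar decomposition), and rebuilds a 3x3 matrix that rotates about the image centre. The name `rotation_only` and its parameters are made up for this example. It assumes the perspective terms H[2,0] and H[2,1] are small, so that the 2x2 block is a fair description of the in-plane rotation. It deliberately discards scale, translation, shear and perspective, so it cannot remove true perspective skew.

```python
import numpy as np

def rotation_only(H, width, height):
    """Hypothetical helper: keep only the in-plane rotation contained in H.

    Returns a 3x3 matrix that rotates about the centre of a
    width x height image. Assumes H is close to affine.
    """
    H = np.asarray(H, dtype=float)
    H = H / H[2, 2]                 # normalise so that H[2, 2] == 1
    A = H[:2, :2]                   # linear part: rotation, scale and shear mixed
    U, _, Vt = np.linalg.svd(A)
    R = U @ Vt                      # closest orthogonal matrix to A
    if np.linalg.det(R) < 0:        # guard against a reflection
        U[:, -1] *= -1
        R = U @ Vt
    centre = np.array([width / 2.0, height / 2.0])
    M = np.eye(3)
    M[:2, :2] = R
    M[:2, 2] = centre - R @ centre  # choose the offset so the centre stays put
    return M
```

The returned matrix could then be passed to warpPerspective() with the input image's own (width, height) as the output size, so the page keeps its resolution; for large angles the corners will be clipped.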
I am using OpenCV (Python or C++, either is fine).
Can I remove some items from the Homography Matrix in order to keep only rotation?
Or is the proposed pipeline flawed?
I guess the most general question is: how can I register a large image using a much smaller image as a reference (one that contains, say, just a logo)?
EDIT: I have removed the sample code; please use the code in the answer I posted below: https://stackoverflow.com/a/77834722/385390 .
The problem I am trying to solve here is to fix the orientation of the 1st image (the input image) by registering it onto the 2nd image (the reference). The 1st image will always contain the logo shown in the 2nd image. This is a special case of registering two images via feature extraction, in which the 2nd image is tiny and contains only one small portion of the 1st image, in the correct orientation.
I am trying to solve this problem by passing both images through a feature extractor (say SIFT, AKAZE, etc.), matching the features, computing a homography from the matched features and finally unwarping the 1st image.
The 3rd image shows the matches using the SIFT feature extractor. It works quite well.
But there is a problem with the unwarped image (the 4th image). Only the part below and to the right of the matched area (the area matched to the reference image, the 2nd image) is visible. Everything to the left of and above that matched area is cut off.
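One way to avoid this clipping, sketched below under stated assumptions, is to project the four corners of the input image through H, take their bounding box, and then apply a shift (and optionally a uniform scale) after H so that the whole warped page lands inside the output canvas. The name `fit_homography_to_canvas` and the `keep_width` flag are invented for this example. The sketch also assumes that all four corners project with a positive homogeneous coordinate, i.e. the perspective is mild.

```python
import numpy as np

def fit_homography_to_canvas(H, width, height, keep_width=True):
    """Hypothetical helper: adjust H so the whole warped image is visible.

    Returns (H_adj, (out_w, out_h)). Assumes the four image corners
    project with a positive third (homogeneous) coordinate.
    """
    H = np.asarray(H, dtype=float)
    corners = np.array([[0, 0, 1],
                        [width, 0, 1],
                        [width, height, 1],
                        [0, height, 1]], dtype=float).T   # 3 x 4
    warped = H @ corners
    warped = warped[:2] / warped[2]         # perspective divide -> 2 x 4
    x_min, y_min = warped.min(axis=1)
    x_max, y_max = warped.max(axis=1)
    # Optional uniform scale: make the warped page as wide as the input,
    # which undoes the shrink caused by the small template.
    s = width / (x_max - x_min) if keep_width else 1.0
    # Scale and shift are applied AFTER H, hence the left-multiplication.
    T = np.array([[s, 0.0, -s * x_min],
                  [0.0, s, -s * y_min],
                  [0.0, 0.0, 1.0]])
    out_size = (int(np.ceil(s * (x_max - x_min))),
                int(np.ceil(s * (y_max - y_min))))
    return T @ H, out_size
```

The adjusted matrix and size would then be used as `cv2.warpPerspective(img, H_adj, out_size)`. Because the shift and scale are composed with H rather than written into its entries, the perspective correction itself is left unchanged.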
This is the problem I was encountering when I posted my question.
My solution was to find the coordinates of the top-left matched feature in the 1st image's space and subtract that amount from the coordinates of all the matched features in the 1st image.
Another solution is to modify the homography matrix (H) and remove the x and y translation components. These are H[0,2] and H[1,2] (translation along the x and y axes respectively). I am not sure whether this has side-effects.
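To make the possible side-effect concrete, the sketch below compares two ways of dropping the translation: overwriting H[0,2] and H[1,2] with zero, versus left-multiplying H by a pure translation matrix that sends the warped origin to (0, 0). The function names are invented for this comparison. When the last row of H is (0, 0, 1) (an affine transform) the two give the same matrix. When H[2,0] or H[2,1] is non-zero they do not, because composing a translation also adds a multiple of the last row to the first two rows; only the composed version leaves the warped shape unchanged.

```python
import numpy as np

def zero_translation(H):
    """The shortcut: overwrite H[0, 2] and H[1, 2] with 0."""
    H = np.array(H, dtype=float)
    H[0, 2] = 0.0
    H[1, 2] = 0.0
    return H

def compose_translation(H):
    """Left-multiply by a translation that moves the warped origin to (0, 0)."""
    H = np.array(H, dtype=float)
    tx = -H[0, 2] / H[2, 2]
    ty = -H[1, 2] / H[2, 2]
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    return T @ H

def project(H, x, y):
    """Apply a homography to a single point, including the perspective divide."""
    u, v, w = np.asarray(H, dtype=float) @ np.array([x, y, 1.0])
    return np.array([u / w, v / w])

# Illustrative matrices, not taken from real matches.
H_affine = np.array([[0.9, -0.1, 40.0],
                     [0.1, 0.9, -25.0],
                     [0.0, 0.0, 1.0]])
H_persp = H_affine.copy()
H_persp[2, 0] = 1e-3                      # add a small perspective term

same_for_affine = np.allclose(zero_translation(H_affine),
                              compose_translation(H_affine))
same_for_persp = np.allclose(zero_translation(H_persp),
                             compose_translation(H_persp))
```

Here `same_for_affine` is true and `same_for_persp` is false, so with a genuinely projective H the zeroing shortcut warps the page slightly differently from the original homography rather than merely shifting it.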
I can observe that the orientation of the final image is far from perfect. I am not sure if this is a side-effect of this solution or simply how the unwarping turned out.
Here is the basic code to reproduce this workflow. Comments show where the solution is: