Register an image onto a smaller, partial image that shares some common features

I want to transform images of written text (taken with a mobile phone camera, so with varying 3D rotation, scale, etc.) so that they look "orthogonal" (by this I mean as if you were reading them on an e-book reader). My next step will be to OCR these images, and I want the OCR not to be affected by camera rotation.

All input images share an identical header and footer: the same logo, title, etc. appear at the top and bottom of each image. Between the header and footer there is text content whose height (i.e. the number of text rows) varies from page to page.

Each page looks like this:

HEADER HEADER LOGO
------------------
content   |
content   | height of content varies
...       |
content   |
------------------
FOOTER FOOTER LOGO

I have created two "templates" by taking one such input image, applying a manual transform to "orthogonalize" it, and then isolating the header and footer by removing all text content in between.

So I now have two template images, TH and TF, with no skew, correct rotation, etc. These templates contain only the header or footer, so they are much smaller than the input images, which contain the header, the footer, and the text content in between.

Then, for each input image that needs to be registered, I compute feature descriptors (AKAZE, SIFT, etc.) with OpenCV and match them against features from the template TH (and optionally against TF separately, for robustness). A homography matrix H is estimated via findHomography() and applied to the input image via warpPerspective().
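For robustness, the correspondences from TH and TF can also be pooled into a single set before estimating H, so that both ends of the page constrain the fit. A minimal sketch of that idea, where the file names and the footer_y offset (the footer's known y position on the orthogonalized page) are assumptions:

import numpy as np
import cv2 as cv

def match_template(tmpl, img, detector, ratio=0.75):
    # return matched (template, image) point pairs after a ratio test
    kp_t, des_t = detector.detectAndCompute(tmpl, None)
    kp_i, des_i = detector.detectAndCompute(img, None)
    matches = cv.BFMatcher().knnMatch(des_t, des_i, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    pts_t = np.float32([kp_t[m.queryIdx].pt for m in good])
    pts_i = np.float32([kp_i[m.trainIdx].pt for m in good])
    return pts_t, pts_i

detector = cv.SIFT_create()
img = cv.imread('input.jpg', cv.IMREAD_GRAYSCALE)   # hypothetical paths
th = cv.imread('th.jpg', cv.IMREAD_GRAYSCALE)
tf = cv.imread('tf.jpg', cv.IMREAD_GRAYSCALE)

pts_th, pts_in_h = match_template(th, img, detector)
pts_tf, pts_in_f = match_template(tf, img, detector)

# TF's coordinates must be expressed in the same page frame as TH's,
# i.e. shifted down by the footer's position on the orthogonal page
# (footer_y is an assumed, page-specific constant)
footer_y = 1400.0
pts_tf_page = pts_tf + np.float32([0.0, footer_y])

src = np.vstack([pts_in_h, pts_in_f])    # input-image coordinates
dst = np.vstack([pts_th, pts_tf_page])   # page-frame coordinates
H, mask = cv.findHomography(src, dst, cv.RANSAC)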

The features are computed correctly, and there are matches between the input image and TH (or TF).

The problem appears when I apply the homography to the input image to unskew it: the result is shrunk, because the template contains only the header and footer (no text content) and is therefore tiny compared to the input page.

Ideally, I would like the homography matrix not to contain any scaling or translation, since the template is so tiny compared to the page. All I want it to contain is the rotation/skew information. It would be enough to rotate/"unskew" the input image to get better results in the next stage, which is OCR.

I am using OpenCV (python or C++ not a problem).

Can I remove some entries from the homography matrix in order to keep only the rotation?
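One way to do that, assuming the perspective terms H[2,0] and H[2,1] are small: normalize H, take its top-left 2x2 block, and replace that block with its nearest pure rotation via an SVD-based polar decomposition. A sketch (the function name is mine):

import numpy as np

def rotation_only(H):
    # approximate H by a pure rotation about the origin (top-left corner);
    # only meaningful when the perspective terms H[2,0], H[2,1] are small
    Hn = H / H[2, 2]             # normalize so the bottom-right entry is 1
    A = Hn[:2, :2]               # 2x2 block: rotation * scale * shear
    U, _, Vt = np.linalg.svd(A)  # nearest rotation (polar decomposition) is U @ Vt
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1           # guard against a reflection
    Hr = np.eye(3)
    Hr[:2, :2] = U @ Vt
    return Hr

# usage: warp with the rotation-only matrix instead of the full H, e.g.
#   out = cv.warpPerspective(img, rotation_only(H), (img.shape[1], img.shape[0]))

Note that this rotates about the image origin, so part of the page can still leave the canvas; keeping the translation (or re-centering afterwards) is usually needed in practice.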

Or is the proposed pipeline flawed?

I guess the most generic question is: how can I register a large image using a much smaller image as a reference (one which contains, say, just a logo)?

EDIT: I have removed the sample code; please use the code in the answer I posted below: https://stackoverflow.com/a/77834722/385390.

1 Answer

Answer by bliako:

[Image 1: the input image, skewed]

[Image 2: the reference image; it has the correct orientation and will be matched against some part of the input image]

The problem I am trying to solve here is to fix the orientation of the 1st image (the input image) by registering it onto the 2nd image (the reference). The 1st image will always contain the logo shown in the 2nd image. This is a special case of feature-based registration of two images, where the 2nd image is tiny, is in the correct orientation, and covers only one small portion of the 1st image.

I am trying to solve this by passing both images through a feature extractor (SIFT, AKAZE, etc.), matching the features, estimating a homography from the matched features, and finally unwarping the 1st image.

The 3rd image shows the matches using SIFT feature extractor. It works quite well.

But there is a problem with the unwarped image (the 4th image): only the region below and to the right of the matched area (the area matched to the reference, the 2nd image) is visible. Everything to the left of and above that matched area is cut off.

This is the problem I was encountering when I posted my question.

[Image 3: matches between the input and reference images using SIFT]

[Image 4: the result is unwarped quite well and its orientation has been corrected, BUT the part to the left of and above the matched area is missing]

My solution was to find the coordinates of the top-left matched feature in the 1st image's space and subtract that offset from all of the 1st image's matched feature coordinates.
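In numpy that offset-and-shift is two lines (equivalent to the explicit loops in the code below):

offset = points_inp.min(axis=0)   # per-column minimum: top-left of the matched area
points_inp -= offset              # shift all matched points so they start at (0, 0)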

Another solution is to modify the homography matrix H by zeroing the x and y translation components, H[0,2] and H[1,2] (translation along the x and y axes respectively). I am not sure whether this has side-effects.
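A third option, which avoids guessing at those side-effects: keep H intact, map the input image's corners through it with perspectiveTransform(), and prepend a translation so the whole warped page lands inside the output canvas. A sketch, reusing H and img_inp from the code below:

h, w = img_inp.shape[:2]
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
warped = cv.perspectiveTransform(corners, H).reshape(-1, 2)
xmin, ymin = warped.min(axis=0)
xmax, ymax = warped.max(axis=0)

# translation that moves the warped page's top-left corner to (0, 0)
T = np.array([[1, 0, -xmin],
              [0, 1, -ymin],
              [0, 0, 1]], dtype=np.float64)

out_size = (int(np.ceil(xmax - xmin)), int(np.ceil(ymax - ymin)))
full = cv.warpPerspective(img_inp, T @ H, out_size)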

I observe that the orientation of the final image is far from perfect. I am not sure whether this is a side-effect of this solution or just how the unwarping worked out.

[Image 5: the output image is now fully visible, but the quality of the unwarping is not good]

Here is the basic code to reproduce this workflow. Comments show where the solution is applied:

import numpy as np
import cv2 as cv

# the reference (template) image, already in the correct orientation
img_ref = cv.imread('SOsubmit/testref.jpg', cv.IMREAD_GRAYSCALE)
# the input (sensed) image, skewed by the camera
img_inp = cv.imread('SOsubmit/testskewed.jpg', cv.IMREAD_GRAYSCALE)

# Initiate SIFT detector 
sift_detector = cv.SIFT_create()

# Find the keypoints and descriptors with SIFT on both images
kp_ref, des_ref = sift_detector.detectAndCompute(img_ref, None)
kp_inp, des_inp = sift_detector.detectAndCompute(img_inp, None)
if des_ref is None or des_inp is None:
    print("failed with descriptors")
    exit(1)

# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des_ref, des_inp, k=2)

# Filter out poor matches
good_matches = []
for m,n in matches:
    if m.distance < 0.75*n.distance:
        good_matches.append(m)

matches = good_matches
points_ref = np.zeros((len(matches), 2), dtype=np.float32)
points_inp = np.zeros((len(matches), 2), dtype=np.float32)

img_matches = np.empty((max(img_ref.shape[0], img_inp.shape[0]), img_ref.shape[1]+img_inp.shape[1], 3), dtype=np.uint8)
cv.drawMatches(img_ref, kp_ref, img_inp, kp_inp,
    matches, img_matches, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
cv.imwrite("matches.jpg", img_matches);

# if matches = bf.knnMatch(des_ref, des_inp, k=2)
# then queryIdx is the index of kp_ref (corresponding to des_ref)
# and trainIdx to kp_inp
for i, match in enumerate(matches):
    points_ref[i, :] = kp_ref[match.queryIdx].pt
    points_inp[i, :] = kp_inp[match.trainIdx].pt
    print(i, ") ", kp_ref[match.queryIdx].pt, " -> ", kp_inp[match.trainIdx].pt)

# find the top-left corner of the matched points on the input image img_inp
ximg_inp=points_inp[0][0]
yimg_inp=points_inp[0][1]
for p in points_inp:
    if p[0] < ximg_inp: ximg_inp = p[0]
    if p[1] < yimg_inp: yimg_inp = p[1]
print("matched points on img_inp start at coordinates: ", ximg_inp, yimg_inp)
# shift the matched points on img_inp so they start at (0, 0)
for p in points_inp: p[0] -= ximg_inp; p[1] -= yimg_inp

# Find homography
#H, mask = cv.findHomography(points_ref, points_inp, cv.RANSAC)
#print("Homography 1:\n", H)
H, mask2 = cv.findHomography(points_inp, points_ref, cv.RANSAC)
print("Homography 2:\n", H)
# alternative solution: remove the translation component of H
# Does it affect the transform in other ways?
#H[0][2] = 0
#H[1][2] = 0
# Warp the input image to align with the reference
img_inp_unwarped = cv.warpPerspective(
    img_inp,
    H,
    # width is shape[1], height is shape[0];
    # grow the canvas by the offset so the shifted content fits
    (img_inp.shape[1]+int(ximg_inp), img_inp.shape[0]+int(yimg_inp))
)
print("unwarped image size: ", img_inp_unwarped.shape)

cv.imwrite('output.jpg', img_inp_unwarped)
print("unwarped image saved to 'output.jpg'")