I am trying to find objects in an image, given a template.
Example of image:

Example of template:
So far I've come up with the following approach:
- Use some detector, e.g. SIFT, to find keypoints
- Match keypoints
- Cluster them
The code looks like this:
import cv2
from matplotlib import pyplot as plt

# img (scene) and query (template) are assumed to be loaded already
sift = cv2.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img, None)
kp2, des2 = sift.detectAndCompute(query, None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
# Apply ratio test
good = []
for m, n in matches:
    if m.distance < 0.5 * n.distance:
        good.append([m])
# cv2.drawMatchesKnn expects a list of lists as matches.
img3 = cv2.drawMatchesKnn(img, kp1, query, kp2, good, None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(img3)
plt.show()
with this outcome:
But I am stuck here. How can I use these matches to actually find the bounding boxes of the objects present in the image? I've tried to create a grid based on the keypoints and the size of the template:
Then I used cv2.matchTemplate to search for the objects in the area around each cell (window shifting), but it didn't work well. How should I approach this?



I hope it is not too late, but it would be a good idea to close this question.
I wrote a piece of code to solve your problem, following your approach.
First, I created a mask to identify the whiter zones.
Then, I thresholded the V channel of the HSV color space and joined the result with the first mask.
Then, I found all the connected components of the combined mask.
Then, I computed SIFT descriptors for both the input image and the query image. For each good match, I took the position of the keypoint in the input image and linked it to the connected component at that position.
The last step is to draw the bounding box of each connected component that has a keypoint assigned to it.
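The linking step could be sketched as below. It is only an illustration: `boxes_from_keypoints` is a hypothetical helper, and it assumes `labels` and `stats` come from `cv2.connectedComponentsWithStats` and that `points` holds the `(x, y)` positions of the matched keypoints in the input image (with the matching order used in the question, that would be `kp1[m.queryIdx].pt` for each good match).

```python
import numpy as np


def boxes_from_keypoints(points, labels, stats):
    """Return {label: (x, y, w, h)} for components that contain a matched keypoint."""
    height, width = labels.shape
    boxes = {}
    for x, y in points:
        # Keypoint coordinates are floats in (x, y) order; labels is indexed [row, col].
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # keypoint falls outside the label image
        label = int(labels[row, col])
        if label == 0:
            continue  # keypoint lies on the background
        # The first four columns of stats are left, top, width, height.
        boxes[label] = tuple(int(v) for v in stats[label, :4])
    return boxes
```

Each returned box can then be drawn with `cv2.rectangle(img, (x, y), (x + w, y + h), color, thickness)`.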
I also tried other methods, such as cv2.matchTemplate, but they did not work. I think the result could be better, since I had to take screenshots of the images from your question and therefore obtained fewer good keypoints. The drink cartons are extremely hard to segment individually, but if you find a better way to segment them, this approach should work well. Hope it works!