I'm working on a project where I need to preprocess a region of interest (ROI) on my screen to retrieve text more accurately using Pytesseract. I've managed to capture the screen and define ROIs based on template matching with OpenCV, but I'm struggling to preprocess these ROIs effectively for text recognition.
Here's the main function I'm using to preprocess the images and print the results from pytesseract.image_to_string():
import cv2
import pytesseract

### images is a dict of template images; frame is a numpy (BGR) array of the screen captured with pyautogui.screenshot() ###
def draw_boxes(images, frame):
    rectangle_color = (0, 255, 0)
    ### SEARCH FOR EACH TEMPLATE ON SCREEN USING THE IMAGES FROM images.items() ###
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for name, img in images.items():
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        result = cv2.matchTemplate(gray_frame, gray_img, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        ### IF THE TEMPLATE IS FOUND ON SCREEN, DEFINE A REGION OF INTEREST ###
        if max_val > 0.5:
            ### DRAW A RECTANGLE ON THE CV2 OUTPUT FOR DEBUGGING ###
            top_left = max_loc
            bottom_right = (top_left[0] + img.shape[1], top_left[1] + img.shape[0])
            cv2.rectangle(frame, top_left, bottom_right, rectangle_color, 2)
            ### DEFINING THE REGION OF INTEREST ###
            x, y, w, h = max_loc[0], max_loc[1], img.shape[1], img.shape[0]
            roi = frame[y:y+h, x:x+w]
            ### PRE-PROCESSING THE REGION OF INTEREST ###
            roi_gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            # medianBlur takes (src, ksize) with an odd ksize > 1; result is currently unused
            blurred_image = cv2.medianBlur(roi_gray, 3)
            denoised_image = cv2.fastNlMeansDenoising(roi_gray, None, 12, 7, 21)
            roi_processed = denoised_image
            ### APPLYING A BINARY THRESHOLD FOR PYTESSERACT TO READ FROM ###
            # inverted_image = cv2.bitwise_not(roi_processed)
            # with THRESH_OTSU the threshold value is computed automatically, so 0 is passed
            otsu_threshold_value, thresh_image = cv2.threshold(roi_processed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            ### SHOW THE OCR VIEW FOR DEBUGGING AND DEVELOPMENT ###
            cv2.imshow('OCR View', thresh_image)
            cv2.waitKey(1)
            custom_config = r'--oem 3 --psm 11'
            text = pytesseract.image_to_string(thresh_image, lang='eng', config=custom_config)
            try:
                print(text)
            except UnboundLocalError:
                print('unbound')
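For context, this is roughly how the images dict and frame can be built before calling draw_boxes(). The template path below is just a placeholder, and note that pyautogui.screenshot() returns a PIL image in RGB order, so it has to be converted to a BGR numpy array before OpenCV can use it:

import cv2
import numpy as np
import pyautogui

### TEMPLATE PATH IS A PLACEHOLDER; cv2.imread LOADS IT AS BGR ###
images = {
    'buy_offer': cv2.imread('templates/buy_offer.png'),
}

while True:
    screenshot = pyautogui.screenshot()  # PIL Image in RGB order
    frame = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)
    draw_boxes(images, frame)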
Here's an example of the console output I get from Pytesseract:
Buy Offer
Ectoplasm
Gx
:
A substance from thase passing through the Underworld.
2
2
3379
Buy limit per 4 hours: 25,000
Quantity
Your price per item:
25,000
**bais**
85,375,000 coins
You have bought | so far for 3,415 coins.
125,000
I've noticed that sometimes numbers like "3,415" are misread as "bais" by Pytesseract. Below are images showing the ROI before and after preprocessing. The only thing I can think of is to make the processed image clearer, but I can't figure out how.
Here is the ROI without preprocessing:

Here is the ROI with preprocessing:

Okay, after trying out denoising, dilating, sharpening, you name it, I tried resizing the image (the ROI) before passing it to Pytesseract. I also added some blur, and it now seems to be reading the ROI pretty accurately; here is the new console output:
I feel like this made it much more reliable, but I have more testing to do.
To wrap this up: if the text in the image is really small, resize the image (or, in this case, the ROI).
Here is how I did it:
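The exact numbers below are illustrative rather than fixed values: the 3x scale factor, INTER_CUBIC interpolation and (3, 3) Gaussian kernel are starting points to tune for your ROI size. This slots into the preprocessing step of draw_boxes(), right after roi_processed is created:

### UPSCALE THE ROI SO SMALL TEXT BECOMES BIG ENOUGH FOR TESSERACT ###
scale = 3  # illustrative factor; tune for your ROI size
roi_resized = cv2.resize(roi_processed, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
### OPTIONAL BLUR TO SMOOTH UPSCALING ARTIFACTS ###
roi_resized = cv2.GaussianBlur(roi_resized, (3, 3), 0)
### THRESHOLD AND OCR AS BEFORE ###
otsu_threshold_value, thresh_image = cv2.threshold(roi_resized, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(thresh_image, lang='eng', config=r'--oem 3 --psm 11')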
This makes the image as a whole bigger, which also makes the text on it bigger. The blur is optional.