I am currently learning image recognition, and my goal is to extract all the information from the board of a mobile game that looks like this: board
As you can see, there are 5 different dice, and their level can be between 1 and 7.
I know that AI-based recognition could solve this, but I think it may be overkill. Another idea I have is to average, let's say, 20 frames so that the projectiles are erased (not really sure).
For example, projectiles can look like this: dice with projectiles
I want to get all the information from the board. It would be a 5*3 array with the dice type and level for every position.
I tried using pyautogui to search the screen for each possible appearance of a dice, but I encountered two problems:
- Detection time: with 1-2 images it was quite fast, but once I added 14 (so two dice at every level), it became way too slow.
- Projectiles: the dice fire projectiles that pass over them. These are not much of a problem at low dice levels, where the attack speed is slow and I correctly detect dice types and levels, but at higher levels they fire a lot faster and detection starts to struggle.
I did all my tests on the top-left position of the board with this:
import pyautogui

def computeBoard():
    saw = False
    for dice in dicesList:
        try:
            # Look for this dice image in the top-left cell of the board only.
            if pyautogui.locateOnScreen(dice, region=(940, 580, 65, 65), confidence=0.90) is not None:
                print("I can see " + dice)
                saw = True
                break
        except pyautogui.ImageNotFoundException:
            # No match for this image; try the next one.
            pass
    if not saw:
        print("I can't see any dice")
Lowering the confidence threshold helps detection, but then the program confuses different levels with each other.
(dicesList is simply a list of all the image paths.)
Also, I chose Python because it's used in the tutorial I saw on YouTube, but I can switch without problems if needed.
Are there solutions to my problem?
Thanks in advance :)
Yes, absolutely, you can easily "erase" the moving objects.
But prefer the median to the average. Averaging will leave behind faint ghost traces of projectiles. The median, OTOH, is a robust statistic that will completely discard such outliers, as long as a projectile covers any given pixel in fewer than half of the frames.
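To make the difference concrete, here is a tiny illustration for a single pixel. The brightness values are made up for the example (background 50, projectile 250); they are not measured from the game.

```python
import statistics

# Hypothetical brightness of one pixel over 9 frames: the background is 50,
# and a projectile (250) covers the pixel in 3 of those frames.
samples = [50, 50, 250, 250, 250, 50, 50, 50, 50]

mean_value = statistics.mean(samples)      # pulled toward the projectile: a ghost trace
median_value = statistics.median(samples)  # stays at the background value

print(mean_value, median_value)
```

The mean lands well above the background, which is the faint ghost, while the median ignores the three outlier frames entirely.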
Suppose a projectile reliably enters and then exits a given point within N frames. Then the projectile covers that point in at most N frames, so computing each point's median brightness across slightly more than 2 × N frames leaves the background in the majority and reliably removes any projectile artifacts. (Now, if another projectile follows very closely upon its heels, you will need additional input frames.)
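A minimal sketch of the per-pixel median with NumPy, assuming you already have the frames as equally sized arrays. The function name `median_background` and the synthetic frames are only for illustration; a moving 2x2 block stands in for a projectile.

```python
import numpy as np

def median_background(frames):
    """Per-pixel median over a sequence of equally sized frames."""
    stack = np.stack(frames, axis=0)  # shape: (num_frames, H, W) or (num_frames, H, W, C)
    return np.median(stack, axis=0).astype(stack.dtype)

# Synthetic demo: a static background plus a bright block that moves
# one column per frame, so each pixel is covered in at most 2 of 7 frames.
background = np.arange(6 * 12, dtype=np.uint8).reshape(6, 12)
frames = []
for i in range(7):
    frame = background.copy()
    frame[2:4, i:i + 2] = 255  # the fake "projectile"
    frames.append(frame)

cleaned = median_background(frames)             # projectile removed
mean_img = np.mean(np.stack(frames), axis=0)    # projectile leaves a ghost

print(np.array_equal(cleaned, background))
```

In the question's setup, the frames could presumably come from repeated calls to `pyautogui.screenshot(region=...)` converted with `np.asarray`, and the cleaned image could then be matched against the dice templates. That wiring is an assumption and is not shown here.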