I am trying to create a parking assistance program that uses machine vision to map out which spots in a parking lot are open. To do this, I am using the YOLOv8 default image detection model, which is very broad as it is able to detect many more things than just cars. To narrow this down, I have been looking for other pre-trained models compatible with YOLO, but I have not had much success beyond being pointed toward the Stanford Cars Dataset. My problem is converting 16,000+ images into something I can feed into this model. The dataset also includes annotation data in the form of a MATLAB (.mat) file, but I have no idea how to get from there to a finished model. Are there any resources that can help streamline this process? The model I'm using now is a .pt file. How do I get there from 16,000 images and a .mat file?

This is my first time attempting something like this, especially for a senior design project. Thanks in advance!

I've tried using the default model provided by Ultralytics, but it is too broad and it can't recognize cars from more difficult angles, such as directly overhead. I've looked around the internet for tutorials on how to use this dataset, but a lot of them skim over tools like Google Colab and Roboflow, so I have no idea what I'm supposed to do there.

Stéphane On

Part of the problem is you're mistaking a demonstration for a usable product. When you say:

"I am using the YOLOv8 default image detection model, which is very broad as it is able to detect many more things than just cars."

What I'm guessing you mean is that you've downloaded the MSCOCO pre-trained weights. A network trained on the MSCOCO dataset detects 80 classes. It is not meant to be a final product, but a demo of what can be done. If what you need happens to be covered by it, then great. But you definitely should not rely on the MSCOCO dataset.

Instead, with all the YOLO frameworks, you are meant to train your own network. This is especially useful if your camera takes images that differ from what is included in MSCOCO: for example, if your parking area uses fisheye cameras, or your camera is mounted high on a ceiling, a wall, or a pole. MSCOCO doesn't have those types of images.

If I look at MSCOCO, out of 163957 images, when I filter for car (index #2), motorcycle (index #3), and truck (index #7), I end up with 17385 images. These images look like this:

[Three example images: an airport tarmac, a train station, and a street at night]

As you can see, what is labelled as "car" or "truck" is probably of limited use to you in a parking lot or parking garage.
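The kind of filtering described above can be sketched in a few lines. This is only an illustration, and it makes an assumption that is not stated in the answer: that the annotations have already been converted to YOLO-style text labels (one `.txt` file per image, one object per line, class index first). The function names and the directory layout are hypothetical.

```python
from pathlib import Path

# Zero-based MSCOCO class indices quoted in the answer: car, motorcycle, truck.
VEHICLE_CLASSES = frozenset({2, 3, 7})


def classes_in_label_text(text):
    """Return the set of class indices found in one YOLO-format label file.

    Assumes each non-blank line starts with an integer class index.
    """
    found = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            found.add(int(fields[0]))
    return found


def images_with_classes(label_dir, wanted=VEHICLE_CLASSES):
    """List the label files in label_dir that mention any wanted class.

    Assumes one .txt label file per image (hypothetical layout).
    """
    wanted = set(wanted)
    matches = []
    for label_path in sorted(Path(label_dir).glob("*.txt")):
        if classes_in_label_text(label_path.read_text()) & wanted:
            matches.append(label_path.name)
    return matches
```

Calling `images_with_classes("labels/")` on such a directory returns the names of the label files that contain at least one car, motorcycle, or truck, which is a quick way to see how few of the images are relevant to your use case.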

Training your own network is actually relatively simple, and will get you much better results. Here is an example how-to video showing how you can both annotate a multi-class network and train in less than 30 minutes: https://www.youtube.com/watch?v=ciEcM6kvr3w
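Whichever tool does the annotating, YOLO training ultimately consumes one plain-text label file per image, with one line per object: `class x_center y_center width height`, all four coordinates normalised to the 0..1 range. If you do want to reuse existing annotations such as the Stanford Cars `.mat` file, the core of the conversion is the small calculation sketched below. It is a sketch under assumptions: it presumes the annotations give pixel corner coordinates (for Stanford Cars, fields along the lines of `bbox_x1`, `bbox_y1`, `bbox_x2`, `bbox_y2`, which you should verify against your copy), that you load them separately (for example with `scipy.io.loadmat`), and that you read each image's width and height yourself. The function name is hypothetical.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box into one YOLO label line.

    The output format is "class cx cy w h", with the box centre and size
    expressed as fractions of the image width and height.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")

    # Put the corners in order, then clamp them to the image bounds.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    left, right = max(0, left), min(img_w, right)
    top, bottom = max(0, top), min(img_h, bottom)

    cx = (left + right) / 2 / img_w
    cy = (top + bottom) / 2 / img_h
    w = (right - left) / img_w
    h = (bottom - top) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Example: a 200x100 pixel box centred in a 400x200 image, class 0.
print(to_yolo_line(0, 100, 50, 300, 150, 400, 200))
# prints "0 0.500000 0.500000 0.500000 0.500000"
```

For a parking application you would likely map every annotation to a single class (for example `0` for "car") rather than keep the dataset's make-and-model classes. Keep in mind that this only converts labels; it does not change the fact that images taken from other viewpoints may not match your camera.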

Getting from there to Google Colab is relatively simple. Colab just means someone else's computer that happens to run a Linux distro; the build and training steps are more or less the same. The Darknet/YOLO Discord server has a channel with several notebooks showing exactly how to install and use the Darknet framework to run YOLO models on Colab. You can get to the Darknet/YOLO Discord server here: https://discord.gg/zSq8rtW

While I don't yet have a YouTube tutorial showing how to use Darknet/YOLO on Colab, you've now given me an idea for my next video. I will try to get that done tonight. Take a look at the video and channel I linked above.