How to Train a YOLO Model with Locally Downloaded Open Images Dataset?


I have recently downloaded the Open Images dataset to train a YOLO (You Only Look Once) model for a computer vision project. However, I am facing some challenges and I am seeking guidance on how to proceed. Here are the details of my setup:

Dataset Download: I have downloaded the Open Images dataset, including test, train, and validation data. The dataset is organized into three folders: test, train, and validation.

Absence of .yaml File: YOLO requires a .yaml file for configuring the dataset. However, I couldn't find any .yaml files in the downloaded dataset. Do I need to create one and if so, how can I create a .yaml file for YOLO?

Dataset Structure: Inside the train folder there are three subfolders: data, labels, and metadata. The labels folder contains three large files: classification, detection, and segmentation. Each file is several gigabytes in size, even though I downloaded only 150 images. Is this normal, or did I download the dataset incorrectly?

I would greatly appreciate any assistance or guidance on how to properly configure the dataset for YOLO training. I am working on my college project, and at the moment I am stuck at this point with no idea what to do next.

I have downloaded the Open Images dataset, including test, train, and validation data. The dataset is organized into three folders: test, train, and validation. To train a custom YOLO model I need to give it a .yaml file, but the downloaded dataset has no .yaml file.

Answer by hanna_liavoshka:

> The labels folder contains three large files: classification, detection, and segmentation. Each file has a size in gigabytes. Is this normal or did I download the dataset incorrectly? It is for only 150 images

The complete Open Images V7 dataset comprises 1,743,042 training images and 41,620 validation images and requires approximately 561 GB of storage space when downloaded, as stated in the Ultralytics YOLOv8 Docs. You have probably downloaded only a subset of the images, but the label files can still contain annotations for the full dataset, which would explain their size.
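
If the label files do cover the full dataset, one way to check is to keep only the rows that belong to images you actually have on disk. The following is a minimal sketch, assuming the detection file is a CSV with an `ImageID` column (as in the official Open Images box annotation files) and that each image file is named after its image ID. The function name and the paths in the usage comment are hypothetical.

```python
import csv
from pathlib import Path


def filter_rows_to_local_images(csv_path, image_dir, id_column="ImageID"):
    """Yield only the CSV rows whose image ID matches a file in image_dir."""
    # Assumption: the image ID is the file name without its extension.
    local_ids = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row[id_column] in local_ids:
                yield row


# Hypothetical usage; adjust the paths to your own download:
# rows = list(filter_rows_to_local_images("train/labels/detections.csv", "train/data"))
# print(len(rows), "annotation rows refer to local images")
```

Reading the file row by row keeps memory use low even when the CSV is several gigabytes in size.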

> I couldn't find any .yaml files in the downloaded dataset. Do I need to create one and if so, how can I create a .yaml file for YOLO?

The dataset you have is not in YOLO format yet, so yes, you need to create a dataset.yaml file manually. Fortunately, this is not difficult: a dataset.yaml file only states where the dataset is located and which classes it contains. Here is an example for the coco8 dataset:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8  # dataset root dir
train: images/train  # train images (relative to 'path') 4 images
val: images/val  # val images (relative to 'path') 4 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  # ...
  77: teddy bear
  78: hair drier
  79: toothbrush
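
For a custom dataset, the same file can be written by hand or generated. Below is a minimal sketch that builds the text of a dataset.yaml with plain string formatting. The function name, the default folder names, and the class names in the usage comment are assumptions for illustration, not part of any library.

```python
from pathlib import Path


def build_dataset_yaml(root, class_names,
                       train="train/images", val="val/images", test="test/images"):
    """Return the text of a dataset.yaml for the given root folder and classes."""
    lines = [
        f"path: {Path(root).as_posix()}  # dataset root dir",
        f"train: {train}  # train images (relative to 'path')",
        f"val: {val}  # val images (relative to 'path')",
        f"test: {test}  # test images (optional)",
        "",
        "names:",
    ]
    # Class indices start at 0 and must match the indices used in the label files.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Hypothetical usage with two made-up classes:
# Path("dataset.yaml").write_text(build_dataset_yaml("dataset", ["Cat", "Dog"]))
```

Class names that contain characters with a special meaning in YAML (such as a colon) would need quoting, which this sketch does not handle.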

> How to properly configure the dataset for YOLO training?

To train a YOLO model, you need a dataset in YOLO format: either download one that is already formatted or convert your own. For the first option, find a suitable source; for instance, https://universe.roboflow.com/ hosts open-source datasets, most of which can be downloaded in the formats used by different YOLO versions. For the second option, convert the dataset yourself. Here are the steps:

  1. Choose the task you need: image classification, object detection, segmentation, or other: https://docs.ultralytics.com/tasks/.
  2. Study the label format for that task. For instance, the object detection label format is described here: https://docs.ultralytics.com/datasets/detect/#ultralytics-yolo-format
  3. Study the format of the dataset you have: how to iterate through its annotations and where to find the information relevant to your custom YOLO dataset.
  4. You can try to find a ready-made solution for how to translate one dataset format to another. For instance, https://github.com/ibaiGorordo/OpenImages-Yolo-converter can be useful.
  5. If there are no ready-made solutions, iterate through the dataset annotations and extract the relevant information (for object detection, the bounding box coordinates and class names of the objects in every image you have). If needed, transform this information into the YOLO form, for instance from Xmin, Xmax, Ymin, Ymax bounding box coordinates to x-centre, y-centre, w, h (ready-made solutions for this kind of conversion are easy to find online).
  6. Write the extracted and transformed information to YOLO annotation .txt files, one per image.
  7. Structure your custom YOLO dataset for training. For the object detection task, a layout that works is:
dataset/
|-- train/
| |-- images/
| |-- labels/
|-- val/
| |-- images/
| |-- labels/
|-- test/
| |-- images/
| |-- labels/
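
Steps 5 and 6 can be sketched as follows for object detection. This is a minimal sketch, not a definitive converter: it assumes the annotation rows are dicts using the Open Images column names `ImageID`, `LabelName`, `XMin`, `XMax`, `YMin`, `YMax`, with coordinates already normalized to the 0..1 range, as in the official box annotation CSVs. The function names and the `class_ids` mapping (Open Images label identifier to YOLO class index) are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def to_yolo_box(xmin, xmax, ymin, ymax):
    """Convert normalized corner coordinates to (x_center, y_center, width, height)."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


def write_yolo_labels(rows, class_ids, labels_dir):
    """Write one <image_id>.txt file per image and return the number of files written."""
    per_image = defaultdict(list)
    for row in rows:
        if row["LabelName"] not in class_ids:
            continue  # skip classes you do not want to train on
        box = to_yolo_box(float(row["XMin"]), float(row["XMax"]),
                          float(row["YMin"]), float(row["YMax"]))
        per_image[row["ImageID"]].append((class_ids[row["LabelName"]], *box))

    out_dir = Path(labels_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image_id, boxes in per_image.items():
        # Each line of a label file: class_index x_center y_center width height
        text = "\n".join(f"{c} {x:.6f} {y:.6f} {w:.6f} {h:.6f}" for c, x, y, w, h in boxes)
        (out_dir / f"{image_id}.txt").write_text(text + "\n")
    return len(per_image)
```

For example, a box with XMin=0.2, XMax=0.6, YMin=0.1, YMax=0.5 becomes approximately x-centre 0.4, y-centre 0.3, w 0.4, h 0.4. The class indices in `class_ids` have to match the `names` entries in dataset.yaml.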