Playing audio file in YOLOv8 in real-time detection


I am working on a YOLOv8 project to detect drowsiness and play an alarm audio file when drowsiness is detected. The problem I am facing is that I can't play the audio in real time, because my detections are first stored in results. Only after I close the detection window does the code access the data stored in results, and then it plays the audio continuously. How can I solve this?

import os
from ultralytics import YOLO

import torch
import matplotlib
import numpy as np
import cv2
import pygame  

pygame.init()
sound_to_play = pygame.mixer.Sound(r'D:\ML\Syncronised vigilance for driver\alarm.wav')  
sound_to_play.play()

model = YOLO(r'C:\Users\HP\Downloads\last.pt')

cap = cv2.VideoCapture(0) 
while True:
    ret, frame = cap.read()

    results = model.predict(source="0",show=True)  
    for r in results:
        if len(r.boxes.cls)>0:
            dclass=r.boxes.cls[0].item()
            print(dclass)
            if dclass==2.0:
              sound_to_play.play()
    if cv2.waitKey(1) == ord('q'):
        break

pygame.quit()
cap.release()
cv2.destroyAllWindows()

The problem is that my code first performs all the detection, stores it in results, and only then enters the for loop. The expected behavior is that it detects and checks the class value at the same time.

1 Answer

hanna_liavoshka

As you correctly noted, the current code performs the detection on the whole video source first, since it was set up as results = model.predict(source="0", show=True). What you need is to pass each frame to the predict() function separately so the rest of the logic can run per frame. You can do this in two ways. The first is to use stream=True, which makes predict() return a generator that only keeps the results of the current frame in memory. See the Ultralytics docs for an example. The code will look like this (it may need some correction for your exact case):

model = YOLO(r'last.pt')
results = model.predict(source="0", show=True, stream=True)
for r in results:
    # logic for playing the sound if the condition is fulfilled
    if len(r.boxes.cls) > 0:
        dclass = r.boxes.cls[0].item()
        print(dclass)
        if dclass == 2.0:
            sound_to_play.play()
    # you may need to use this statement to activate the show=True option
    # next(results)
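
In either variant, a detection that persists across consecutive frames will call sound_to_play.play() on every frame, stacking overlapping playbacks (the "continuously playing" symptom from the question). A minimal sketch of a cooldown gate that rate-limits the alarm; the AlarmGate name and the 3-second cooldown are illustrative, not part of the original answer:

```python
import time

class AlarmGate:
    """Rate-limits the alarm so detections on consecutive frames
    do not stack overlapping sound playbacks."""

    def __init__(self, cooldown_s=3.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable clock, handy for testing
        self._last_fired = None

    def should_fire(self):
        """Return True at most once per cooldown window."""
        now = self.clock()
        if self._last_fired is None or now - self._last_fired >= self.cooldown_s:
            self._last_fired = now
            return True
        return False
```

Inside the detection loop you would then write `if dclass == 2.0 and gate.should_fire(): sound_to_play.play()` instead of calling play() unconditionally.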

Another way is to iterate through the frames manually (as you already do) and pass them to the predict() function one by one with source=frame:

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:  # camera read failed, stop the loop
        break
    results = model.predict(source=frame, show=True)
    for r in results:
        if len(r.boxes.cls) > 0:
            dclass = r.boxes.cls[0].item()
            print(dclass)
            if dclass == 2.0:
                sound_to_play.play()
    if cv2.waitKey(1) == ord('q'):
        break

You may need additional code to make the show=True option display correctly in each of these cases.
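
If show=True misbehaves when you drive the loop yourself, one option is to render the window manually: each Ultralytics result object has a plot() method that returns the annotated frame as a BGR NumPy array, which you can pass straight to cv2.imshow(). A sketch under that assumption; the annotate_frame helper name is illustrative:

```python
def annotate_frame(result):
    # Ultralytics Results.plot() draws the boxes/labels on a copy of the
    # source frame and returns it as a BGR numpy array.
    return result.plot()

# Usage in the manual loop (requires `import cv2`), instead of show=True:
#   results = model.predict(source=frame)
#   for r in results:
#       cv2.imshow("YOLOv8", annotate_frame(r))
#   if cv2.waitKey(1) == ord('q'):
#       break
```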