For a project, I converted a YOLOv8 segmentation .pt model to .mlpackage so that I can run it with Core ML. Everything runs fine and the items of interest are detected in the video, but inference speed is the problem: prediction takes about 280 ms per frame, which is extremely slow. When I run the same unconverted .pt model on Colab or on my laptop, it takes only a few ms.
I tried setting model.compute_units to ALL, CPU_AND_GPU, and the other options, but the GPU is still not used (the terminal output below reports that the GPU is not active). Here is the code:
import coremltools as ct
import cv2
import numpy as np
from PIL import Image
import time
import os

def is_gpu_active():
    # Run the ioreg command and parse the output
    result = os.popen('ioreg -l | grep "performanceState"').read()
    return "performanceState\" = 2" in result

def letterbox_image(image, size):
    """Resize image with unchanged aspect ratio using padding."""
    ih, iw = image.shape[:2]
    w, h = size
    # Compute scale
    scale = min(w/iw, h/ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    # Resize the image using the computed scale
    image_resized = cv2.resize(image, (nw, nh))
    # Compute padding values
    top = (h - nh) // 2
    bottom = h - nh - top
    left = (w - nw) // 2
    right = w - nw - left
    # Add padding to make the image square
    image_padded = cv2.copyMakeBorder(image_resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=[0, 0, 0])
    return image_padded

# Load the Core ML model
model = ct.models.MLModel('vhssegmentation.mlpackage')
# Set the preferred device
model.compute_units = ct.ComputeUnit.ALL

# Open the video file
cap = cv2.VideoCapture('VID_20230927_202037.mp4')
while cap.isOpened():
    print("----")
    ret, frame = cap.read()
    if not ret:
        break
    # Time the letterboxing operation
    start_time = time.time()
    frame = letterbox_image(frame, (640, 640))
    print(f"Letterboxing Time: {time.time() - start_time:.4f} seconds")
    # Time the conversion to PIL Image
    start_time = time.time()
    pil_image = Image.fromarray(frame)
    print(f"Conversion to PIL Image Time: {time.time() - start_time:.4f} seconds")
    # Time the prediction step
    start_time = time.time()
    output = model.predict({'image': pil_image})
    print(f"Prediction Time: {time.time() - start_time:.4f} seconds")
    # Time the post-processing step
    start_time = time.time()
    predictions = output['var_1279']
    mask = np.any(predictions[0, 4:7, :] > 0.5, axis=0)
    filtered_predictions = predictions[0, :, mask]
    for row in filtered_predictions:
        x, y, w, h = row[:4]
        x1 = int(x - w / 2)
        y1 = int(y - h / 2)
        x2 = int(x + w / 2)
        y2 = int(y + h / 2)
        classes = ['class0', 'class1', 'class2']
        detected_class = classes[np.argmax(row[4:7])]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, detected_class, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
    print(f"Post-Processing Time: {time.time() - start_time:.4f} seconds")
    # Display the processed frame
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    if is_gpu_active():
        print("GPU is active")
    else:
        print("GPU is not active")

cap.release()
cv2.destroyAllWindows()
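One thing I am not sure about is whether assigning model.compute_units after loading has any effect. As far as I understand the coremltools API, compute units are requested through the compute_units argument of ct.models.MLModel when the model is loaded. Below is a minimal sketch of what I could try instead; the helper name load_with_compute_units and the string-based unit selection are my own (hypothetical), and I am assuming a coremltools version whose ComputeUnit enum has these members:

```python
def load_with_compute_units(path, unit_name="CPU_AND_GPU"):
    """Load a Core ML model, requesting the compute units at load time.

    Hypothetical helper: unit_name is the name of a ct.ComputeUnit member.
    """
    valid = ("ALL", "CPU_AND_GPU", "CPU_ONLY", "CPU_AND_NE")
    if unit_name not in valid:
        raise ValueError(f"unknown compute unit {unit_name!r}, expected one of {valid}")

    # Imported lazily so the helper can be defined on machines without coremltools.
    import coremltools as ct

    units = getattr(ct.ComputeUnit, unit_name)
    # compute_units is a constructor argument of MLModel.
    return ct.models.MLModel(path, compute_units=units)


# Example usage (same model file as in the script above):
# model = load_with_compute_units('vhssegmentation.mlpackage', 'CPU_AND_GPU')
# output = model.predict({'image': pil_image})
```

Even with this, Core ML treats the setting as a preference, so it may still run on the CPU if it decides the GPU is not suitable.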
Here is a sample view from the remote Mac I'm using; it does the YOLO-style detection:

And here are sample timings from the terminal output. As you can see, prediction/inference is too slow:
Letterboxing Time: 0.0009 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2791 seconds
Post-Processing Time: 0.0013 seconds
GPU is not active
Letterboxing Time: 0.0010 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2839 seconds
Post-Processing Time: 0.0015 seconds
GPU is not active
Letterboxing Time: 0.0009 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2821 seconds
Post-Processing Time: 0.0010 seconds
GPU is not active
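The numbers above are single measurements taken with time.time(). To compare settings more reliably, a small helper could discard a few warm-up calls (the first predictions may include one-time setup cost) and report the mean and median over several runs. This is only a sketch; the name benchmark and its default parameters are hypothetical:

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return (mean, median) wall time in seconds.

    The first `warmup` calls are executed but not measured.
    """
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        samples.append(time.perf_counter() - start)

    return statistics.mean(samples), statistics.median(samples)


# Example usage with the model and image from the script above:
# mean_s, median_s = benchmark(lambda: model.predict({'image': pil_image}))
# print(f"mean {mean_s * 1000:.1f} ms, median {median_s * 1000:.1f} ms")
```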
Edit: Adding the model specs and some hardware information, as asked.
model.get_spec().description.metadata info:
shortDescription: "Ultralytics YOLOv8m-seg model trained on /content/dataset.yaml"
versionString: "8.0.153"
author: "Ultralytics"
license: "AGPL-3.0 https://ultralytics.com/license"
userDefined {
  key: "batch"
  value: "1"
}
userDefined {
  key: "com.github.apple.coremltools.source"
  value: "torch==2.0.1+cu118"
}
userDefined {
  key: "com.github.apple.coremltools.version"
  value: "7.0b1"
}
userDefined {
  key: "date"
  value: "2023-08-13T15:19:08.788039"
}
userDefined {
  key: "imgsz"
  value: "[640, 640]"
}
userDefined {
  key: "names"
  value: "{0: \'topvhs\', 1: \'frontvhs\', 2: \'sidevhs\'}"
}
userDefined {
  key: "stride"
  value: "32"
}
userDefined {
  key: "task"
  value: "segment"
}
Some platform and OS info:
Darwin
('10.16', ('', '', ''), 'x86_64')
posix.uname_result(sysname='Darwin', nodename='perceptundrymbp.home', release='22.6.0', version='Darwin Kernel Version 22.6.0: Fri Sep 15 13:39:52 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_X86_64', machine='x86_64')
Information about the GPUs on the Mac, from subprocess.check_output("system_profiler SPDisplaysDataType", shell=True):
Graphics/Displays:
    Intel HD Graphics 630:
      Chipset Model: Intel HD Graphics 630
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel
      Device ID: 0x591b
      Revision ID: 0x0004
      Automatic Graphics Switching: Supported
      gMux Version: 4.0.29 [3.2.8]
      Metal Support: Metal 3
      Displays:
        Color LCD:
          Display Type: Built-In Retina LCD
          Resolution: 2880 x 1800 Retina
          Framebuffer Depth: 24-Bit Color (ARGB8888)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal
    Radeon Pro 555:
      Chipset Model: Radeon Pro 555
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x8
      VRAM (Total): 2 GB
      Vendor: AMD (0x1002)
      Device ID: 0x67ef
      Revision ID: 0x00c7
      ROM Revision: 113-C980AJ-927
      VBIOS Version: 113-C9801AP-A02
      EFI Driver Version: 01.A0.927
      Automatic Graphics Switching: Supported
      gMux Version: 4.0.29 [3.2.8]
      Metal Support: Metal 3
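If only the GPU names are needed rather than the full dump, the "Chipset Model" lines of this output can be picked out with a few lines of Python. A rough sketch, assuming the output keeps the "key: value" layout shown above (the function name gpu_names is hypothetical):

```python
def gpu_names(profiler_text):
    """Return the 'Chipset Model' values found in system_profiler text output."""
    names = []
    for line in profiler_text.splitlines():
        # Split each "key: value" line at the first colon.
        key, sep, value = line.strip().partition(":")
        if sep and key == "Chipset Model":
            names.append(value.strip())
    return names


# Example usage on macOS:
# import subprocess
# text = subprocess.check_output(["system_profiler", "SPDisplaysDataType"], text=True)
# print(gpu_names(text))
```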
Some more info about the remote Mac I'm working with:
Activity Monitor while the Python code is running:
Adding more detailed GPU usage, as tadman suggested, with the GPU usage plots:



