coremltools GPU usage with mlpackage on macOS, very slow inference/prediction


For a project, I converted a YOLOv8 segmentation .pt model to .mlpackage so that I can run it. Everything runs fine and the items of interest are detected in the video, but inference speed is the problem: prediction takes about 280 ms per frame, which is extremely slow. When I run the same unconverted .pt model on Colab or on my laptop, it takes only a few ms.

I have set model.compute_units to ALL, CPU_AND_GPU, and the other options, but the GPU is still not used (the terminal output reports that the GPU is not active). Here is the code:

import coremltools as ct
import cv2
import numpy as np
from PIL import Image
import time
import os

def is_gpu_active():
    # Run the ioreg command and parse the output
    result = os.popen('ioreg -l | grep "performanceState"').read()
    return "performanceState\" = 2" in result

def letterbox_image(image, size):
    """Resize image with unchanged aspect ratio using padding."""
    ih, iw = image.shape[:2]
    w, h = size

    # Compute scale
    scale = min(w/iw, h/ih)
    nw = int(iw * scale)
    nh = int(ih * scale)

    # Resize the image using the computed scale
    image_resized = cv2.resize(image, (nw, nh))

    # Compute padding values
    top = (h - nh) // 2
    bottom = h - nh - top
    left = (w - nw) // 2
    right = w - nw - left

    # Add padding to make the image square
    image_padded = cv2.copyMakeBorder(image_resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=[0, 0, 0])
    return image_padded

# Load the Core ML model. The compute unit is chosen when the model is loaded,
# so pass compute_units to the constructor instead of assigning it afterwards.
model = ct.models.MLModel('vhssegmentation.mlpackage', compute_units=ct.ComputeUnit.ALL)

# Open the video file
cap = cv2.VideoCapture('VID_20230927_202037.mp4')

while cap.isOpened():
    print("----")
    ret, frame = cap.read()
    
    if not ret:
        break

    # Time the letterboxing operation
    start_time = time.time()
    frame = letterbox_image(frame, (640, 640))
    print(f"Letterboxing Time: {time.time() - start_time:.4f} seconds")
    
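    # Time the conversion to PIL Image
    # (OpenCV decodes frames as BGR; convert to RGB before building the PIL image)
    start_time = time.time()
    pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    print(f"Conversion to PIL Image Time: {time.time() - start_time:.4f} seconds")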

    # Time the prediction step
    start_time = time.time()
    output = model.predict({'image': pil_image})
    print(f"Prediction Time: {time.time() - start_time:.4f} seconds")

    # Time the post-processing step
    start_time = time.time()
    predictions = output['var_1279']
    mask = np.any(predictions[0, 4:7, :] > 0.5, axis=0)
    filtered_predictions = predictions[0, :, mask]
    for row in filtered_predictions:
        x, y, w, h = row[:4]
        x1 = int(x - w / 2)
        y1 = int(y - h / 2)
        x2 = int(x + w / 2)
        y2 = int(y + h / 2)
        classes = ['class0', 'class1', 'class2']
        detected_class = classes[np.argmax(row[4:7])]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, detected_class, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
    print(f"Post-Processing Time: {time.time() - start_time:.4f} seconds")

    # Display the processed frame
    cv2.imshow('Frame', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    if is_gpu_active():
        print("GPU is active")
    else:
        print("GPU is not active")    


cap.release()
cv2.destroyAllWindows()
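
The per-row loop in the post-processing step can also be written with NumPy array operations. This is a minimal sketch that assumes the output layout the loop already indexes: rows 0-3 hold cx, cy, w, h in letterboxed pixel coordinates, rows 4-6 hold the three class scores, and any remaining rows (such as segmentation mask coefficients) are ignored. decode_detections is a hypothetical helper name. Post-processing is only about 1 ms here, so this is for readability rather than for the 280 ms prediction time, and like the original loop it applies no non-maximum suppression.

```python
import numpy as np


def decode_detections(predictions, num_classes=3, conf_threshold=0.5):
    """Vectorized version of the per-row decoding loop (hypothetical helper).

    Assumes predictions has shape (1, 4 + num_classes + extra, N): rows 0-3 are
    (cx, cy, w, h) and rows 4 .. 4 + num_classes - 1 are per-class scores.
    Returns integer xyxy boxes, class indices and the best score per box.
    """
    preds = predictions[0]                              # (channels, N)
    scores = preds[4:4 + num_classes]                   # (num_classes, N)
    keep = np.any(scores > conf_threshold, axis=0)      # same filter as the loop
    kept_scores = scores[:, keep]                       # (num_classes, K)
    class_ids = kept_scores.argmax(axis=0)              # (K,)
    best_scores = kept_scores.max(axis=0)               # (K,)
    cx, cy, w, h = preds[:4, keep]                      # each (K,)
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    # astype(int) truncates toward zero, matching int() in the original loop
    return xyxy.astype(int), class_ids, best_scores
```

In the loop this would be called as boxes, class_ids, scores = decode_detections(output['var_1279']), followed by one cv2.rectangle / cv2.putText call per box.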

Here is a sample view from the remote Mac I use; it does the YOLO-style detection as expected:

[screenshot]

And here are sample timings from the terminal output. As you can see, prediction/inference is far too slow:

Letterboxing Time: 0.0009 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2791 seconds
Post-Processing Time: 0.0013 seconds
GPU is not active

Letterboxing Time: 0.0010 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2839 seconds
Post-Processing Time: 0.0015 seconds
GPU is not active


Letterboxing Time: 0.0009 seconds
Conversion to PIL Image Time: 0.0006 seconds
Prediction Time: 0.2821 seconds
Post-Processing Time: 0.0010 seconds
GPU is not active
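
Each Prediction Time above is a single measurement per frame. A small harness that times repeated predictions after a few warm-up calls, once per compute unit, would show whether CPU_ONLY, CPU_AND_GPU and ALL actually behave differently. This is a sketch under assumptions: time_predictions and compare_compute_units are hypothetical helpers, and the model loader is passed in as a function because in coremltools the compute unit is chosen when MLModel is constructed, so each unit needs its own loaded model.

```python
import statistics
import time


def time_predictions(predict, sample, warmup=3, runs=20):
    """Median latency (seconds) of predict(sample), excluding warm-up calls.

    Warm-up calls are discarded so that one-time costs on the first
    predictions, if any, do not skew the result.
    """
    for _ in range(warmup):
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, higher resolution than time.time()
        predict(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_compute_units(load_model, sample, units, **kwargs):
    """Load one model per compute unit via load_model(unit) and time each."""
    results = {}
    for unit in units:
        model = load_model(unit)
        results[unit] = time_predictions(model.predict, sample, **kwargs)
    return results
```

With coremltools this could be called as compare_compute_units(lambda u: ct.models.MLModel('vhssegmentation.mlpackage', compute_units=u), {'image': pil_image}, [ct.ComputeUnit.CPU_ONLY, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.ALL]). If CPU_ONLY and CPU_AND_GPU come out nearly identical, that would suggest the model's layers are not being dispatched to the GPU on this machine.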

Edit: adding the model specs and some hardware information, as requested. Output of model.get_spec().description.metadata:

shortDescription: "Ultralytics YOLOv8m-seg model trained on /content/dataset.yaml"
versionString: "8.0.153"
author: "Ultralytics"
license: "AGPL-3.0 https://ultralytics.com/license"
userDefined {
  key: "batch"
  value: "1"
}
userDefined {
  key: "com.github.apple.coremltools.source"
  value: "torch==2.0.1+cu118"
}
userDefined {
  key: "com.github.apple.coremltools.version"
  value: "7.0b1"
}
userDefined {
  key: "date"
  value: "2023-08-13T15:19:08.788039"
}
userDefined {
  key: "imgsz"
  value: "[640, 640]"
}
userDefined {
  key: "names"
  value: "{0: \'topvhs\', 1: \'frontvhs\', 2: \'sidevhs\'}"
}
userDefined {
  key: "stride"
  value: "32"
}
userDefined {
  key: "task"
  value: "segment"
}
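
The names entry in this metadata already holds the real class labels, so the hardcoded classes = ['class0', 'class1', 'class2'] list in the loop could be built from it instead. A minimal sketch, assuming the value is the Python-literal dict string shown above; class_names_from_metadata is a hypothetical helper that only indexes the mapping, so it should accept either a plain dict or model.get_spec().description.metadata.userDefined.

```python
import ast


def class_names_from_metadata(user_defined):
    """Ordered class names from an Ultralytics 'names' metadata string.

    Expects a value like "{0: 'topvhs', 1: 'frontvhs', 2: 'sidevhs'}".
    """
    names = ast.literal_eval(user_defined["names"])  # parses literals only, no eval()
    return [names[i] for i in sorted(names)]         # order by class index
```

For this model the result would be ['topvhs', 'frontvhs', 'sidevhs'], which can be indexed with np.argmax(row[4:7]) exactly like the existing list.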

Some platform and OS info:

Darwin
('10.16', ('', '', ''), 'x86_64')
posix.uname_result(sysname='Darwin', nodename='perceptundrymbp.home', release='22.6.0', version='Darwin Kernel Version 22.6.0: Fri Sep 15 13:39:52 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_X86_64', machine='x86_64')
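
The uname output above reports an x86_64 kernel, and the GPUs listed further down are Intel HD Graphics 630 and Radeon Pro 555, so this is an Intel Mac without an Apple Neural Engine; on such a machine ComputeUnit.ALL can only choose between the CPU and the GPU. A sketch that gathers the same details in one place using only the standard platform module; describe_platform is a hypothetical helper name.

```python
import platform


def describe_platform():
    """Collect basic platform details relevant to Core ML device selection."""
    info = {
        "system": platform.system(),       # 'Darwin' on macOS
        "mac_ver": platform.mac_ver()[0],  # empty string on non-macOS systems
        "machine": platform.machine(),     # 'x86_64' on Intel, 'arm64' on Apple Silicon
        "release": platform.release(),     # kernel release, e.g. '22.6.0'
    }
    # Note: a Python process running under Rosetta also reports 'x86_64',
    # so this flag describes the interpreter, not necessarily the hardware.
    info["apple_silicon"] = info["machine"] == "arm64"
    return info
```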

GPU information on the Mac, from subprocess.check_output("system_profiler SPDisplaysDataType", shell=True):

Graphics/Displays:

    Intel HD Graphics 630:

      Chipset Model: Intel HD Graphics 630
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel
      Device ID: 0x591b
      Revision ID: 0x0004
      Automatic Graphics Switching: Supported
      gMux Version: 4.0.29 [3.2.8]
      Metal Support: Metal 3
      Displays:
        Color LCD:
          Display Type: Built-In Retina LCD
          Resolution: 2880 x 1800 Retina
          Framebuffer Depth: 24-Bit Color (ARGB8888)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Automatically Adjust Brightness: Yes
          Connection Type: Internal

    Radeon Pro 555:

      Chipset Model: Radeon Pro 555
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x8
      VRAM (Total): 2 GB
      Vendor: AMD (0x1002)
      Device ID: 0x67ef
      Revision ID: 0x00c7
      ROM Revision: 113-C980AJ-927
      VBIOS Version: 113-C9801AP-A02
      EFI Driver Version: 01.A0.927
      Automatic Graphics Switching: Supported
      gMux Version: 4.0.29 [3.2.8]
      Metal Support: Metal 3

Some more info about the remote Mac I'm working with:

[screenshot]

Activity Monitor while the Python code is running:

[screenshot]

[screenshot]

Adding more detailed GPU usage plots, as tadman suggested:

[screenshot]
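
The is_gpu_active() check in the code looks for performanceState = 2 in ioreg, which appears to be a power-state hint rather than a utilization figure, and it runs after predict() has already returned. A more direct reading may be the per-accelerator "Device Utilization %" statistic. This sketch assumes that the IOAccelerator entries in ioreg expose a PerformanceStatistics dictionary containing that key, which should be checked on this Mac; parse_device_utilization and gpu_utilization are hypothetical helper names.

```python
import re
import subprocess

# Assumption: ioreg prints entries like "Device Utilization %"=37
_UTIL_RE = re.compile(r'"Device Utilization %"\s*=\s*(\d+)')


def parse_device_utilization(ioreg_output):
    """Return every 'Device Utilization %' value found in ioreg text."""
    return [int(value) for value in _UTIL_RE.findall(ioreg_output)]


def gpu_utilization():
    """Query IOAccelerator entries via ioreg (macOS only); one value per GPU driver."""
    out = subprocess.run(
        ["ioreg", "-r", "-d", "1", "-w", "0", "-c", "IOAccelerator"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_device_utilization(out)
```

With both the Intel and the Radeon GPU present, gpu_utilization() would be expected to return two values. Sampling it from a background thread while model.predict is running is more likely to catch GPU activity than a single check after each frame.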
