TorchServe - Support GPU and CPU models on one docker container


I am using TorchServe to serve 3 models on a machine with GPU capacity. TorchServe is deployed as a Docker container:

  server:
    image: pytorch/torchserve:test-gpu
    container_name: {container_name}
    restart: always
    volumes:
      - type: bind
        source: ./model_store
        target: /home/model-server/model-store
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl --silent --fail http://server:8080/ping || exit 1"
        ]
      interval: 30s
      timeout: 15s
      retries: 5
    ports:
      - 8080:8080
      - 8081:8081
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

I want to serve one of the models on CPU and the two others on GPU, which is supported as of TorchServe 0.8.0. I am following the instructions from the Torch Model Archiver README (https://github.com/pytorch/serve/tree/master/model-archiver#config-file) and define my CPU model like the following:

minWorkers: 1
maxWorkers: 2
batchSize: 1
deviceType: CPU

But when I describe my model, I see that its workers are still on GPU:

[
  {
    "modelName": "{model_name}",
    "modelVersion": "1.1",
    "modelUrl": "{model_name}.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 2,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2023-06-06T07:26:35.267Z",
        "status": "READY",
        "memoryUsage": 6423101440,
        "pid": 87,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::7998 MiB"
      }
    ]
  }
]
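For reference, output like the above comes from the describe endpoint of TorchServe's management API (port 8081, as mapped in the compose file). If the .mar is rebuilt with a new config, the model presumably has to be unregistered and re-registered for the change to apply; a sketch, assuming the server runs on localhost:

```shell
# Describe the model and its workers
curl http://localhost:8081/models/{model_name}

# After rebuilding the .mar, re-register so the new config takes effect
curl -X DELETE http://localhost:8081/models/{model_name}/1.1
curl -X POST "http://localhost:8081/models?url={model_name}.mar&initial_workers=1"
```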

So I would like to know whether it is possible to deploy both GPU and CPU models in a single Docker container. Thanks a lot!

I have tried running 2 TorchServe containers, one for CPU and one for GPU, which works for now. But with the new version of TorchServe out, I would still like to get this working with a single container to avoid exhausting my memory.
