I am using TorchServe to serve three models on a machine with a GPU. TorchServe is deployed as a Docker container, defined as a docker-compose service:
server:
  image: pytorch/torchserve:test-gpu
  container_name: {container_name}
  restart: always
  volumes:
    - type: bind
      source: ./model_store
      target: /home/model-server/model-store
  healthcheck:
    test:
      [
        "CMD-SHELL",
        "curl --silent --fail http://server:8080/ping || exit 1"
      ]
    interval: 30s
    timeout: 15s
    retries: 5
  ports:
    - 8080:8080
    - 8081:8081
  deploy:
    resources:
      reservations:
        devices:
          - capabilities: [gpu]
I want to serve one of the models on CPU and the other two on GPU, which is now supported in TorchServe 0.8.0. I am following the instructions from the Torch Model Archiver README (https://github.com/pytorch/serve/tree/master/model-archiver#config-file) and define the CPU model's config file like this:
minWorkers: 1
maxWorkers: 2
batchSize: 1
deviceType: CPU
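For reference, this per-model YAML is passed to torch-model-archiver at packaging time via its --config-file option (added in 0.8.0). A minimal sketch of the command; the model name, serialized file, and handler below are placeholders, not my actual values:

```shell
# Package the model together with its per-model config (TorchServe 0.8.0+).
# my_cpu_model, model.pt, and my_handler.py are placeholder names.
torch-model-archiver \
  --model-name my_cpu_model \
  --version 1.1 \
  --serialized-file model.pt \
  --handler my_handler.py \
  --config-file model_config.yaml \
  --export-path ./model_store
```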
But when I describe the model, I see its workers are still on GPU:
[
  {
    "modelName": "{model_name}",
    "modelVersion": "1.1",
    "modelUrl": "{model_name}.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 2,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2023-06-06T07:26:35.267Z",
        "status": "READY",
        "memoryUsage": 6423101440,
        "pid": 87,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::7998 MiB"
      }
    ]
  }
]
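For context, the description above comes from the TorchServe management API on port 8081 (mapped in the compose file above); the model name here is a placeholder:

```shell
# Describe the registered model via the management API
curl http://localhost:8081/models/my_cpu_model
```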
So I would like to know whether it is possible to serve both GPU and CPU models from a single Docker container. Thanks a lot!
I have tried running two TorchServe containers, one for CPU and one for GPU, which works for now. But with the new version of TorchServe out, I would still like to use only one container, to avoid exhausting my memory.
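For completeness, the two-container workaround looks roughly like this; the service names, image tags, split model stores, and host ports are illustrative, not my exact setup:

```yaml
services:
  server-gpu:
    image: pytorch/torchserve:latest-gpu
    volumes:
      - ./model_store_gpu:/home/model-server/model-store
    ports:
      - 8080:8080
      - 8081:8081
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
  server-cpu:
    image: pytorch/torchserve:latest-cpu
    volumes:
      - ./model_store_cpu:/home/model-server/model-store
    ports:
      - 8082:8080   # remapped so it doesn't clash with the GPU container
      - 8083:8081
```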