Multiple containers on a single GPU instance?
Are there any plans to allow multiple Docker containers on a single GPU instance? I have workloads that don't utilize the full resources of a single GPU, and I'd like to organize them as multiple containers sharing one GPU. I don't believe there is a way to do this currently; the closest option is to run multiple processes inside a single Docker container, but that is a Docker anti-pattern and not great for workload organization.
9 Replies
Sounds like you are describing Docker in Docker. That would require privileged mode. RunPod doesn't support privileged mode.
You can run Docker-in-Docker on CPU instances only, which means you cannot do so with a GPU. Here is the link to the RunPod tutorial for running Docker-in-Docker on CPU: https://docs.runpod.io/tutorials/pods/run-docker-in-docker
Run Docker in Docker on RunPod CPU Instances | RunPod Documentation
This tutorial applies only to RunPod's CPU offering.
What I am describing does bear some resemblance to Docker-in-Docker, and that would theoretically be one way of solving this, but it's not the only way (and I'm pretty doubtful GPU passthrough works under Docker-in-Docker anyway). It should conceptually be possible to schedule multiple Docker containers onto the same GPU, and it wouldn't be difficult if the system had been designed to support it from the start. But I can imagine it's not easy to retrofit if there is an implicit assumption of one Docker container per GPU. I'm still interested to hear whether there are any plans to support this at some point.
https://docs.runpod.io/serverless/workers/handlers/handler-concurrency
You can try serverless, and also check out concurrent handlers; they allow multiple requests to share one GPU (see the sketch after the doc summary below).
Concurrent Handlers | RunPod Documentation
RunPod's concurrency functionality enables efficient task handling through asynchronous requests, allowing a single worker to manage multiple tasks concurrently. The concurrency_modifier configures the worker's concurrency level to optimize resource consumption and performance.
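To give a rough idea of what that looks like in practice, here is a minimal sketch based on the doc linked above, assuming the runpod Python SDK's documented handler/concurrency_modifier pattern; the handler body and the cap of 4 concurrent requests are placeholders, not anything specific to your workload.

```python
import runpod

# Async handler: while one request is awaiting I/O or a model call,
# the worker can pick up additional requests on the same GPU.
async def handler(job):
    prompt = job["input"].get("prompt", "")
    # ... run your (partial-GPU) workload here ...
    return {"echo": prompt}

# The SDK calls this to decide how many requests the worker may
# process at once; the cap of 4 is just an illustrative placeholder.
def concurrency_modifier(current_concurrency: int) -> int:
    max_concurrency = 4
    return min(current_concurrency + 1, max_concurrency)

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```

Each worker still maps to one GPU, but concurrent requests let you pack several small jobs onto it, which is close to what you're describing.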
I would think that RP would not want to deploy such a product for business reasons. RP pretty much already enables users to scale from zero, and what you are suggesting is effectively a cost reduction for running users' payloads.
Also, from a technical standpoint you can design your application to process a variety of payloads. Your only constraints are the VRAM allocation and storage for models/code. VRAM is a non-issue if you are not running your models simultaneously. If your models are small you can have a LOT of them. If they are large, you run into trade-offs between building them into your image or using a network volume. The maximum network volume is 4TB, so you could, for example, run Stable Diffusion with 4TB worth of models, LoRAs, etc. But you couldn't run too many of them simultaneously, as you would run out of VRAM quickly. It really all comes down to what you design your code to do.
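To make that concrete, here's a rough sketch of one way to dispatch multiple model types inside a single container or worker; the paths and model names are hypothetical, and the loader is a stand-in for whatever framework you actually use.

```python
import functools

# Hypothetical paths on an attached network volume.
MODEL_PATHS = {
    "sd15": "/runpod-volume/models/sd15",
    "sdxl": "/runpod-volume/models/sdxl",
}

# Keep only the most recently used model resident, so VRAM stays a non-issue
# even if the volume holds many models.
@functools.lru_cache(maxsize=1)
def load_model(name: str):
    path = MODEL_PATHS[name]
    # Stand-in for real loading code (diffusers, torch, etc.).
    return f"<model loaded from {path}>"

def handle_payload(payload: dict) -> dict:
    model = load_model(payload.get("model", "sd15"))
    # ... run inference with `model` on payload["input"] ...
    return {"model_used": model}
```

The point is that one codebase can cover several "logical" workloads on one GPU; the organization just happens in code rather than across containers.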
It's really more about convenience and cleanliness. The point you're making, which I'm also making, is that if the GPU is left idle, the only gain is programming/orchestration simplicity. If I can run multiple Docker containers attached to the same GPU, that is a preferable programming/orchestration approach in some cases, while having zero impact on RP's business concerns. I'm not magically creating more capacity out of nowhere; I'm just arranging the workload differently.
You can't argue both sides: if I can already do this now by writing my app/container differently, then it doesn't impact RP's business concerns. And I can already do it now; it's just less clean from a code and orchestration perspective.
I know you're posting in the Pods section, but have you thought about using serverless workers? With serverless workers you are only charged while the GPU is executing code, and you can have multiple workers built from custom Docker images.
I'm actually interested in serverless too, but in my testing it didn't quite work the way I hoped it would. I might do some more testing later.
The line between pods and serverless is very thin. They are about to release the ability for serverless workers to open a port on a public IP, using the same RunPod proxy method that pods do. Once that happens, there will be nothing a pod can do that cannot be done on a serverless worker.