Created by Anubhav on 9/26/2024 in #⚡|serverless
Fixed number of Total Workers - Any workaround?
Currently our team has a pool of ~150 workers on RunPod serverless, on GPUs of type RTX A4000/A5000/A6000.
We have a total of 10 different models deployed across serverless endpoints that we use at inference time. Each model has a different number of active/max workers depending on the load it receives, where it sits in our pipeline, and the nature of the model.
My question is: what are the best practices around RunPod serverless? Should we deploy multiple models within the same image and route between them inside the handler? That would let us cover more models within the fixed worker quota, but with this approach one busy model could completely block requests for the others.
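For illustration, the routing approach might look like the minimal sketch below. It assumes the `runpod` Python SDK's handler API; the model names and the `model` input field are placeholders, not anything from our actual pipeline:

```python
import runpod

# Placeholder "models": in a real image these would be ML models
# loaded once at container start, so a warm worker can serve any of them.
# The names here are hypothetical examples.
def _make_dummy_model(name):
    def run(payload):
        return {"model": name, "output": payload}
    return run

MODELS = {
    "classifier": _make_dummy_model("classifier"),
    "embedder": _make_dummy_model("embedder"),
}

def handler(job):
    # RunPod passes the request body under job["input"].
    inp = job["input"]
    model_name = inp.get("model")
    if model_name not in MODELS:
        return {"error": f"unknown model '{model_name}'"}
    # Dispatch to the requested model within the same worker.
    return MODELS[model_name](inp.get("payload"))

runpod.serverless.start({"handler": handler})
```

The downside is exactly the one raised above: all models behind this handler share one request queue and one set of workers, so a burst of traffic to `classifier` can starve `embedder`.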