pxmwxd
RunPod
Created by pxmwxd on 7/20/2024 in #⚡|serverless
Serverless doesn't scale
Endpoint ID: cilhdgrs7rbzya. I have some requests that require workers with 4 RTX 4090s. The endpoint's “max worker” setting is 150, and “Request Count” under Scale Type is 1. When I sent 78 requests concurrently, only ~20% of them started within 10s; the P80 wait was ~600s. Is this because there are not enough GPUs? When the stock status shows “availability: high”, how many workers can I expect to scale up at once?
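
For context, here is a rough sketch of the GPU arithmetic behind the question. It assumes that with “Request Count” set to 1 the endpoint tries to start one worker per queued request, capped at “max worker”; that is my reading of the setting, not confirmed behavior. The function name `gpus_needed` and its parameters are made up for illustration and are not part of any RunPod API.

```python
import math

def gpus_needed(concurrent_requests, requests_per_worker, gpus_per_worker, max_workers):
    """Estimate workers and GPUs required to start every request at once.

    Assumption (not confirmed): the scaler targets
    ceil(concurrent_requests / requests_per_worker) workers,
    capped at max_workers.
    """
    workers = min(math.ceil(concurrent_requests / requests_per_worker), max_workers)
    return workers, workers * gpus_per_worker

# Numbers from this post: 78 concurrent requests, Request Count = 1,
# 4 GPUs per worker, max worker = 150.
workers, gpus = gpus_needed(78, 1, 4, 150)
print(workers, gpus)
```

Under that assumption, starting all 78 requests immediately would need 78 workers and 312 GPUs available at the same moment, which is well below the 150-worker cap, so the cap itself should not be the limit here.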
18 replies