•Created by Maher on 5/30/2024 in #⚡|serverless
hanging after 500 concurrent requests
Hi, I loaded Llama 8B in serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time so I can go to production. But when I send 500 requests at once, the server just hangs and I don't get an error. What could be the issue? How do I know how much load 1 GPU can handle, and how do I optimize it for max concurrency?
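One way to find the breaking point instead of firing all 500 at once is to cap in-flight requests with a semaphore and put a timeout on each call, then step the concurrency up until timeouts appear. Below is a minimal sketch using only the Python standard library; `fake_request` is a stand-in you would replace with a real HTTP call (e.g. `aiohttp` or `httpx` against your serverless endpoint URL), and all parameter values shown are assumptions, not RunPod recommendations.

```python
import asyncio
import time

async def benchmark(send_request, total=500, concurrency=50, timeout=30.0):
    """Fire `total` requests with at most `concurrency` in flight,
    counting successes, timeouts, and errors."""
    sem = asyncio.Semaphore(concurrency)
    results = {"ok": 0, "timeout": 0, "error": 0}

    async def one(i):
        async with sem:  # blocks until a concurrency slot is free
            try:
                await asyncio.wait_for(send_request(i), timeout=timeout)
                results["ok"] += 1
            except asyncio.TimeoutError:
                results["timeout"] += 1
            except Exception:
                results["error"] += 1

    start = time.monotonic()
    await asyncio.gather(*(one(i) for i in range(total)))
    results["seconds"] = round(time.monotonic() - start, 2)
    return results

# Stand-in for a real request; swap in an HTTP POST to your endpoint.
async def fake_request(i):
    await asyncio.sleep(0.01)

if __name__ == "__main__":
    print(asyncio.run(benchmark(fake_request, total=500, concurrency=50)))
```

Running this at increasing `concurrency` values (e.g. 10, 50, 100, ...) shows where the `timeout` count starts climbing, which is a rough proxy for how much load one worker sustains before requests queue indefinitely.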
6 replies