RunPod · 7mo ago
Maher

hanging after 500 concurrent requests

Hi, I loaded Llama 8B in serverless with 1 active worker (A100) and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time so I can go to production. But when I send 500 requests at the same time, the server just hangs and I don't get an error. What could be the issue? How can I find out how much load 1 GPU can handle, and how do I optimize it for max concurrency?
3 Replies
digigoblin · 7mo ago
@Alpay Ariyak any idea?
nerdylive · 7mo ago
Hangs? What's it like? No output, just running?
Alpay Ariyak · 7mo ago
Yeah, could you please expand on how it hangs? Also, Max Job Concurrency is 300 by default; you can change it with the MAX_CONCURRENCY env var.
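One way to see where the hang starts is to ramp concurrency up in steps with a hard per-request timeout, so a stalled request shows up as a counted timeout instead of a silent wait. The sketch below is only an illustration: `send` is a hypothetical coroutine you would replace with your own HTTP call to the endpoint, `fake_send` is a stand-in so the script runs on its own, and the levels and 120-second timeout are arbitrary assumptions, not RunPod recommendations.

```python
import asyncio
import time


async def run_level(send, concurrency, timeout_s):
    """Fire `concurrency` requests at once and count ok / timeout / error."""

    async def one(i):
        try:
            # A hard timeout turns a silent hang into a visible result.
            await asyncio.wait_for(send(i), timeout=timeout_s)
            return "ok"
        except asyncio.TimeoutError:
            return "timeout"
        except Exception:
            return "error"

    start = time.perf_counter()
    results = await asyncio.gather(*(one(i) for i in range(concurrency)))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": concurrency,
        "ok": results.count("ok"),
        "timeout": results.count("timeout"),
        "error": results.count("error"),
        "seconds": round(elapsed, 2),
    }


async def ramp(send, levels=(50, 100, 200, 300, 500), timeout_s=120.0):
    """Step through concurrency levels; stop at the first level with failures."""
    report = []
    for n in levels:
        stats = await run_level(send, n, timeout_s)
        report.append(stats)
        if stats["ok"] < n:
            break
    return report


async def fake_send(i):
    # Hypothetical placeholder: swap in a real request to your endpoint.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    for row in asyncio.run(ramp(fake_send)):
        print(row)
```

If the first failing level sits just above 300, that would be consistent with the default Max Job Concurrency mentioned above; if it fails much earlier, the limit is more likely on the GPU or model side.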