hanging after 500 concurrent requests
Hi, I loaded llama 8b in serverless with 1 active worker (A100) and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time so I can go to production. But when I send 500 requests at the same time, the server just hangs and I don't get an error. What could be the issue? How do I know how much load 1 GPU can handle, and how do I optimize it for max concurrency?
3 Replies
@Alpay Ariyak any idea?
Hangs? What's it like?
No output, just running?
Yeah, could you please expand on how it hangs?
Also, Max Job Concurrency is 300 by default; you can change it with the `MAX_CONCURRENCY` env var.
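One way to avoid the hang while benchmarking is to cap in-flight requests client-side below the worker's concurrency limit. A minimal sketch using an `asyncio.Semaphore` (the `fake_send` stub is a stand-in for your real HTTP call to the endpoint; swap in an actual request function):

```python
import asyncio

MAX_IN_FLIGHT = 300  # keep at or below the worker's MAX_CONCURRENCY

async def bounded_benchmark(n_requests, limit, send_fn):
    """Fire n_requests total, but keep at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(i):
        async with sem:  # blocks when `limit` requests are already in flight
            return await send_fn(i)

    return await asyncio.gather(*(one(i) for i in range(n_requests)))

# Stub standing in for a real request; tracks peak concurrency for illustration.
active = 0
peak = 0

async def fake_send(i):
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0)  # yield to the event loop, simulating network latency
    active -= 1
    return i

results = asyncio.run(bounded_benchmark(500, MAX_IN_FLIGHT, fake_send))
print(f"completed={len(results)} peak_in_flight={peak}")
```

Raising `limit` in steps (50, 100, 200, ...) while watching latency and errors gives a rough idea of how much load one GPU handles before saturating.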