•Created by Maher on 5/30/2024 in #⚡|serverless
hanging after 500 concurrent requests
Hi, I loaded Llama 8B in serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests I can handle at the same time so I can go to production. But when I send 500 requests at once, the server just hangs and I don't get an error. What could be the issue? How do I know how much load 1 GPU can handle, and how do I optimize it for max concurrency?
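One way to find the breaking point instead of firing all 500 at once is to cap in-flight requests with a semaphore and put a timeout on each call, then step the concurrency up until timeouts appear. Below is a minimal sketch using only the Python standard library; `fake_request` is a stand-in you would replace with a real HTTP call (e.g. `aiohttp` or `httpx` against your serverless endpoint URL), and all parameter values shown are assumptions, not RunPod recommendations.

```python
import asyncio
import time

async def benchmark(send_request, total=500, concurrency=50, timeout=30.0):
    """Fire `total` requests with at most `concurrency` in flight,
    counting successes, timeouts, and errors."""
    sem = asyncio.Semaphore(concurrency)
    results = {"ok": 0, "timeout": 0, "error": 0}

    async def one(i):
        async with sem:  # blocks until a concurrency slot is free
            try:
                await asyncio.wait_for(send_request(i), timeout=timeout)
                results["ok"] += 1
            except asyncio.TimeoutError:
                results["timeout"] += 1
            except Exception:
                results["error"] += 1

    start = time.monotonic()
    await asyncio.gather(*(one(i) for i in range(total)))
    results["seconds"] = round(time.monotonic() - start, 2)
    return results

# Stand-in for a real request; swap in an HTTP POST to your endpoint.
async def fake_request(i):
    await asyncio.sleep(0.01)

if __name__ == "__main__":
    print(asyncio.run(benchmark(fake_request, total=500, concurrency=50)))
```

Running this at increasing `concurrency` values (e.g. 10, 50, 100, ...) shows where the `timeout` count starts climbing, which is a rough proxy for how much load one worker sustains before requests queue indefinitely.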
6 replies