RunPod•10mo ago
norefreshing

Too many failed requests

Hello. I've tried to run casperhansen/mixtral-instruct-awq (https://huggingface.co/casperhansen/mixtral-instruct-awq) on A100 80GB and A100 SXM 80GB GPUs, sending 10 requests per second using this script: https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py. However, most of the requests failed, with vLLM logging `Aborted request`. This issue didn't occur on another platform with the same GPU and the same code, so I'm not sure whether the problem is with vLLM or with RunPod's internal processing. Could anyone provide guidance on what the cause might be?
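For context on the load pattern: benchmark_serving.py paces requests as a Poisson process, drawing exponentially distributed gaps with mean 1/request_rate, so at 10 req/s bursts of near-simultaneous requests are expected. A minimal sketch of that pacing logic (the function name and seeding are mine, added for reproducibility):

```python
import random

def interarrival_times(request_rate: float, n: int, seed: int = 0) -> list[float]:
    """Exponential gaps with mean 1/request_rate, i.e. Poisson arrivals.

    Mirrors the request spacing used by vLLM's benchmark_serving.py;
    the fixed seed is only here so runs are reproducible.
    """
    rng = random.Random(seed)
    return [rng.expovariate(request_rate) for _ in range(n)]

# At 10 requests/second the average gap should be close to 0.1 s,
# but individual gaps can be much shorter, producing bursts.
gaps = interarrival_times(10.0, 10_000)
```

Because the gaps are exponential rather than uniform, a single vLLM instance can momentarily see far more than 10 concurrent requests, which is one reason an autoscaling setup handles this workload better than a single pod.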
4 Replies
Solution
ashleyk•10mo ago
Why are you using GPU Cloud for this? If you want to handle many concurrent requests, you need to use Serverless, not GPU Cloud. https://github.com/runpod-workers/worker-vllm
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
norefreshingOP•10mo ago
Thank you for your reply. I wanted to test how many requests it can manage. I'm still learning about LLMs and how to host them, and I wasn't aware that GPU Cloud isn't suited to handling many concurrent requests. Could you kindly explain a bit more about why Serverless is preferable in this context compared to GPU Cloud, or point me to any documents I could check for more detailed information?
ashleyk•10mo ago
Overview | RunPod Documentation
An overview to Serverless GPU computing for AI inference and training.
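To illustrate the difference: with Serverless you don't send traffic to vLLM directly; you POST jobs to the endpoint's `/run` (async) or `/runsync` route, RunPod queues them, and workers scale out to drain the queue. A minimal stdlib sketch of building such a request (the endpoint ID and API key are placeholders, and the exact `input` schema here is an assumption based on the worker-vllm README):

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"  # RunPod serverless API base

def build_run_request(endpoint_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the async /run route of a serverless endpoint.

    worker-vllm jobs are wrapped in an "input" object; the "prompt" and
    "sampling_params" fields shown here are assumptions from its README.
    """
    payload = {"input": {"prompt": prompt, "sampling_params": {"max_tokens": 64}}}
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/run",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (not sent): req = build_run_request("my-endpoint-id", "my-api-key", "Hello")
# urllib.request.urlopen(req) would then submit the job and return a job ID.
```

The key design difference is that the queue absorbs bursts, so requests wait for a worker instead of being aborted by an overloaded single instance.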
norefreshingOP•10mo ago
Thank you. I appreciate it 🙂