Too many failed requests
Hello. I've tried to run casperhansen/mixtral-instruct-awq (https://huggingface.co/casperhansen/mixtral-instruct-awq) on A100 80GB and A100 SXM 80GB GPUs, sending 10 requests per second with the benchmark script https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py.
However, most of the requests failed, and vLLM logged "Aborted request" for them. This issue didn't occur on another platform with the same GPU and the same code, so I'm not sure whether the problem lies with vLLM or with RunPod's internal processing.
Could anyone suggest what the cause might be?
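For context on what "10 requests per second" means in this benchmark: assuming the script's default behavior for a finite --request-rate, requests are spaced with exponentially distributed gaps (a Poisson arrival process) rather than exactly every 100 ms. The stdlib-only sketch below illustrates that arrival pattern; the function name poisson_arrival_times is hypothetical and not taken from benchmark_serving.py.

```python
import random


def poisson_arrival_times(request_rate: float, num_requests: int, seed: int = 0) -> list[float]:
    """Return send times (seconds from start) for a Poisson arrival process.

    Hypothetical helper for illustration; it assumes the benchmark sends the
    first request immediately and then waits an exponential gap before each
    subsequent request.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(num_requests):
        times.append(t)
        # Exponential gap with mean 1 / request_rate seconds (0.1 s at 10 req/s).
        t += rng.expovariate(request_rate)
    return times


times = poisson_arrival_times(request_rate=10.0, num_requests=1000)
gaps = [b - a for a, b in zip(times, times[1:])]
print(f"mean gap: {sum(gaps) / len(gaps):.3f} s, min gap: {min(gaps):.4f} s")
```

The average rate works out to about 10 req/s, but some gaps are far shorter than 100 ms, so the server briefly sees bursts well above the average. That may be worth keeping in mind when comparing how two platforms handle the same load.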
How can I use ollama Docker image?
Hello. I've been trying to serve ollama on RunPod using the ollama Docker image (https://hub.docker.com/r/ollama/ollama), but I haven't found a way to run it.
I tried putting the "docker run ..." command in the Container Start Command field, but I got the error: unknown command "docker" for "ollama". Does anyone know the correct way to use ollama on RunPod?