RunPod requests fail with 500
Upgrade faster-whisper version for quick deploy
LoRA path in vLLM serverless template
Want to keep model files separate from the Docker image, but loading slows down significantly when using storage
Intermittent timeouts on requests
Logs are attached. In this case there were 2 successful requests, then a third request simply times out; it seems the request never reaches the queue (no logs)....
"Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/91gr..."
HF Cache
Popular Hugging Face models have super fast cold-start times now
We know lots of our developers love working with Hugging Face models. So we decided to cache them on our GPU servers and network volumes.
GPU Availability Issue on RunPod – Need Assistance
job timed out after 1 retries
Unable to fetch docker images
error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
2024-11-18T18:10:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...Failed to get job. - 404 Not Found
vLLM: override OpenAI served model name
OPENAI_SERVED_MODEL_NAME_OVERRIDE
but the model name on the OpenAI endpoint is still the hf_repo/model name.
The logs show : engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None...
and the endpoint returns Error with model object='error' message='The model 'model_name' does not exist.' type='NotFoundError' param=None code=404
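For reference, a minimal sketch of what a client call would look like once the override takes effect, assuming a placeholder endpoint ID, API key, and alias (`my-model-alias` is hypothetical); until the override is actually passed through as `served_model_name`, requests have to use the full hf_repo/model name:

```python
# Minimal sketch, assuming the OPENAI_SERVED_MODEL_NAME_OVERRIDE value is "my-model-alias"
# and the worker actually applies it as served_model_name.
from openai import OpenAI

client = OpenAI(
    # RunPod's OpenAI-compatible route for a serverless endpoint; <ENDPOINT_ID> is a placeholder.
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# If the override is applied, this alias is accepted; if served_model_name is still None,
# this returns the same 404 NotFoundError and the full HF repo name must be used instead.
response = client.chat.completions.create(
    model="my-model-alias",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```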
...Not using cached worker
What TTFT (time to first token) times should we be able to reach?
80GB GPUs totally unavailable
Not able to connect to the local test API server
What methods can I use to reduce cold start times and decrease latency for serverless functions?
Network volume vs. baking the model into the Docker image