Stuck vLLM startup with 100% GPU utilization
How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1
worker-vllm not working with beam search
length_penalty
not being accepted. Can you please work on a fix for beam search? Thanks!All GPU unavailable
/runsync returns "Pending" response
Kicked Worker
Possible to access ComfyUI interface in serverless to fix custom nodes requirements?
How to truly see the status of an endpoint worker?
How do I calculate the cost of my last execution on a serverless GPU?
runsync
request instead of manually calculating it?Serverless deepseek-ai/DeepSeek-R1 setup?
what is the best way to access more gpus a100 and h100
Guidance on Mitigating Cold Start Delays in Serverless Inference
A40 Throttled very regularly!
SSH info via cli
Can not get a single endpoint to start
All 16GB VRAM workers are throttled in EU-RO-1
worker-vllm: Always stops after 60 seconds of streaming
api.runpod.ai/v2
. This has benefits of being able to get the job_id
and do more things, but I would like to do this with the OpenAI API....It is always getting queued whenever I call API queue always get bigger, how to cancel all jobs
I want to deploy a serverless endpoint with using Unsloth
--trust-remote-code
trust_remote_code=True
in LLM or using the --trust-remote-code
flag in the CLI.; <traceback object at 0x7fecd5a12700>;"...