Job stays in In-Progress forever
How to get the progress of a processing job in serverless?
Why is runsync returning a status response instead of just waiting for the image response?
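The three questions above share one mechanism: /runsync only holds the connection open for a limited window, and if the job is still running when that window closes, it returns the job's current status (e.g. IN_PROGRESS) with the job ID instead of the output, after which you poll /status yourself. A minimal polling sketch against RunPod's serverless HTTP API; the endpoint ID and input payload are placeholders, and RUNPOD_API_KEY is assumed to hold your API key:

```python
import os
import time

import requests

# Placeholders: substitute your endpoint ID and set RUNPOD_API_KEY.
ENDPOINT = "https://api.runpod.ai/v2/<endpoint_id>"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# /run queues the job and returns a job ID immediately.
job = requests.post(
    f"{ENDPOINT}/run", headers=HEADERS, json={"input": {"prompt": "a photo of a cat"}}
).json()

# Poll /status/{id} until the job leaves the queued/in-progress states.
while True:
    status = requests.get(f"{ENDPOINT}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(2)

print(status["status"], status.get("output"))
```

A job that sits in IN_PROGRESS forever usually means the handler never returns; if the handler reports incremental progress (the Python SDK exposes runpod.serverless.progress_update for this), it shows up in the same /status payload.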
Worker keeps running after idle timeout
Can I deploy the "ComfyUI with Flux.1 dev one-click" template to serverless?
What is the real Serverless price?
Can't find Juggernaut in the list of models to download in ComfyUI Manager
comfy
Incredibly long startup time when running 70B models via vLLM
cognitivecomputations/dolphin-2.9.1-llama-3-70b. I find it even weirder that the request ultimately succeeds. Logs and a screenshot of the endpoint and template config are attached; if anyone can spot an issue or knows how to deploy 70B models such that they reliably work, I would greatly appreciate it.
Some other observations:
- In support, someone told me that I need to manually set the env var BASE_PATH=/workspace, which I am now always doing.
- I sometimes, but not always, see this in the logs, even though I am deploying a completely different model: AsyncEngineArgs(model='facebook/opt-125m', served_model_name=None, tokenizer='facebook/opt-125m'...
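That log line has a plausible explanation: vLLM's own EngineArgs defaults model to 'facebook/opt-125m', so if the worker ever starts without its configured model name, it silently loads that default. A minimal guard sketch, assuming the worker image reads the model from a MODEL_NAME environment variable (that variable name is an assumption; check your worker's docs) and caches weights under BASE_PATH:

```python
import os

from vllm.engine.arg_utils import AsyncEngineArgs

# vLLM's EngineArgs defaults `model` to "facebook/opt-125m"; a worker that
# never receives its configured model name silently loads that default.
# MODEL_NAME is assumed here; check which variable your worker image reads.
model = os.environ.get("MODEL_NAME")
if not model:
    raise RuntimeError("MODEL_NAME is not set; refusing to fall back to facebook/opt-125m")

engine_args = AsyncEngineArgs(
    model=model,
    # Cache weights on the network volume so cold starts skip the download.
    download_dir=os.environ.get("BASE_PATH", "/workspace"),
)
print(f"Loading {engine_args.model} from {engine_args.download_dir}")
```

Failing fast like this turns the silent opt-125m fallback into an explicit error, which is likely also the answer to the facebook/opt-125m quick-deploy question further down.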
Mounting network storage at runtime - serverless
Serverless fails when workers aren't manually set to active
Chat completion (template) not working with vLLM 0.6.3 + Serverless
Qwen2.5 + vLLM + Open WebUI
RoPE scaling JSON not working
First attempt at serverless endpoint - "Initializing" for a long time
(Flux) Serverless inference crashes without logs.
Same request running twice
Why is facebook/opt-125m loading into vLLM Quick Deploy even though another model is specified?