RunPod
Created by Ardgon on 11/18/2024 in #⚡|serverless
vLLM: overriding the OpenAI served model name
Overriding the served model name on the vLLM serverless pod doesn't seem to take effect. Configuring a new endpoint through the Explore page on RunPod's interface creates a worker with the env variable `OPENAI_SERVED_MODEL_NAME_OVERRIDE`, but the model name on the OpenAI endpoint is still the HF repo/model name. The logs show:

`engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None...`

and the endpoint returns:

`Error with model object='error' message="The model 'model_name' does not exist." type='NotFoundError' param=None code=404`

Setting the env variable `SERVED_MODEL_NAME` instead shows in the logs:

`engine.py: Engine args: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name='model_name'...`

yet the endpoint still returns the same error message as above.
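For reference, a minimal client-side sketch of how I'm testing this, assuming RunPod's OpenAI-compatible route for serverless vLLM workers (the endpoint ID, API key, and model name below are placeholders). Listing the models first should reveal whichever id the server actually registered:

```python
from openai import OpenAI  # pip install openai

# Placeholders: substitute a real endpoint ID and RunPod API key.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# If the override took effect, the id printed here should be
# 'model_name' rather than the HF repo path.
for m in client.models.list():
    print(m.id)

# Then call chat completions with whatever id the listing returned.
resp = client.chat.completions.create(
    model="model_name",  # or the HF repo path if the override is ignored
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```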
1 reply
RunPod
Created by Ardgon on 6/18/2024 in #⚡|serverless
Cancelling a job resets FlashBoot
For some reason, whenever we cancel a job, the next time the serverless worker cold boots it doesn't use FlashBoot; instead it reloads the LLM model weights onto the GPU from scratch. Any idea why cancelling jobs might be causing this? Is there a more graceful way to stop jobs early than the `/cancel/{job_id}` endpoint?
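One workaround we're considering, sketched below under the assumption that the hard cancel kills the worker process (and with it the FlashBoot state): deliver a stop signal that a generator handler polls between tokens, so the job finishes cleanly instead of being killed. The stop registry here is hypothetical and in-process only; a real setup would need a shared store (e.g. Redis), since the stop request has to reach the worker actually running the job:

```python
import time
import runpod  # pip install runpod

# Hypothetical in-process stop registry; replace with a shared store
# (e.g. Redis) so a stop request can reach the right worker.
STOP_REQUESTED = set()

def generate_tokens(prompt):
    # Stand-in for the real vLLM generation loop.
    for word in prompt.split():
        time.sleep(0.01)
        yield word

def handler(job):
    job_id = job["id"]
    for token in generate_tokens(job["input"]["prompt"]):
        if job_id in STOP_REQUESTED:
            # Return normally instead of being hard-cancelled, so the
            # worker process (and its FlashBoot state) survives.
            STOP_REQUESTED.discard(job_id)
            yield {"stopped_early": True}
            return
        yield {"token": token}

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```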
4 replies