vLLM: overriding the OpenAI served model name
Overriding the served model name on the vLLM serverless pod doesn't seem to take effect. Configuring a new endpoint through the Explore page in RunPod's interface creates a worker with the env variable OPENAI_SERVED_MODEL_NAME_OVERRIDE set, but the model name exposed on the OpenAI-compatible endpoint is still the hf_repo/model name.
The logs show:
engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None...
and the endpoint returns:
Error with model object='error' message='The model 'model_name' does not exist.' type='NotFoundError' param=None code=404
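For reference, this is roughly how I'm calling the endpoint. It's a minimal sketch: the endpoint ID, API key, and base URL format are placeholders/assumptions on my side, and "model_name" stands in for the value I put in the override variable.

```python
from openai import OpenAI

# Placeholder endpoint ID and API key; base URL format assumed
# from RunPod's OpenAI-compatible serverless endpoints.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# "model_name" is the value set in OPENAI_SERVED_MODEL_NAME_OVERRIDE;
# this request is what comes back with the 404 NotFoundError above.
response = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```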
Setting the env variable SERVED_MODEL_NAME instead shows in the logs:
engine.py: Engine args: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name='model_name'...
yet the endpoint still returns the same error message as above.
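In case it helps with debugging, here is a quick sketch (same placeholder endpoint ID and API key as above) for listing the model IDs the worker actually registers, to compare against the name being requested:

```python
from openai import OpenAI

# Same placeholder endpoint ID and API key as in the previous snippet.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# Prints the model IDs the vLLM worker has registered; whatever
# appears here is the name chat/completions requests must use.
for model in client.models.list():
    print(model.id)
```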