RUNPOD_API_KEY and MAX_CONTEXT_LEN_TO_CAPTURE
We are also starting a vLLM project and I have two questions:
1) In the environment variables, do I have to define the RUNPOD_API_KEY with my own secret key to access the final vLLM OpenAI endpoint?
2) Isn't MAX_CONTEXT_LEN_TO_CAPTURE now deprecated? Do we still need to provide it, if MAX_MODEL_LEN is already set?
Thank you
After some trial and error, I figured out the answer to 1): the RUNPOD_API_KEY environment variable has no effect. To access the OpenAI URL we need the actual API key that can be generated under Account -> Settings.
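For anyone else hitting this, here is roughly what worked for me against the OpenAI-compatible URL (the endpoint ID and model name are placeholders for your own deployment):
```python
# Call the worker-vllm OpenAI-compatible route with the account-level RunPod API key
# (the one generated under Account -> Settings), not the RUNPOD_API_KEY env variable.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",                               # key from Account -> Settings
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",   # ENDPOINT_ID is a placeholder
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # whatever model your endpoint serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```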
I'm still not quite certain how to set the model length. I'm getting the error below right now. Llama-3 supports 8192 tokens, but I was expecting vLLM to use RoPE to increase it automatically. Is that not how it's done? RoPE scaling is supported in vLLM: https://github.com/vllm-project/vllm/pull/555
yes
ValueError: User-specified max_model_len (16384)
Set it in your env:
MAX_MODEL_LEN to 8192
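For reference, the worker's MAX_MODEL_LEN env variable corresponds to vLLM's max_model_len; a minimal local sketch (model name is just an example):
```python
# Minimal sketch: the same limit set through the plain vLLM Python API.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model with a native 8192-token context
    max_model_len=8192,  # anything larger fails unless RoPE scaling is configured
)
```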
Oh, I'm not sure how that works
Yeah, that is easily done with Aphrodite-engine: it increases the model length (by using more memory). vLLM is quite limited here.
But based on that PR it must be possible, just not so easy I guess.
You are right @nerdylive, but it's called
MAX_MODEL_LEN
I don't see how it's possible to set max_model_len to a value that's higher than what the model supports; that doesn't make sense to me @houmie
@Alpay Ariyak is the best person to advise on this. I'll try to add the support for RoPE
In Aphrodite-engine I can set CONTEXT_LENGTH to 16384 and it automatically uses RoPE scaling; in return it requires more memory.
See bullet point 3 (https://github.com/PygmalionAI/aphrodite-engine?tab=readme-ov-file#notes)
I'm using that right now on production. It is really possible 🙂
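If vLLM exposes the same thing through its RoPE scaling support, I'd expect something along these lines to work. This is only a hedged sketch: the exact rope_scaling keys depend on the vLLM/transformers version, and the model name is just an example.
```python
# Hedged sketch: extending Llama-3's native 8192-token context via RoPE scaling in vLLM.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",            # example model, native context 8192
    max_model_len=16384,                                    # desired context length
    rope_scaling={"rope_type": "dynamic", "factor": 2.0},   # older releases may expect "type" instead of "rope_type"
)
```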
Guys, I really hope you can help me with point 1) about API keys.
Is there a way I could define the API key for vLLM myself instead of having RunPod create it for me?
This last one is quite urgent due to a migration request.
I'll try to apply that to the vLLM worker too
Will you try the image to test if it works?
Of course, happy to help.
Alright wait
Thank you. And sorry, do you happen to know anything about the API key issue? I hope there is a way.
What is the API key issue? You have to generate an API key in the RunPod web console and use it to make requests; you can't use a custom API key, it has to be a RunPod one for RunPod serverless to function correctly.
This is also pretty clear in the docs: https://github.com/runpod-workers/worker-vllm
I see. Ok, so there is no way to set a custom key. Thanks
Nope, not possible. Create your own backend as a proxy to serverless if you want to use custom API keys.
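A rough sketch of such a proxy, assuming FastAPI and httpx; the endpoint ID, allowed keys, and route are placeholders:
```python
# Hedged sketch: a proxy that validates your own API keys and forwards requests
# to the RunPod serverless OpenAI-compatible route with the real RunPod key.
import httpx
from fastapi import FastAPI, Header, HTTPException, Request

RUNPOD_KEY = "YOUR_RUNPOD_API_KEY"                               # key from Account -> Settings
ALLOWED_CUSTOM_KEYS = {"my-custom-key-1"}                        # keys you hand out to clients
RUNPOD_BASE = "https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1"   # ENDPOINT_ID is a placeholder

app = FastAPI()

@app.post("/v1/chat/completions")
async def proxy_chat(request: Request, authorization: str = Header("")):
    # Reject callers that don't present one of your custom keys.
    if authorization.removeprefix("Bearer ") not in ALLOWED_CUSTOM_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    # Forward the body unchanged, swapping in the RunPod key.
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(
            f"{RUNPOD_BASE}/chat/completions",
            headers={
                "Authorization": f"Bearer {RUNPOD_KEY}",
                "Content-Type": "application/json",
            },
            content=await request.body(),
        )
    return upstream.json()
```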