Adding parameters to Docker when running Serverless
Hi. I need to add limit_mm_per_prompt to my Serverless Endpoint. How can I do it?
What is that? Which app are you using?
I'm trying to run Qwen 2.5 VL 7B with vLLM on a Serverless Endpoint
And I want to process more than one image per request
So this parameter is needed
I think it's not supported directly, but you can modify the worker code and build the vLLM worker from your own copy of the repo
https://github.com/runpod-workers/worker-vllm/blob/dc6f3239bdd6db0043e87bc3bcbd6830ad23af11/src/engine_args.py#L68
You can edit this file (src/engine_args.py)
Just guessing, but you might add a key there called
limit_mm_per_prompt
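A rough sketch of what that edit could look like, in case it helps. This is not the actual code from worker-vllm: the environment variable name LIMIT_MM_PER_PROMPT, the "image=4,video=1" format, and both helper functions are made up for illustration. The only thing taken from vLLM itself is that limit_mm_per_prompt is an engine argument that takes a mapping from modality to a maximum count, e.g. {"image": 4}. Check how engine_args.py in your checkout collects its arguments before copying anything.

```python
import os


def parse_limit_mm_per_prompt(raw):
    """Parse a string like 'image=4,video=1' into {'image': 4, 'video': 1}.

    The string format is an assumption made for this sketch.
    """
    limits = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        modality, sep, count = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'modality=count', got {item!r}")
        limits[modality.strip()] = int(count)
    return limits


def add_mm_limit(args, env=os.environ):
    """Add 'limit_mm_per_prompt' to an engine-args dict if the variable is set.

    'args' stands in for whatever dict the worker passes on to vLLM;
    LIMIT_MM_PER_PROMPT is a hypothetical variable name.
    """
    raw = env.get("LIMIT_MM_PER_PROMPT", "")
    if raw.strip():
        args["limit_mm_per_prompt"] = parse_limit_mm_per_prompt(raw)
    return args


if __name__ == "__main__":
    # Example: allow up to 4 images per request.
    example_env = {"LIMIT_MM_PER_PROMPT": "image=4"}
    print(add_mm_limit({"model": "Qwen/Qwen2.5-VL-7B-Instruct"}, example_env))
```

With something like this in place, the limit could be set as an environment variable on the endpoint instead of being hard-coded in the image.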
Can I run my repo on Serverless after that?
I guess so