R
RunPod5d ago
Nikita

Adding parameters to Docker when running Serverless

Hi. I need to add limit_mm_per_prompt to my Serverless Endpoint. How can i do it?
7 Replies
nerdylive
nerdylive5d ago
what is that? which app are you using
Nikita
NikitaOP5d ago
I try to run Qwen 2.5 VL 7B on VLLM on Serverless Endpoint And I want to process more than one image per request So this parameter is needed
nerdylive
nerdylive5d ago
--limit-mm-per-prompt
For each multimodal plugin, limit how many input instances to allow for each prompt. Expects a comma-separated list of items, e.g.: image=16,video=2 allows a maximum of 16 images and 2 videos per prompt. Defaults to 1 for each modality.
--limit-mm-per-prompt
For each multimodal plugin, limit how many input instances to allow for each prompt. Expects a comma-separated list of items, e.g.: image=16,video=2 allows a maximum of 16 images and 2 videos per prompt. Defaults to 1 for each modality.
i think its not supported directly, but you can modify your code then use your own code repo to build the vllm worker
nerdylive
nerdylive5d ago
GitHub
worker-vllm/src/engine_args.py at dc6f3239bdd6db0043e87bc3bcbd6830a...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
nerdylive
nerdylive5d ago
guessing, you might add a key called limit_mm_per_prompt
Nikita
NikitaOP5d ago
can I run my repo serverless after that?
nerdylive
nerdylive5d ago
i guesss so

Did you find this page helpful?