RunPod•2mo ago

Adding parameters to Docker when running Serverless

Hi. I need to add limit_mm_per_prompt to my Serverless Endpoint. How can i do it?

7 Replies

Jason•2mo ago

what is that? which app are you using

NikitaOP•2mo ago

I try to run Qwen 2.5 VL 7B on VLLM on Serverless Endpoint And I want to process more than one image per request So this parameter is needed

Jason•2mo ago

--limit-mm-per-prompt
For each multimodal plugin, limit how many input instances to allow for each prompt. Expects a comma-separated list of items, e.g.: image=16,video=2 allows a maximum of 16 images and 2 videos per prompt. Defaults to 1 for each modality.

--limit-mm-per-prompt
For each multimodal plugin, limit how many input instances to allow for each prompt. Expects a comma-separated list of items, e.g.: image=16,video=2 allows a maximum of 16 images and 2 videos per prompt. Defaults to 1 for each modality.

i think its not supported directly, but you can modify your code then use your own code repo to build the vllm worker

Jason•2mo ago

https://github.com/runpod-workers/worker-vllm/blob/dc6f3239bdd6db0043e87bc3bcbd6830ad23af11/src/engine_args.py#L68 you can edit this file

GitHub

worker-vllm/src/engine_args.py at dc6f3239bdd6db0043e87bc3bcbd6830a...

The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm

Jason•2mo ago

guessing, you might add a key called limit_mm_per_prompt

NikitaOP•2mo ago

can I run my repo serverless after that?

Jason•2mo ago

i guesss so

Gaming

Programming

Adding parameters to Docker when running Serverless

Did you find this page helpful?