worker-vllm list of strings
Hey,
I have a fine-tuned model that I want to deploy on serverless. I tried the vLLM prompts approach with a list of strings (as attached; rough sketch below) on a T4 in Colab and it works really well: responses in about 0.5 s. And here is my question: do I need to create my own worker to post input as a list of strings, or do you handle this in your vllm-worker? -> https://github.com/runpod-workers/worker-vllm
Thanks for your reply 😉
#sorry, I accidentally posted on a different channel than #⚡|serverless
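For reference, the list-of-strings approach looks roughly like this (a minimal sketch, not the exact attached notebook; the model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# Placeholder: substitute the HF repo of your fine-tuned model.
llm = LLM(model="your-username/your-finetuned-model")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# vLLM batches a list of string prompts in a single generate() call.
prompts = ["Hello, my name is", "The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```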
Hello
To do batch inputs, the best way is to make an asynchronous call for each input at the same time.
So just start an endpoint with our prebuilt image and set the MODEL_NAME environment variable to the HF repo of your model (example request below).
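For a single prompt, a request to the endpoint would look roughly like this (a sketch: the endpoint ID and API key are placeholders, and the payload assumes worker-vllm's `{"input": {"prompt": ...}}` schema):

```python
import requests

ENDPOINT_ID = "your_endpoint_id"   # placeholder
API_KEY = "your_runpod_api_key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello, world", "sampling_params": {"max_tokens": 64}}},
)
print(resp.json())
```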
Then run an async loop to get the responses for all prompts in your list simultaneously, e.g.:
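A minimal sketch of that async loop with asyncio and aiohttp (same placeholder endpoint ID and API key as above; asyncio.gather fires all the requests concurrently):

```python
import asyncio
import aiohttp

ENDPOINT_ID = "your_endpoint_id"   # placeholder
API_KEY = "your_runpod_api_key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

async def generate(session: aiohttp.ClientSession, prompt: str) -> dict:
    # One request per prompt; the calls run concurrently via asyncio.gather.
    payload = {"input": {"prompt": prompt}}
    async with session.post(URL, headers=HEADERS, json=payload) as resp:
        return await resp.json()

async def main(prompts: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(generate(session, p) for p in prompts))

if __name__ == "__main__":
    prompts = ["What is RunPod?", "Summarize vLLM in one sentence."]
    for result in asyncio.run(main(prompts)):
        print(result)
```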