worker-vllm list of strings

Hey, I have a fine-tuned model that I want to deploy on serverless. I tried the vLLM prompts approach with a list of strings (as attached) on a T4 in Colab and it works really well: responses in 0.5 s. And here is my question: do I need to create my own worker to post the input as a list of strings, or do you handle this in your vllm-worker? -> https://github.com/runpod-workers/worker-vllm Thanks for your reply 😉 #sorry, I accidentally posted on a different channel than #⚡|serverless
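(For context, the list-of-strings approach in vLLM looks roughly like the sketch below. The original attachment isn't shown, so the model name and sampling parameters here are placeholders, not the poster's actual code:)

```python
# Minimal sketch of vLLM offline batch inference with a list of prompt strings.
# The model repo below is a placeholder, not the poster's fine-tuned model.
from vllm import LLM, SamplingParams

prompts = [
    "What is the capital of France?",
    "Write a haiku about GPUs.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="your-hf-username/your-fine-tuned-model")  # placeholder repo

# generate() accepts a list of strings and batches them internally
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```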
Alpay Ariyak • 10mo ago
Hello! To do batch inputs, the best way would be to asynchronously make a call for each input at the same time. So just start an endpoint with our prebuilt image and set the env variable MODEL_NAME to the repo of your model on HF. Then do an async loop to get the responses for all prompts in your list simultaneously.
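(A minimal sketch of that async fan-out, assuming the standard RunPod serverless HTTP API and worker-vllm's `{"input": {"prompt": ...}}` schema; the endpoint ID and API key are placeholders:)

```python
# Minimal sketch: fire one request per prompt concurrently against a RunPod
# serverless endpoint running worker-vllm. ENDPOINT_ID and API_KEY are
# placeholders you'd replace with your own values.
import asyncio
import aiohttp

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

prompts = [
    "What is the capital of France?",
    "Write a haiku about GPUs.",
]

async def generate(session: aiohttp.ClientSession, prompt: str) -> dict:
    payload = {"input": {"prompt": prompt}}
    headers = {"Authorization": f"Bearer {API_KEY}"}
    async with session.post(URL, json=payload, headers=headers) as resp:
        resp.raise_for_status()
        # Return the raw JSON; the exact output shape depends on the
        # worker-vllm version you deploy.
        return await resp.json()

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Launch all requests at once and await them together, so the
        # prompts are processed simultaneously rather than sequentially.
        results = await asyncio.gather(*(generate(session, p) for p in prompts))
    for result in results:
        print(result)

asyncio.run(main())
```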