worker-vllm list of strings
Hey,
I have a fine-tuned model that I want to deploy on serverless. I tried the vLLM prompts approach with a list of strings (as attached; rough sketch below) on a T4 in Colab and it works really well: responses in about 0.5 s. And here is my question: do I need to create my own worker to post input as a list of strings, or do you handle this in your vllm-worker? -> https://github.com/runpod-workers/worker-vllm
Thanks for your reply 😉
#sorry, I accidentally posted on a different channel than #⚡|serverless
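For reference, the list-of-strings approach looks roughly like this (a minimal sketch, not the exact attached notebook; the model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# Placeholder: substitute the HF repo of your fine-tuned model.
llm = LLM(model="your-username/your-finetuned-model")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# vLLM batches a list of string prompts in a single generate() call.
prompts = ["Hello, my name is", "The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```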
Hello
To do batch inputs, the best way is to make an asynchronous call for each input at the same time.
So just start an endpoint with our prebuilt image and set the MODEL_NAME environment variable to the HF repo of your model (example request below).
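For a single prompt, a request to the endpoint would look roughly like this (a sketch: the endpoint ID and API key are placeholders, and the payload assumes worker-vllm's `{"input": {"prompt": ...}}` schema):

```python
import requests

ENDPOINT_ID = "your_endpoint_id"   # placeholder
API_KEY = "your_runpod_api_key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello, world", "sampling_params": {"max_tokens": 64}}},
)
print(resp.json())
```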
Then run an async loop to get the responses for all prompts in your list simultaneously, e.g.:
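A minimal sketch of that async loop with asyncio and aiohttp (same placeholder endpoint ID and API key as above; asyncio.gather fires all the requests concurrently):

```python
import asyncio
import aiohttp

ENDPOINT_ID = "your_endpoint_id"   # placeholder
API_KEY = "your_runpod_api_key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

async def generate(session: aiohttp.ClientSession, prompt: str) -> dict:
    # One request per prompt; the calls run concurrently via asyncio.gather.
    payload = {"input": {"prompt": prompt}}
    async with session.post(URL, headers=HEADERS, json=payload) as resp:
        return await resp.json()

async def main(prompts: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(generate(session, p) for p in prompts))

if __name__ == "__main__":
    prompts = ["What is RunPod?", "Summarize vLLM in one sentence."]
    for result in asyncio.run(main(prompts)):
        print(result)
```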