How to Run Text Generation Inference on Serverless?
Hello, newbie here. I want to run Hugging Face's Text Generation Inference on serverless. I used this repo: https://github.com/runpod-workers/worker-tgi. I built my own Docker image according to the README and deployed it on RunPod serverless, but when I hit my API I get this error:
Can anyone help me?
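For context, a RunPod serverless endpoint is normally called through RunPod's HTTP API. Here is a minimal sketch in Python; the endpoint ID, API key, and prompt are placeholders, not values from this thread:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# /runsync blocks until the job finishes; /run returns a job ID immediately.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {"input": {"prompt": "Hello, world!"}}

response = requests.post(url, headers=headers, json=payload, timeout=120)
print(response.status_code, response.json())
```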
Most people use this one:
https://github.com/runpod-workers/worker-vllm
Does it also support text generation inference?
Yes
Hello, sorry for the late response. I tried the prebuilt Docker image from this repo; the config looks like this, but there's still no response after hitting my API.
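For reference, worker-vllm wraps requests in RunPod's standard {"input": ...} envelope, with generation options under "sampling_params" (mirroring vLLM's SamplingParams fields). A minimal sketch based on the worker-vllm README; the endpoint ID, API key, prompt, and parameter values are all illustrative:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# "sampling_params" follows vLLM's SamplingParams (max_tokens, temperature, ...).
payload = {
    "input": {
        "prompt": "What is the capital of France?",
        "sampling_params": {"max_tokens": 100, "temperature": 0.7},
    }
}

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
# /run is asynchronous: it returns a job ID and a status such as IN_QUEUE.
print(response.json())
```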
What response do you get when calling your endpoint?
@Alpay Ariyak may be able to advise.
@Oryza sativa Can you share the worker logs?
I'm sorry, I already got the response. I think it's because I hit the endpoint while it was still in initializing status and not ready yet; I finally got the response. Thank you @ashleyk
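The takeaway here is that a request sent while the endpoint is still initializing just sits in the queue, so it's worth polling the job status instead of assuming the worker is broken. A minimal polling sketch; the job ID would come from an earlier /run call, and the endpoint ID and API key are placeholders:

```python
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
JOB_ID = "job-id-from-run-call"    # returned by the /run request

status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Poll until the job leaves the queue; cold starts can take a while
# because the worker has to pull the image and load the model first.
while True:
    job = requests.get(status_url, headers=headers, timeout=30).json()
    if job["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(5)

print(job)  # a COMPLETED job includes an "output" field
```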
But I'm just curious: it's using vLLM, right? So does RunPod now support using TGI for deploying models on serverless?
https://github.com/huggingface/text-generation-inference