Text Generation Inference Docker image on serverless?
Hi, I created a template using the TGI Docker image. In the Docker command I passed --model-id <llama-3-8b> (the HF repo name) and --port 8080, picked a 24GB GPU, and started a serverless endpoint. But I'm not able to connect to the worker: when I try to ask a question, the request never reaches it. However, when I SSH into the worker and send a curl request:
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
it actually works.
So how do I connect to this serverless endpoint from outside, e.g. from my codebase, and run inference against the LLM served by TGI?
3 Replies
Hmmm, use an HTTP client to proxy your request from RunPod's /run.
So when /run fires, your handler sends a request to your localhost (127.0.0.1:xxxx/xxx).
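Something like this as a rough sketch, assuming TGI is listening on 127.0.0.1:8080 inside the worker and you're using the runpod Python SDK; the input field names here are just placeholders, use whatever shape you want:

# handler.py - rough sketch: proxy the serverless job input to the local TGI server
import requests
import runpod

TGI_URL = "http://127.0.0.1:8080/generate"  # TGI's non-streaming endpoint

def handler(job):
    # job["input"] is whatever you put under "input" in the /run request body
    prompt = job["input"].get("prompt", "")
    params = job["input"].get("parameters", {"max_new_tokens": 20})
    resp = requests.post(TGI_URL, json={"inputs": prompt, "parameters": params}, timeout=300)
    resp.raise_for_status()
    return resp.json()  # e.g. {"generated_text": "..."}

runpod.serverless.start({"handler": handler})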
I actually tried sending a request like that using requests; my endpoint is something like
https://api.runpod.ai/v2/{endpoint_id}/run. A job gets created, but I'm not getting a response, and in the logs I see no request being sent.
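Roughly like this on my side (the endpoint ID and API key are placeholders, and the "input" shape is just what I'm guessing the handler expects):

# client.py - rough sketch of how I'm calling the serverless endpoint
import time
import requests

ENDPOINT_ID = "your_endpoint_id"   # placeholder
API_KEY = "your_runpod_api_key"    # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# submit the job
run = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=HEADERS,
    json={"input": {"prompt": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}}},
).json()
job_id = run["id"]

# poll /status until the job finishes or fails
while True:
    status = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}", headers=HEADERS
    ).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        print(status)
        break
    time.sleep(2)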
Did you log the request?
RunPod's logger doesn't magically report or log everything that's happening; I think it just picks up stdout and stderr.
Did the job fail too, then?
Maybe the model is still loading.