Created by underdog__spider3099 on 7/17/2024 in #⚡|serverless
Text Generation Inference (TGI) docker image on serverless?
Hi, I have created a template using the TGI docker image. In the docker command I entered --model-id <llama-3-8b> (the HF repo name) and --port 8080, chose a 24 GB GPU, and started a serverless endpoint. But I am not able to connect to this worker: when I try to ask a question, the question never reaches the worker. However, when I SSH into the worker and make a curl request, it actually works:

```
curl 127.0.0.1:8080/generate_stream \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
  -H 'Content-Type: application/json'
```

So how do I connect to this serverless endpoint from outside, e.g. from my codebase, and run inference against the LLM through TGI?
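For context, this is roughly how I understand a RunPod serverless endpoint is supposed to be called from outside, through RunPod's queue API rather than a port on the worker. A minimal sketch; <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders, and the shape of the "input" payload is my assumption, since it depends on what the worker's handler expects:

```
# POST a job to the endpoint's synchronous queue route and wait for the result.
# <ENDPOINT_ID> and <RUNPOD_API_KEY> come from the RunPod console.
curl https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync \
  -X POST \
  -H 'Authorization: Bearer <RUNPOD_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{"input": {"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}}}'
```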