Created by ab on 8/8/2024 in #⚡|serverless
Error getting response from a serverless deployment
I tried to create multiple serverless vLLM deployments and even picked the top-end GPU. However, the requests would always go to an IN_PROGRESS status and never respond. I'm building a chat app, and such a slow response isn't acceptable. Is there something else I should do? I selected all default options for the google/gemma-2b model while creating the deployment. I know the requests from my app hit RunPod, as I could see the requests and their status, but it would never respond back. I was trying to use the OpenAI-compatible endpoints. Would appreciate any help on this.
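For reference, this is roughly how the app makes the requests (a minimal sketch using the `openai` Python client; the API key and endpoint ID are placeholders):

```python
# Minimal sketch of calling a RunPod serverless vLLM deployment through its
# OpenAI-compatible endpoint. RUNPOD_API_KEY and ENDPOINT_ID are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="RUNPOD_API_KEY",  # placeholder: your RunPod API key
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder: your endpoint ID
)

# Send a chat completion request to the deployed model.
response = client.chat.completions.create(
    model="google/gemma-2b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```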
15 replies