RunPod · 2w ago
bggai

Runpod Serverless is really unreliable, delay time is way too high sometimes

I'm using a 24 GB VRAM serverless endpoint, and the endpoint is way too unstable. About 90% of the time the "QUEUE" takes a couple of seconds and then inference of the Omniparser v2 model takes 3-8 seconds. These times are reproducible in Google Colab and on other GPUs. Nonetheless, every once in a while Runpod takes more than 40 seconds, or even minutes, to process a request.

This happens when a specific worker bugs out and multiple requests then go through it. The worker bugs out for no apparent reason and takes multiple minutes to do a job it should do in seconds. It only happens on some workers, and only when the same worker is used multiple times. It makes no sense, and Runpod charges you for multiple minutes of DELAY TIME. Sometimes the request does not even go through: it shows "IN_PROGRESS", as seen in the image, for multiple minutes without finishing, while Runpod charges you every second.

In any other environment, and normally even on Runpod, this process takes seconds and the "IN_PROGRESS" print shows only 3-8 times. This makes the endpoint highly unstable and way too expensive for a model that does not even use half of the VRAM.
2 Replies
bggai (OP) · 2w ago
Stable endpoints (most of the time): delay 5-10 seconds, inference 3-8 seconds.
Unstable endpoints: delay from 1 minute to endless, inference unknown (sometimes it never finishes, and other times we cannot tell whether the loading or the inference is what is slow).
- The workers were tested separately; only some of them fail.
- The requirements of the model are not even close to the capacity of the machine.
- It is only some bugged workers, since the others have reproducible times that are really similar to Google Colab once you discount the delay time.
- I have multiple endpoints. This seems to be a recent behaviour and it is not desired at all: first, because it does not make sense given the hardware the serverless endpoint has, and second, because Runpod charges you for these errors.
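For anyone who wants to run the same per-worker comparison, here is a minimal sketch of how finished jobs could be grouped by worker to spot the bugged ones. It assumes each job record is a dict with `workerId`, `delayTime` and `executionTime` in milliseconds; those field names and the `flag_slow_workers` helper are assumptions for illustration, not a confirmed Runpod schema. The thresholds are the upper ends of the "stable" numbers above (10 s delay, 8 s inference).

```python
from collections import defaultdict
from statistics import median


def flag_slow_workers(jobs, max_delay_ms=10_000, max_exec_ms=8_000):
    """Group job records by worker and flag workers whose median times
    exceed the 'stable' thresholds (field names are assumed)."""
    by_worker = defaultdict(list)
    for job in jobs:
        by_worker[job["workerId"]].append(job)

    report = {}
    for worker, records in by_worker.items():
        delay = median(r["delayTime"] for r in records)
        # A job that never finished has no executionTime; count it as infinite.
        exec_time = median(r.get("executionTime", float("inf")) for r in records)
        report[worker] = {
            "jobs": len(records),
            "median_delay_ms": delay,
            "median_exec_ms": exec_time,
            "slow": delay > max_delay_ms or exec_time > max_exec_ms,
        }
    return report


if __name__ == "__main__":
    # Made-up sample records, only to show the shape of the input.
    sample = [
        {"workerId": "worker-a", "delayTime": 6000, "executionTime": 4000},
        {"workerId": "worker-b", "delayTime": 70000, "executionTime": 120000},
    ]
    for worker, stats in flag_slow_workers(sample).items():
        print(worker, stats)
```

Using the median instead of the mean keeps one cold start from flagging an otherwise healthy worker.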
bggai (OP) · 2w ago
Worst part: the endpoint sometimes runs until it times out, which charges a whole lot.
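One client-side way to at least stop waiting on a stuck job is to poll with a deadline and request a cancel once it passes. The sketch below assumes the caller supplies two hypothetical callables, `fetch_status` (returns the job's status dict) and `cancel` (asks the service to cancel the job), and it assumes the terminal status strings listed in the code. Whether cancelling actually stops the billing is not something this sketch can guarantee.

```python
import time

# Status strings treated as terminal; assumed, check against the real API.
TERMINAL = ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT")


def wait_for_job(fetch_status, cancel, deadline_s=60.0, poll_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state or the deadline passes.

    fetch_status and cancel are caller-supplied (hypothetical) callables.
    """
    start = clock()
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL:
            return status
        if clock() - start >= deadline_s:
            # Give up on the stuck job instead of waiting for the server timeout.
            cancel()
            return {"status": "CANCELLED_BY_CLIENT", "last": status}
        sleep(poll_s)
```

With inference normally taking 3-8 seconds, a deadline of around 60 seconds leaves room for a cold start while still cutting off a worker that hangs for minutes.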
