RunPod7mo ago
Satpal

Serverless Error Kept Pod Active

I have an LLM deployed on RunPod serverless. Due to an error that occurred after a request, the pod got stuck on {"requestId": null, "message": "Failed to get job, status code: 502", "level": "ERROR"}. This kept the pod active and therefore cost me money. Shouldn't the pod deactivate automatically after a certain time when it errors?
35 Replies
digigoblin
digigoblin7mo ago
Which version of the RunPod SDK are you using?
nerdylive
nerdylive7mo ago
You can configure a timeout in the endpoint settings (deactivate after a set time). Maybe ask on the web chat and give them the endpoint ID. What other errors did you get in the log?
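For context, a client-side guard can complement that endpoint-level execution timeout: submit the job through the documented /run route, poll /status, and call /cancel if a hard deadline passes, so a hung request can't keep a worker active indefinitely. This is only a sketch of the idea; the API key, endpoint ID, and payload are placeholders.
```python
import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_with_deadline(payload: dict, deadline_s: int = 300):
    # Submit the job asynchronously.
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]
    start = time.time()
    while time.time() - start < deadline_s:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(2)
    # Deadline hit: ask RunPod to cancel the job so it stops occupying a worker.
    requests.post(f"{BASE}/cancel/{job_id}", headers=HEADERS)
    raise TimeoutError(f"Job {job_id} cancelled after {deadline_s}s")
```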
Satpal
SatpalOP7mo ago
[image attachment, no description]
Satpal
SatpalOP7mo ago
I used this Docker image: runpod/worker-vllm:stable-cuda11.8.0. That doesn't help when the pod gets stuck due to an error.
digigoblin
digigoblin7mo ago
Oh yeah, something looks broken here. Use web chat (which is always broken for me) or email support.
nerdylive
nerdylive7mo ago
Why? Does it restart every time it errors?
Satpal
SatpalOP7mo ago
I don't know, but I had the execution timeout on, set to 300 seconds. You can see the request count here. It was on for quite a few hours (continuously) until I manually turned it off.
[image attachments, no description]
nerdylive
nerdylive7mo ago
How many requests ran? Is it 1, or failed requests that got retried multiple times? Best to ask on web chat.
digigoblin
digigoblin7mo ago
I disagree, web chat never works, email is better
nerdylive
nerdylive7mo ago
Yeah, email works too. I don't know yet if they've made the web chat better.
houmie
houmie7mo ago
I must say this really scares me about serverless: costs pile up when something is crashing and spinning up all the workers. I have noticed that too many times during testing in development. I think serverless is great for development, but I'm not sure about production yet. I prefer normal pods, where costs are predictable.
nerdylive
nerdylive7mo ago
Well, if the timeout works it shouldn't be, but I think it works.
Satpal
SatpalOP7mo ago
I received the refund. Thanks everyone for the help.
houmie
houmie7mo ago
nice, it's good to know there is support to help out.
gaks2san
gaks2san7mo ago
Hello. About 13 hours ago there was a problem with the serverless service, so I couldn't use it for a while. After that I barely used it, but I got an e-mail saying my balance had run out. I checked and found that a lot of traffic had been logged due to errors, and about $10 disappeared. I've only made about 10 requests in the meantime, and the responses to my requests are very short, less than 20 seconds. The idle timeout is only 60 seconds. Please check the cause and compensate me if I have lost anything. The related materials are in the attachment. Please check it quickly. Thank you. (For your information, it's 2:30 a.m. here, so you may not be able to reach me afterward.)
[image attachment, no description]
digigoblin
digigoblin7mo ago
Email support for a refund
houmie
houmie7mo ago
May I suggest something? I think the majority of people using this kind of service use it in a stateless way: data just needs to be processed and returned. If there are errors, it's best to kill the process instead of retrying over and over. I had the same experience that enabling Execution Timeout doesn't help if there is an error such as running out of memory. My suggestion is to introduce a new option that aggressively kills the task as soon as an error occurs. Logs should obviously be kept for later.
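A rough sketch of that fail-fast idea, assuming the standard runpod Python handler pattern: load_model and run_inference are stand-in stubs, and whether an exiting container is retried or simply replaced depends on the endpoint's retry settings.
```python
import os
import runpod
import torch

def load_model():
    # Placeholder: the real worker would load the LLM here.
    return object()

def run_inference(model, payload):
    # Placeholder: the real worker would run generation here.
    return {"echo": payload}

MODEL = load_model()

def handler(job):
    try:
        return run_inference(MODEL, job["input"])
    except torch.cuda.OutOfMemoryError:
        # GPU state is likely unrecoverable: exit the container so the platform
        # replaces the worker instead of retrying on a broken GPU.
        # (Recent runpod-python versions also document a "refresh_worker" flag in
        # the handler's return value for recycling a worker; check your SDK docs.)
        os._exit(1)
    except Exception as exc:
        # Non-fatal errors: report them rather than letting the job loop forever.
        return {"error": str(exc)}

runpod.serverless.start({"handler": handler})
```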
digigoblin
digigoblin7mo ago
Log it in #🧐|feedback
Alpay Ariyak
Alpay Ariyak7mo ago
@flash-singh @Zeen
flash-singh
flash-singh7mo ago
Use support chat and we can issue a refund if it's valid.
gaks2san
gaks2san7mo ago
Thank you for your kind advice. Is your advice for my serverless endpoint? If so, does "kill the process or task" mean "delete the endpoint and create a new endpoint"? Thank you.
Samuel
Samuel7mo ago
This is a great suggestion by @houmie - error handling for this is crucial. I am considering moving away from RunPod due to this issue. Especially out-of-memory errors due to the KV cache filling up with FlashBoot enabled are a problem. I think this "delete worker and create a new one" behavior should be the default option (it would handle OOM errors as well as some other runtime errors), with additional (preferably configurable) options for retry delay, exponential backoff on/off, and max retries. For now, however, the initial delete-and-recreate would be a quick and easy win, as it would solve the biggest issues I (and probably many others) have with RunPod serverless right now.
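The retry-delay and backoff part of this can be approximated on the client side today. The sketch below is generic Python, not anything RunPod-specific; call_endpoint stands in for whatever actually submits the request.
```python
import random
import time

def call_with_backoff(call_endpoint, payload, max_retries=3, base_delay=2.0):
    for attempt in range(max_retries + 1):
        try:
            return call_endpoint(payload)
        except Exception:
            if attempt == max_retries:
                raise  # give up instead of retrying (and paying) forever
            # Exponential backoff with jitter: ~2s, 4s, 8s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```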
Zeen
Zeen7mo ago
We'll figure out if we can enable something like this
Samuel
Samuel7mo ago
Thank you very much! Is there a place to subscribe to updates on this? I already have notifications for the GitHub issue in the worker-vllm repo toggled on. However, any option to stay in the feedback and notification loop on this would be greatly appreciated!
Zeen
Zeen7mo ago
Not currently, since the platform code is closed source. The blog usually gets product updates every week or so, so that wouldn't be a bad place to start.
houmie
houmie7mo ago
What exactly is the purpose of flashboot? Especially if it takes VRAM space, what are the benefits?
digigoblin
digigoblin7mo ago
Flash boot reduces cold start times.
houmie
houmie7mo ago
How much VRAM is it taking, if any?
digigoblin
digigoblin7mo ago
It doesn't take VRAM; your application takes VRAM. It is (according to RunPod) some kind of "magic": it keeps your application running in the background so it doesn't have to start up on every single request. I assume they basically just keep the container running in the background and pull it in when your worker needs to handle requests. Not sure why they have to be secretive about how it works instead of being open about it.
nerdylive
nerdylive7mo ago
Well, every company does this, even McDonald's. There might be a perfect recipe, but they won't ever publish it. Let's say it's maybe to reduce competitors' chances of successfully applying the same features, haha.
digigoblin
digigoblin7mo ago
There is no such thing, McDonald's makes the worst "food" in the world.
nerdylive
nerdylive7mo ago
What such thing? Well, it's relative, but I'd say it's quite good for me.
digigoblin
digigoblin7mo ago
Relative to what? eating dog turd?
nerdylive
nerdylive7mo ago
Everyone's food preference is relative
Samuel
Samuel7mo ago
The container image has already been pulled and assigned to the worker once it stops initializing. What FlashBoot does is some sort of hibernation of the container if the endpoint gets called frequently. I assume they persist the GPU memory to disk (or other storage) so that it can be loaded from there instead of having to fully load the model onto the GPU again on subsequent requests. That reduces cold start time from about 90-120 seconds for Mixtral 8x7B on an A6000 to less than two seconds for most of my requests. Pretty cool technology; however, it's not really usable if your whole balance gets drained overnight because some worker restarts over and over again when, for example, an OOM error occurs, which will keep occurring since the GPU memory is cached across invocations.
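To illustrate the warm-worker effect described here: in a typical serverless handler the model is loaded once at container start, outside the handler, so only a cold worker pays the multi-minute load cost while a warm or flashbooted worker reuses the loaded model. The vLLM usage and model name below are illustrative, not the internals of worker-vllm.
```python
import runpod
from vllm import LLM, SamplingParams

# Runs once per container start; a warm worker reuses this object across jobs.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

def handler(job):
    params = SamplingParams(max_tokens=job["input"].get("max_tokens", 256))
    outputs = llm.generate([job["input"]["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```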