RunPod7mo ago
Satpal

Serverless Error Kept Pod Active

I have an LLM deployed on RunPod serverless. Due to an error that occurred after a request, the pod got stuck on {"requestId": null, "message": "Failed to get job, status code: 502", "level": "ERROR"}. This kept the pod active and therefore cost me money. Shouldn't the pod deactivate automatically after a certain time when it errors?
35 Replies
digigoblin
digigoblin7mo ago
Which version of the RunPod SDK are you using?
nerdylive
nerdylive7mo ago
You can configure a timeout in the endpoint settings (deactivate after a set time). Maybe ask on the web chat and give them the endpoint ID. What other errors did you get in the log?
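For context, a client-side guard can complement that endpoint-level execution timeout: submit the job through the documented /run route, poll /status, and call /cancel if a hard deadline passes, so a hung request can't keep a worker active indefinitely. This is only a sketch of the idea; the API key, endpoint ID, and payload are placeholders.
```python
import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_with_deadline(payload: dict, deadline_s: int = 300):
    # Submit the job asynchronously.
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]
    start = time.time()
    while time.time() - start < deadline_s:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(2)
    # Deadline hit: ask RunPod to cancel the job so it stops occupying a worker.
    requests.post(f"{BASE}/cancel/{job_id}", headers=HEADERS)
    raise TimeoutError(f"Job {job_id} cancelled after {deadline_s}s")
```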
Satpal
SatpalOP7mo ago
[image attachment, no description]
Satpal
SatpalOP7mo ago
I used this Docker image: runpod/worker-vllm:stable-cuda11.8.0. That doesn't help when the pod gets stuck due to an error.
digigoblin
digigoblin7mo ago
Oh yeah, something looks broken here. Use web chat (which is always broken for me) or email support.
nerdylive
nerdylive7mo ago
Why? Does it restart every time it errors?
Satpal
SatpalOP7mo ago
I don't know, but I had the execution timeout on, set to 300 seconds. You can see the request count here. It was on for quite a few hours (continuously) until I manually turned it off.
[image attachments, no description]
nerdylive
nerdylive7mo ago
How many requests ran? Is it 1, or failed requests that got retried multiple times? Best to ask on web chat.
digigoblin
digigoblin7mo ago
I disagree, web chat never works, email is better
nerdylive
nerdylive7mo ago
Yeah, email works too. I don't know yet if they've made the web chat better.
houmie
houmie7mo ago
I must say this really scares me about serverless: costs pile up when something is crashing and spinning up all the workers. I have noticed that too many times during testing in development. I think serverless is great for development, but I'm not sure about production yet. I prefer normal pods, where costs are predictable.
nerdylive
nerdylive7mo ago
Well, if the timeout works it shouldn't be, but I think it works.
Satpal
SatpalOP7mo ago
I received the refund. Thanks everyone for the help.
houmie
houmie7mo ago
nice, it's good to know there is support to help out.
gaks2san
gaks2san7mo ago
Hello. About 13 hours ago there was a problem with the serverless service, so I couldn't use it for a while. After that I barely used it, but I got an e-mail saying my balance had run out. I checked and found that a lot of traffic had been logged due to errors, and about $10 disappeared. I've only made about 10 requests in the meantime, and the responses to my requests are very short, less than 20 seconds. The idle timeout is only 60 seconds. Please check the cause and compensate me if I have lost anything. The related materials are in the attachment. Please check it quickly. Thank you. (For your information, it's 2:30 a.m. here, so you may not be able to reach me afterward.)
[image attachment, no description]
digigoblin
digigoblin7mo ago
Email support for a refund
houmie
houmie7mo ago
May I suggest something? I think the majority of people using this kind of service use it in a stateless way: data just needs to be processed and returned. If there are errors, it's best to kill the process instead of retrying over and over. I had the same experience that enabling Execution Timeout doesn't help if there is an error such as running out of memory. My suggestion is to introduce a new option that aggressively kills the task as soon as an error occurs. Logs should obviously be kept for later.
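A rough sketch of that fail-fast idea, assuming the standard runpod Python handler pattern: load_model and run_inference are stand-in stubs, and whether an exiting container is retried or simply replaced depends on the endpoint's retry settings.
```python
import os
import runpod
import torch

def load_model():
    # Placeholder: the real worker would load the LLM here.
    return object()

def run_inference(model, payload):
    # Placeholder: the real worker would run generation here.
    return {"echo": payload}

MODEL = load_model()

def handler(job):
    try:
        return run_inference(MODEL, job["input"])
    except torch.cuda.OutOfMemoryError:
        # GPU state is likely unrecoverable: exit the container so the platform
        # replaces the worker instead of retrying on a broken GPU.
        # (Recent runpod-python versions also document a "refresh_worker" flag in
        # the handler's return value for recycling a worker; check your SDK docs.)
        os._exit(1)
    except Exception as exc:
        # Non-fatal errors: report them rather than letting the job loop forever.
        return {"error": str(exc)}

runpod.serverless.start({"handler": handler})
```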
digigoblin
digigoblin7mo ago
Log it in #🧐|feedback
Alpay Ariyak
Alpay Ariyak7mo ago
@flash-singh @Zeen
flash-singh
flash-singh7mo ago
Use support chat and we can issue a refund if it's valid.
gaks2san
gaks2san7mo ago
Thank you for your kind advice. Is your advice for my serverless endpoint? If so, does "kill the process or task" mean "delete the endpoint and create a new endpoint"? Thank you.
Samuel
Samuel7mo ago
This is a great suggestion by @houmie - error handling for this is crucial. I am considering moving away from RunPod due to this issue. Especially out-of-memory errors due to the KV cache filling up with FlashBoot enabled are a problem. I think this "delete worker and create a new one" behavior should be the default option (it would handle OOM errors as well as some other runtime errors), with additional (preferably configurable) options for retry delay, exponential backoff on/off, and max retries. For now, however, the initial delete-and-recreate would be a quick and easy win, as it would solve the biggest issues I (and probably many others) have with RunPod serverless right now.
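The retry-delay and backoff part of this can be approximated on the client side today. The sketch below is generic Python, not anything RunPod-specific; call_endpoint stands in for whatever actually submits the request.
```python
import random
import time

def call_with_backoff(call_endpoint, payload, max_retries=3, base_delay=2.0):
    for attempt in range(max_retries + 1):
        try:
            return call_endpoint(payload)
        except Exception:
            if attempt == max_retries:
                raise  # give up instead of retrying (and paying) forever
            # Exponential backoff with jitter: ~2s, 4s, 8s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```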
Zeen
Zeen7mo ago
We'll figure out if we can enable something like this
Samuel
Samuel7mo ago
Thank you very much! Is there a place to subscribe to updates on this? I already have notifications for the GitHub issue in the worker-vllm repo toggled on. However, any option to stay in the feedback and notification loop on this would be greatly appreciated!
Zeen
Zeen7mo ago
Not currently, since the platform code is closed source. The blog usually gets product updates every week or so, so that wouldn't be a bad place to start.
houmie
houmie7mo ago
What exactly is the purpose of flashboot? Especially if it takes VRAM space, what are the benefits?
digigoblin
digigoblin7mo ago
Flash boot reduces cold start times.
houmie
houmie7mo ago
How much VRAM is it taking, if any?
digigoblin
digigoblin7mo ago
It doesn't take VRAM; your application takes VRAM. It is (according to RunPod) some kind of "magic": it keeps your application running in the background so it doesn't have to start up on every single request. I assume they basically just keep the container running in the background and pull it in when your worker needs to handle requests. Not sure why they have to be secretive about how it works instead of being open about it.
nerdylive
nerdylive7mo ago
Well, every company does this, even McDonald's. There might be a perfect recipe, but they won't ever publish it. Let's say it's maybe to reduce competitors' chances of successfully applying the same features, haha.
digigoblin
digigoblin7mo ago
There is no such thing, McDonald's makes the worst "food" in the world.
nerdylive
nerdylive7mo ago
What such thing? Well, it's relative, but I'd say it's quite good for me.
digigoblin
digigoblin7mo ago
Relative to what? eating dog turd?
nerdylive
nerdylive7mo ago
Everyone's food preference is relative
Samuel
Samuel7mo ago
The container image has already been pulled and assigned to the worker once it stops initializing. What FlashBoot does is some sort of hibernation of the container if the endpoint gets called frequently. I assume they persist the GPU memory to disk (or other storage) so that it can be loaded from there instead of having to fully load the model onto the GPU again on subsequent requests. That reduces cold start time from about 90-120 seconds for Mixtral 8x7B on an A6000 to less than two seconds for most of my requests. Pretty cool technology; however, it's not really usable if your whole balance gets drained overnight because some worker restarts over and over again when, for example, an OOM error occurs, which will keep occurring since the GPU memory is cached across invocations.
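To illustrate the warm-worker effect described here: in a typical serverless handler the model is loaded once at container start, outside the handler, so only a cold worker pays the multi-minute load cost while a warm or flashbooted worker reuses the loaded model. The vLLM usage and model name below are illustrative, not the internals of worker-vllm.
```python
import runpod
from vllm import LLM, SamplingParams

# Runs once per container start; a warm worker reuses this object across jobs.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

def handler(job):
    params = SamplingParams(max_tokens=job["input"].get("max_tokens", 256))
    outputs = llm.generate([job["input"]["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```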