RunPod•11mo ago

Serverless Endpoint failing occasionally

I'm pretty new to RunPod and started off with a serverless endpoint. When calling the API, I sometimes get a failed response but can't really trace what exactly is wrong. Calling the same API with the same input again works, and the logs don't provide more information. How can I figure out what is causing this error? Is there a best practice for catching the FAILED calls and analyzing why they occur? Happy for any help!

24 Replies

karinenavasOP•11mo ago

Addendum: it's my custom endpoint, which takes a list of text articles as input and outputs a list of labels.

haris•11mo ago

@karinenavas Are you able to provide the request you're making, the code you're using, and the response, if any?

karinenavasOP•11mo ago

This is the request I'm making:

karinenavasOP•11mo ago

I get the message "Job failed with status FAILED" and a return value of None.
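A None return together with a FAILED status often means the client only reads the job's output. The sketch below assumes the raw JSON returned by the endpoint (for example from the /runsync or /status routes) carries "id", "status", "output" and, for failed jobs, "error" fields; the helper name job_output is hypothetical. It surfaces the error text instead of silently returning None.

```python
from typing import Any


def job_output(response: dict[str, Any]) -> Any:
    """Return the output of a completed job response, or raise with its error.

    Assumes a RunPod-style response dict with "id", "status", "output" and,
    for failed jobs, an "error" field. job_output is a hypothetical helper.
    """
    status = response.get("status")
    if status == "COMPLETED":
        return response.get("output")
    # Any other status (FAILED, or a job that has not finished) raises.
    # For a FAILED job, "error" holds whatever message the worker reported, if any.
    raise RuntimeError(
        f"job {response.get('id')} ended with status {status}: {response.get('error')!r}"
    )


# Usage with a hand-written failed response (not real endpoint data):
try:
    job_output({"id": "example-id", "status": "FAILED", "error": "KeyError: 'articles'"})
except RuntimeError as exc:
    print(exc)
```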

karinenavasOP•11mo ago

Also, the logs for the worker throwing the error don't reveal much.

digigoblin•11mo ago

I think he wanted the handler code. You should also use a try/except block in your handler, and then return the errors as a string in the "error" key. It doesn't support a list or a dict.
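A minimal sketch of what is described above, assuming a job payload shaped like {"input": {"articles": [...]}}. The input key "articles" and the classify_articles stub are hypothetical placeholders for the real model code; the point is that every exception is turned into a plain string under the "error" key.

```python
from typing import Any


def classify_articles(articles: list[str]) -> list[str]:
    """Hypothetical stand-in for the real model call."""
    if not isinstance(articles, list):
        raise TypeError("'articles' must be a list of strings")
    return ["label" for _ in articles]


def handler(job: dict[str, Any]) -> dict[str, Any]:
    try:
        articles = job["input"]["articles"]  # hypothetical input key
        return {"labels": classify_articles(articles)}
    except Exception as exc:
        # Report the failure as a plain string; per this thread, "error" expects a string
        return {"error": f"{type(exc).__name__}: {exc}"}
```

In the worker script the handler would then be registered as usual with runpod.serverless.start({"handler": handler}); that call is left out here so the snippet runs without the runpod package installed.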

karinenavasOP•11mo ago

Could you give me an example?

nerdylive•11mo ago

Hey

nerdylive•11mo ago

Are you using the RunPod endpoints (https://www.runpod.io/endpoints)?


nerdylive•11mo ago

Or are you using your own models?

karinenavasOP•11mo ago

I'm using my own model.

karinenavasOP•11mo ago

Open to suggestions for improvement 🙏

nerdylive•11mo ago

Where is the model stored, then? I can't really see the problem there.
Try checking the logs page in Serverless too, and try sending it here as well.
Nah, it's OK, bro, the rp start is normal.

justin•11mo ago

If you want, you can sanity-check and start small: https://blog.runpod.io/serverless-create-a-basic-api/
By the way, if you're on a Mac, make sure you're targeting the right platform with --platform amd64.
Yeah haha, just deleted my comments; I realized they got it right.

nerdylive•11mo ago

Oh yeah, where are you from, by the way? It's so late here lol

justin•11mo ago

I fly around lol, currently on the West Coast.

nerdylive•11mo ago

Oooh, nice!

karinenavasOP•11mo ago

Actually, I started with a blog post and tried to follow the steps, so I thought this was a pretty simple API already. The model is from HF (Hugging Face) and works fine if I integrate it in a normal notebook; I just need the RunPod GPU. Could it possibly be due to memory overhead? Is there proper error handling to catch that?
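Whether memory is the cause can only be confirmed from the actual error. If failures turn out to correlate with long article lists, one common mitigation (an assumption here, not something confirmed in this thread) is to classify the list in fixed-size batches, so each model call sees at most batch_size articles regardless of the request size. classify_in_batches and batch_size are hypothetical names, and classify stands in for the model call.

```python
from typing import Callable, Sequence


def classify_in_batches(
    classify: Callable[[Sequence[str]], list[str]],
    articles: Sequence[str],
    batch_size: int = 8,
) -> list[str]:
    """Run classify over articles in chunks of batch_size and concatenate the labels."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    labels: list[str] = []
    for start in range(0, len(articles), batch_size):
        batch = articles[start:start + batch_size]
        # Each call only receives up to batch_size articles, capping the input per call
        labels.extend(classify(batch))
    return labels


# Usage with a dummy classifier in place of the real model:
print(classify_in_batches(lambda batch: ["label"] * len(batch), ["a", "b", "c"], batch_size=2))
```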

digigoblin•11mo ago

Impossible to know without a full stack trace of the error

karinenavasOP•11mo ago

My quick fix for now is to catch the failed calls and retry after 5 seconds. This works, but almost every fifth call still fails.
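If retries stay in place while the root cause is investigated, it helps to at least record why each attempt failed and to cap the number of attempts, since every retry is billed. A minimal sketch, assuming call is any function (hypothetical here) that submits the request and returns the job's JSON response as a dict with "status" and "error" fields:

```python
import time
from typing import Any, Callable


def call_with_retry(
    call: Callable[[], dict[str, Any]],
    max_attempts: int = 3,
    delay_s: float = 5.0,
) -> dict[str, Any]:
    """Call until the job is not FAILED or max_attempts is reached; return the last response."""
    last: dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        last = call()
        if last.get("status") != "FAILED":
            return last
        # Record the reason for each failure instead of silently retrying
        print(f"attempt {attempt} failed: {last.get('error')!r}")
        if attempt < max_attempts:
            time.sleep(delay_s)
    return last
```

Logging the "error" field on each failure is what makes the loop useful for debugging; retrying on its own only hides the problem.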

digigoblin•11mo ago

Best to use a try/except block and log the stack trace to your error response.
Yeah, that's not good; you're going to waste a lot of money like that.
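Wrapping the entire handler can be done once with a decorator. A sketch using only the standard library: traceback.format_exc() captures the full stack trace, which is printed to stderr (where it should show up in the worker logs) and returned as a single string under "error". The decorator name report_traceback and the handler body are hypothetical.

```python
import functools
import sys
import traceback
from typing import Any, Callable

Job = dict[str, Any]


def report_traceback(handler: Callable[[Job], Any]) -> Callable[[Job], Any]:
    """Wrap a handler so any exception becomes {"error": <full stack trace>}."""

    @functools.wraps(handler)
    def wrapped(job: Job) -> Any:
        try:
            return handler(job)
        except Exception:
            tb = traceback.format_exc()  # full stack trace as one string
            print(tb, file=sys.stderr, flush=True)  # also emit it to the worker logs
            return {"error": tb}

    return wrapped


@report_traceback
def handler(job: Job) -> dict[str, Any]:
    articles = job["input"]["articles"]  # hypothetical input key
    # Placeholder for the real classify_articles call
    return {"labels": ["label"] * len(articles)}
```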

karinenavasOP•11mo ago

Wrap the classify_articles call in try/except?

digigoblin•11mo ago

Yeah, basically your entire handler function.

karinenavasOP•11mo ago

I'll try it, thanks for the hint!
