Has anyone experienced issues with serverless /run callbacks since December?

We've noticed that response bodies are empty when using /run endpoints with callbacks in the RunPod serverless environment (occurring sometime after December 2nd).

Additional context:
- /runsync endpoints are working normally
- The response JSON shown under Status in the "Requests" tab of the RunPod console looks correct
- Our last deployment to this endpoint was two months ago

Could anyone confirm whether any releases since December might have introduced this issue? We haven't changed anything in our deployment for two months, but we're now seeing empty response bodies with callbacks. Thanks in advance! 🙏
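For reference, the kind of call involved looks roughly like this (a minimal sketch using the requests library; the endpoint ID, API key, and webhook URL are placeholders, not our actual setup):

import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},
        # RunPod POSTs the job result to this URL once the job finishes.
        "webhook": "https://example.com/webhook_runpod",
    },
    timeout=30,
)
print(resp.json())  # e.g. {"id": "...", "status": "IN_QUEUE"}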
nerdylive
nerdylive3w ago
@yhlong00000
wuxmes
wuxmes3w ago
same here
No description
wuxmes
wuxmes3w ago
getting a barrage of 520 and 415 errors since 4 am
Iggy
Iggy3w ago
same here
Coderik
Coderik3w ago
Same here, with two clients of mine simultaneously. A status request immediately after the job's completion returns the correct result, but no webhook arrives.
Iggy
Iggy3w ago
sometimes getting empty responses, sometimes badly formatted JSON strings
sdhiman15
sdhiman153w ago
i am not getting any responses
Iggy
Iggy3w ago
it started happening today
sdhiman15
sdhiman153w ago
getting an empty object in request.body and undefined in request.rawBody. Yes
Iggy
Iggy3w ago
everything is breaking 😢
sdhiman15
sdhiman153w ago
Yes. No one is responding either. My entire flow is disrupted because of this.
Moritur
Moritur3w ago
Same here, we're only getting the input data but not the request id anymore, so we can't fetch the output of the job
eliasb
eliasb3w ago
From what I found, the webhook POST is missing the "content-type" header. If you can work around it on your side, just set it to application/json (I am using a middleware in Rails).
Ahmadzafar176
Ahmadzafar1763w ago
same here, getting the initial "IN_QUEUE" status but not receiving any response. /runsync works ok. Anyone have a fix or know what's wrong?
Moritur
Moritur3w ago
I think we need to wait for them to wake up, it's currently ~2am in san francisco
wuxmes
wuxmes3w ago
for me, I did a manual fix by adding a middleware that directly sets the content type of all webhook requests to JSON
class JsonContentTypeMiddleware:  # enclosing Django middleware class (name arbitrary)
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.method == "POST" and "webhook_runpod" in request.path:
            request.META["CONTENT_TYPE"] = "application/json"
        return self.get_response(request)
smth like this fixed it for me in python
sdhiman15
sdhiman153w ago
Yes. If using Node.js, set the request header to application/json manually and use the Express JSON parser to get the parsed body:
const express = require('express');
const app = express();

app.post('/webhook', (req, res) => {
    // RunPod's callback arrives without a Content-Type header,
    // so set it manually before letting express.json() parse the body.
    req.headers['content-type'] = "application/json";
    express.json()(req, res, () => {
        console.log(req.body);
        // Your code goes here

        res.sendStatus(200);
    });
});
Ahmadzafar176
Ahmadzafar1763w ago
unfortunately my company uses bubble as a front end service, do i have to make a middleware on my end to solve this?
juergengunz
juergengunz3w ago
i think someone pushed an update before they went to sleep
nerdylive
nerdylive3w ago
Kek
Nikita
Nikita3w ago
happens to everyone to be honest
nerdylive
nerdylive3w ago
Can anyone take a screenshot? I wanna see. Does it happen in all regions? What region do you guys use?
juergengunz
juergengunz3w ago
i only use europe regions
No description
Nikita
Nikita3w ago
No description
Nikita
Nikita3w ago
fails on every datacenter
nerdylive
nerdylive3w ago
The job fails or it only returns empty output?
Nikita
Nikita3w ago
It returns a valid output (it can be seen in the Requests tab of the endpoint), but the webhook POST request body doesn't seem to be a serialized JSON event. Setting the content type manually didn't work for me, so I'm trying to serialize it manually from the chunks sent.
nerdylive
nerdylive3w ago
So it errors in webhook only? But the /status and /run output works well?
Nikita
Nikita3w ago
Yup. For those who work with Node.js and didn't resolve it by manually setting the content-type, here's a custom serializer from chunks into JSON (Express):
import { NextFunction, Request, Response } from "express";

// Enclosing middleware class (class name arbitrary).
export class RunpodWebhookJsonMiddleware {
    public async handler(req: Request, res: Response, next: NextFunction) {
        // The RunPod callback is missing the Content-Type header, so set it manually.
        req.headers["content-type"] = "application/json";

        // Collect the raw body chunks and parse them into JSON ourselves.
        let buffer = "";
        req.setEncoding("utf8");

        req.on("data", (chunk) => {
            buffer += chunk;
        });

        req.on("end", () => {
            try {
                req.body = JSON.parse(buffer);
            } catch (err) {
                console.error("Error parsing JSON:", err);
            }
            next();
        });
    }
}
furkan.huudle
furkan.huudle3w ago
same here 🙌🏻 Sometimes I get an empty body (with binary data). Is there any explanation from RunPod?
Nikita
Nikita3w ago
not yet, I see this has been happening since 03:00 UTC
Moritur
Moritur3w ago
Is there any way to escalate issues like these to runpod staff, especially if it happens in the middle of the night for them?
Nikita
Nikita3w ago
i guess discord is the only place
nerdylive
nerdylive3w ago
Contact button on the website. What is it like? Any picture?
Moritur
Moritur3w ago
Is that actually escalating it? I'm guessing they're all sleeping right now
nerdylive
nerdylive3w ago
Well no, it creates a new support request
Moritur
Moritur3w ago
Ok, I guess they will notice either way once they wake up. But a company like RunPod should also have monitoring set up that screams at them when suddenly a huge number of webhooks across all customers fail.
Nikita
Nikita3w ago
We recently had an SDK version that didn't work properly for about 2 weeks straight, so...
furkan.huudle
furkan.huudle3w ago
I'm using n8n for the webhook. I can provide a screenshot, but there's no code involved, so I'm not sure it will be useful to other developers. But I think RunPod should increase their focus on support and deployment management, because 1 or 2 months ago the RunPod SDK was broken and I couldn't tell unless I checked Discord. Anyway, I'm sharing screenshots @nerdylive
furkan.huudle
furkan.huudle3w ago
The right one is the request from RunPod
No description
furkan.huudle
furkan.huudle3w ago
And RunPod responds as binary, I guess
No description
furkan.huudle
furkan.huudle3w ago
And that's the binary data
No description
nerdylive
nerdylive3w ago
Ohh, because the content-type header is missing or the wrong type, I guess, yeah.
furkan.huudle
furkan.huudle3w ago
Yes :/ When do you think they will fix it?
nerdylive
nerdylive3w ago
When they wake up and start working. Maybe a few more hours.
yhlong00000
yhlong000003w ago
Hey, sorry about this! We're aware of the issue and will have a hotfix out within the next hour. The response is currently missing the application/json header. As a workaround, you can update your code to parse the body as JSON even if the header is missing.
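For example, a minimal Flask sketch of that workaround (Flask and the route name are just assumptions here; adapt it to whatever framework you use):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook_runpod", methods=["POST"])
def runpod_webhook():
    # force=True tells Flask to parse the body as JSON even when the
    # Content-Type header is missing or wrong; silent=True returns None
    # instead of raising if the body isn't valid JSON.
    payload = request.get_json(force=True, silent=True)
    if payload is None:
        return "invalid body", 400
    print(payload.get("id"), payload.get("status"))
    return "", 200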
Nikita
Nikita3w ago
Thanks for the quick response 💪
furkan.huudle
furkan.huudle3w ago
thanks meow meow 🚀
Iggy
Iggy3w ago
@yhlong00000 any updates? 👀
nerdylive
nerdylive3w ago
Don't worry, they will announce it as soon as it's fixed, either here or in #📢|announcements
Iggy
Iggy3w ago
was in the middle of a huge refactor to make this work but if you are working on it I'll just wait
yhlong00000
yhlong000003w ago
We’re running the final tests now, it should be ready soon. We’re pushing the change now; it will take about 15 minutes. I’ll keep you posted. The release is almost complete, and my testing shows the response looks good. Could you verify it on your end and let me know if you still encounter any issues?
Ahmadzafar176
Ahmadzafar1763w ago
seems to be working now thanks
Moritur
Moritur3w ago
It works now. Are there any steps you are taking to prevent issues like this from happening in the future? The biggest issue for us was that there was no official reaction at all for more than 5 hours of our regular working day.
yhlong00000
yhlong000003w ago
Yes, we will reflect on this incident internally and implement additional safeguards and necessary changes to prevent this from happening again in the future. We’re truly sorry for the inconvenience!
Iggy
Iggy3w ago
All systems operational now, back to normal. Even with priority support this was quite unnerving; it would be great if you could have a support team covering the midnight San Francisco hours.
yhlong00000
yhlong000003w ago
Thanks for the feedback! We’re currently short on support staff but will work towards providing 24/7 support in the future.
kazuph(かずふ)🍙
I'd like to confirm that our application has recovered from the above issue. Thank you.
CosMix
CosMix3w ago
I am having issues right now... I cannot create pods through the Python package despite having GPUs available. I keep getting the "There are no longer any instances available with the requested specifications. Please refresh and try again" error, but if I try to create the pod with exactly the same settings from the Web UI it all works out...
nerdylive
nerdylive3w ago
Maybe the machine just freed up? The supply is real-time, so when it's really free it will be available to rent around that time
CosMix
CosMix3w ago
It's been happening since last week. If I hit the button on the Web UI at the same time as triggering the script, it doesn't work with the runpod Python package but it works with the Web UI.
nerdylive
nerdylive3w ago
@yhlong00000 https://discord.com/channels/912829806415085598/1307856048173879327 i think these guys are experiencing the same problem
MaxFrax
MaxFrax3w ago
Are any of you still experiencing issues with serverless vLLM? I cannot manage to release a working endpoint. I keep getting 500s and even some 502 Bad Gateway errors from Cloudflare. I don't even know how to further describe my issues; I've been banging my head on this problem for days and I'm losing my sanity. I tried to roll back to runpod/worker-v1-vllm:v1.6.0stable-cuda12.1.0, without any luck. Luckily it seems that my old endpoints created in the past few months are not experiencing visible issues. 502s are coming in strong now, and my in-progress requests seem to be multiplying according to the in-progress counter (without apparent reason).
yhlong00000
yhlong000003w ago
The UI refreshes periodically to display the latest GPU availability. When you click the deploy button, the system checks the real-time availability of the GPU. If availability is low and many users are renting or releasing GPUs, it's possible the UI shows a GPU as available, but by the time you deploy, it's already taken due to the refresh delay. Maybe try to record a video and collect screenshots, logs, endpoint IDs, and your current settings; those will be useful to figure out the issue.
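If it is that timing race, one stopgap on the script side is to retry for a bit instead of failing on the first attempt; a rough sketch, assuming the Python SDK's runpod.create_pod and matching on the error message quoted above (all settings are placeholders):

import time

import runpod

runpod.api_key = "your-runpod-api-key"  # placeholder

def create_pod_with_retry(retries=5, delay=10):
    for _ in range(retries):
        try:
            # Use whatever settings you pass today; these values are placeholders.
            return runpod.create_pod(
                name="my-pod",
                image_name="runpod/base",
                gpu_type_id="NVIDIA GeForce RTX 4090",
            )
        except Exception as err:
            if "no longer any instances available" not in str(err):
                raise
            time.sleep(delay)  # availability is real-time; wait and retry
    raise RuntimeError("No capacity after retries")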
MaxFrax
MaxFrax3w ago
Didn't manage to collect all the material yet, but it seems related to constraining the generation with extra_body={"guided_json": json_schema}: https://docs.vllm.ai/en/latest/usage/structured_outputs.html
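For reference, the request shape is roughly this (a sketch, not my exact code; the model, endpoint ID, and schema are placeholders, and I'm assuming the vLLM worker's OpenAI-compatible route):

from openai import OpenAI

ENDPOINT_ID = "your-endpoint-id"  # placeholder
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key="your-runpod-api-key",  # placeholder
)

json_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Reply as JSON."}],
    # vLLM-specific field passed via extra_body; constrains output to the schema.
    extra_body={"guided_json": json_schema},
)
print(resp.choices[0].message.content)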
CosMix
CosMix3w ago
That also happens with instances that show "Medium" or "High" availability. It's a general issue with RunPod.