Has anyone experienced issues with serverless /run callbacks since December?
We've noticed that response bodies are empty when using /run endpoints with callbacks in the RunPod serverless environment (occurring sometime after December 2nd).
Additional context:
- /runsync endpoints are working normally
- Response JSON format appears correct in the "Requests" tab of RunPod console under Status
- Our last deployment to this endpoint was two months ago
Could anyone confirm if there have been any releases since December that might have introduced this issue? We haven't made any changes to our deployment since two months ago, but are now seeing empty response bodies with callbacks.
Thanks in advance! 🙏
66 Replies
@yhlong00000
same here
getting a barrage of 520 and 415 since 4 am
same here
Same here with two clients of mine simultaneously. Status request immediately after the job's completion returns the correct result. But no webhook
getting JSON strings
sometimes empty responses
badly formatted string
i am not getting any responses
it started happening today
getting empty object in request.body and undefined in request.rawBody
Yes
everything is breaking 😢
yes
No one is responding too.
my entire flow is disrupted due to this.
Same here, we're only getting the input data, but not the request id anymore so we can't fetch the output of the job
from what I found, they're missing the "content-type" header
the webhook POST is missing the content-type header, if you can fix it, just set it to application/json (I am using a middleware in rails)
same here, getting initial "IN_QUEUE" status but not receiving any response , runsync works ok
anyone has a fix or know what's wrong?
I think we need to wait for them to wake up, it's currently ~2am in san francisco
for me I did a manual fix by adding a middleware by directly setting all webhook requests as contenttype of json
smth like this fixed it for me in python
Yes, If using nodejs, set the request header to application/json manually and use express json parser to get the parsed body
unfortunately my company uses bubble as a front end service, do i have to make a middleware on my end to solve this?
i think someone pushed an update before they went to sleep
Kek
happens to everyone to be honest
Anyone can take a screenshot I wanna see?
Does it happen on all region? What or what region do you guys use
i only use europe regions
fails on every datacenter
The job fails or it only returns empty output?
it returns a valid output (can be seen in requests tab of an endpoint), but the webhook post request body doesn't seem to be a serialized json
event setting content type manually didnt work for me
trying to serialize it manually from chunks sent
So it errors in webhook only?
But the /status and /run output works well?
Yup
For those who work with node.js and didn't resolve it with manually setting the content-type, here's the custom serializer from chunks into json
(express)
same here 🙌🏻
Sometimes I get empty body (with binary data). Is there any explanations from runpod?
not yet, i see this happens since 03:00 at night UTC
Is there any way to escalate issues like these to runpod staff, especially if it happens in the middle of the night for them?
i guess discord is the only place
Contact button on the website
What tis it like? Any picture?
Is that actually escalating it? I'm guessing they're all sleeping right now
Well no, it creates a new support request
Ok, I guess they will notice either way once they wake up. But I guess a company like runpod should also have monitoring set up that screams at them when suddenly a huge amount of webhooks across all customers fail.
We've recently had an sdk version that didn't work properly for about 2 weeks straight so...
Im using n8n for webhook, I can provide screenshot but there is no programming things, so I don't sure if it's fit or understand other developers
But I think runpod should increase their focus on support or deployment management. Because 1 or 2 months ago, runpod sdk was broken and I couldn't see if I checked discord
but anyway, Im sharing screnshots @nerdylive
The right one is request from runpod
And Runpod respond as binary, I guess
And that's the binary data
Ohh because of the header data type is missing or wrong type I guess yeah.
Yes :/ When they will fix u think?
When they wake up and working
Maybe like few more hours
Hey, sorry about this! We’re aware of the issue and will have a hotfix in next hour. The response is currently missing the application/json header. As a workaround, you can update your code to parse the body as JSON even if the header is missing.
Thanks for quick respond 💪
thanks meow meow 🚀
@yhlong00000 any updates? 👀
Don't worry they will announce it as soon as it's fixed either here , or in #📢|announcements
was in the middle of a huge refactor to make this work but if you are working on it I'll just wait
We’re running the final tests now, it should be ready soon.
We’re pushing the change now; it will take about 15 minutes. I’ll keep you posted.
The release is almost complete, and my testing shows the response looks good. Could you verify it on your end and let me know if you still encounter any issues?
seems to be working now
thanks
It works now. Are there any steps you are taking to prevent issues like this from happening in the future? The biggest issue for us was that there was no official reaction at all for more than 5 hours of our regular working day.
Yes, we will reflect on this incident internally and implement additional safeguards and necessary changes to prevent this from happening again in the future. We’re truly sorry for the inconvenience!
All systems operational now, back to normal
Even with priority support this was quite unnerving, would be great if you could have support team for the midnight san francisco hours
Thanks for the feedback! We’re currently short on support staff but will work towards providing 24/7 support in the future.
I'd like to confirm that our application has recovered from the above issue. Thank you.
I am having issues right now... Cannot create pods through the python package despite having GPUs available. Keep getting the "There are no longer any instances available with the requested specifications. Please refresh and try again" but if I try to create it with exactly the same settings from the Web UI it all works out...
Maybe the machine just freed up? the supply is real time, so when its really free it will be available to rent in around that time
Its been happening since last week. Is I try to hit the button on the webui at the same time with triggering the script, it doesn't work with the runpod python package but it works with the Web UI.
@yhlong00000 https://discord.com/channels/912829806415085598/1307856048173879327
i think these guys are experiencing the same problem
Are any of you still experiencing issues with serverless vllm? I cannot manage to release a working endpoint. I keep getting 500s and even some 502 bad gateway from cloudflare. I don't even know how to further describe my issues, it's days that I'm banging my head on this problem and I'm losing sanity. I tried to rollback to runpod/worker-v1-vllm:v1.6.0stable-cuda12.1.0, without any luck. Lucklily it seems that my old endpoints created in the past few months are not experiencing visible issues
502 are coming in strong now and my in progress requests seems to be multiplying according to inprogress counter (without aparent reason)
The UI refreshes periodically to display the latest GPU availability. When you click the deploy button, the system checks the real-time availability of the GPU. If availability is low and many users are renting or releasing GPUs, it’s possible the UI shows a GPU as available, but by the time you deploy, it’s already taken due to the refresh delay.
maybe try to record a video, screenshot, logs, endpointIds, current settings, those will be useful to figure out the issue.
Didn't manage to collect all the material yet, however it seems related to constraining the generation with:
extra_body={"guided_json": json_schema}
https://docs.vllm.ai/en/latest/usage/structured_outputs.htmlThat also happens with instances that show "Medium" or "High" availability. Is a general issue with Runpod.