Runpod serverless overhead/slow
I have a handler that is apparently running very fast, but my requests are not. I'm hoping to process video frames. I know this is an unconventional use case, but it appears that it is working reasonably well with this one exception:
What we're actually seeing are not fast responses, but responses that take at least a second, and often longer. The runpod dashboard claims that this is execution time, but the worker logs disagree.
Request ID Delay Time Execution Time
sync-61d689e6-502e... 0.18s 1.28s
What's going on here? Is there anything we can do?
I'll post my handler code in the next message.
My handler code looks like this:
I've removed bits and pieces in order for the message to fit, but the important code remains.
I was also wondering if there's some way to use streaming for this, but it doesn't seem like it since we can only stream responses, we cannot stream data to the server, unless I'm misunderstanding something! I'd really love to spin up a worker per-user and let them connect via websockets but I'm not sure if there's a way to do anything like that.
What is running http://localhost:8080/process_frame?
A process that actually processes the video frame and returns data to send to the user. We could probably fold it into the handler, but profiling shows that it takes a small fraction of a second to return.
Your handler looks fine to me... likely your delay comes from process_frame
It's running this:
https://neuralpanda.ai/elonify
Unless I'm misunderstanding my logging, it's very fast:
logger.info(f"Handler completed in {end_time - start_time:.3f} seconds")
This implies that it posted to the process_frame endpoint and received a response and is sending data back in 0.09 seconds right?
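For context, a minimal sketch of a handler with this shape (the local endpoint URL matches the one discussed above, but the payload fields and helper structure are assumptions, not the actual code):

```python
import json
import logging
import time
import urllib.request

logger = logging.getLogger(__name__)

# Assumed local FastAPI service doing the real frame processing.
PROCESS_URL = "http://localhost:8080/process_frame"

def handler(job):
    """Forward a single frame to the local processing service and time the call."""
    start_time = time.time()

    # Payload shape is illustrative; the real handler's input fields may differ.
    payload = json.dumps({"frame": job["input"]["frame"]}).encode()
    req = urllib.request.Request(
        PROCESS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        result = json.loads(resp.read())

    end_time = time.time()
    logger.info(f"Handler completed in {end_time - start_time:.3f} seconds")
    return result
```

The timestamps bracket everything the handler does, so a 0.09s log line means the POST round-trip and response assembly all happened inside that window.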
Also, thank you so much for responding so quickly!
Are you running locally?
Yes
If you mean is the process_frame endpoint running locally.
are you doing a docker run each time?
I have tested this both locally and deployed. The stats I've shown you are from the deployed version.
And no, it's a simple FastAPI endpoint (process_frame is)
So from above the handler completes in 0.090 seconds. Is that not fast enough for you?
Oh it is, but I don't get data that fast.
The execution time from Runpod shows as 1.5 seconds+
That's what's confusing me.
How many active, max workers do you have set. Have you enabled Flashboot?
This is with an active worker. Just one that I'm using for testing.
10 max, one active right now.
Are you doing a sync or async RUN call?
sync
With Sync you should get a response from the handler as soon as it finishes.... what is your experience?
Well, I don't know where the delay is, but it takes 2-3 seconds to get a response from Runpod even though my handler says it's finishing processing in 0.09 seconds.
And runpod's dashboard says the execution time is 1.5 seconds even though the handler disagrees.
It's this delay that I'm trying to understand.
You should use pods instead or a worker with ws
With synchronous your client connects to the serverless worker and waits for it to respond. With an active worker it should be booted up and waiting for your API call. So, when you connect your response should be around the 0.09s the handler reports. Maybe the delay is part of RunPod's processing of the data? I do not run jobs that are capable of such quick response times.
Or try asking runpod to open ports for you, then set up a ws connection and just process a video on serverless in one job
With my payloads I would not notice a 1.41 second delay
Sorry, had to run out for a moment.
I initially set up a pod with WS but I don't know how to scale that. I then wanted to set up a worker with WS but it didn't seem like that was supported.
I agree it might be part of Runpod's data processing. I posted partly to see if someone from Runpod would be able to confirm, or alternatively point me in the right direction if it were something on my end.
@nerdylive Who would I ask at Runpod? If I could get serverless websockets working I think that might be ideal.
Contact button from the website
Or @flash-singh i guess, and explain to him what are you gonna use it for
Yep i think you have to go to support for opening ports
I'll try that.
We were hoping to get this working tonight. Got a fun little app that we want to let people try asap since the underlying libraries just came out.
are you using runsync? how are you running and getting job results?
websocket is possible but requires more work, much simpler without
@flash-singh I am using runsync. The handler appears to process the job in 0.09 seconds, but the execution time on the Runpod dashboard is ~1.5s.
I'm not sure why that would be, which is why I came here. I would love websockets on serverless, and I have websockets working on a pod. It's much better.
Here are some relevant screenshots.
I sent in a support request as well, but I am really hoping to be able to come up with a solution a bit faster if possible.
how big is your payload or output when job is done?
~50kb
They're video frames jpg encoded with longest side 512. Fairly small.
Nothing that would explain a full second delay.
keep in mind that 1.5s accounts for time before the job is even started by the worker; it's the time from when the job was given to when output is received
the 0.09s you're measuring falls within that 1.5s
Understood. This is testing using an active worker with no other users.
With a direct connection to the server it streams quite well (tested using a pod).
If overhead is just that high with serverless then I have to consider an alternative, websockets would be great but of course they aren't exposed on serverless.
direct connection will avoid server in middle and extra overhead for sure
The challenge is scalability. I have a pod running with websockets that's quite fast but supporting larger numbers of users would be very tricky.
extra overhead is about 200-300ms
My hope was to spin up a serverless worker per-user and connect them via websockets.
200-300ms would be a lot better than what I'm seeing.
~5x better.
you can do that if a user will use the whole worker
How would I do that? I wasn't sure how I could establish a connection to the worker.
Maybe ask flash singh for help exposing port
Then you can host a ws server inside serverless
Yes sir. @flash-singh can I expose a port? 🙂
i can do that tomorrow, pm me details about your template id and user account
also I've posted on here quite a bit about how you would use websockets, you can read into that more
Websockets on serverless?
I've got websockets working perfectly on my pod. It's serverless that's the challenge.
yes, it does require a special way to do it within the handler
I did search here for posts on serverless websockets but didn't see anything. Possibly my Discord search skills just aren't quite good enough 🙂
On start job, inside handler maybe open a ws server somehow
you would treat jobs as way to manage workers that run websocket server, then you would run actual inference using websocket communication
I'm pretty sure I can manage by passing back a websocket URL in the initial request.
Yeah that's roughly how I have it set up now. I have a separate server from the handler running the actual processing.
Search websocket on this serverless forum
I think if I have the ports exposed and an address/ip I can send back I should be ok.
yep that would be it
use handler to control lifecycle of websocket server
Yeah we're on the same page. I'll still search the forums in case there's a better example of how to do it than the code I have now.
you can use progress hook to communicate additional info about the job, like port, ip, etc without finishing the job
I was thinking I'd just set it up in a sync call and pass back the ip/port.
you're planning to only expose 1 websocket server per worker?
Yes that was the plan. I don't know how well my process would handle concurrency.
you cant use runsync, it will complete the job
Or you could pass a webhook callback url to pass the info
Ahhh ok. I like the progress call to pass back the required info.
Easier than having another server for the webhook.
you want the websocket server to stay running and close when job is done
How do progress hooks work btw?
job done can be the user signaling that they're done running inference, or any other trigger you want to use
as long as the job is running, the worker will stay running
Ok I'll obviously have some testing to do but I think this approach is viable.
progress is different from streams, you can keep checking progress and it will give you last state you set for progress, unlike stream which goes away once consumed
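The progress-hook idea described above (publishing ip/port without finishing the job) could be sketched like this. The env variable names come from later in this thread; the helper name and payload shape are assumptions:

```python
import os

def connection_progress_payload():
    """Build the metadata a client needs to open a direct WebSocket connection.

    RUNPOD_PUBLIC_IP and RUNPOD_TCP_PORT_8888 are the variables mentioned in
    this thread for a template that exposes port 8888.
    """
    ip = os.environ.get("RUNPOD_PUBLIC_IP")
    port = os.environ.get("RUNPOD_TCP_PORT_8888")
    if not ip or not port:
        raise RuntimeError("port mapping env vars not set; check template ports")
    return {"ws_url": f"ws://{ip}:{port}"}

# Inside the handler you would publish this without finishing the job,
# e.g. (assuming the runpod Python SDK):
#   runpod.serverless.progress_update(job, connection_progress_payload())
```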
I actually looked into streams for this but of course I'd need to pass frames from the user and get frames back and I couldn't see a way to do that.
My current implementation is a bit hacky, I just send individual frames for processing by whichever worker picks them up (that's where I'm getting the ~1.5s delay).
for the best latency direct connection is best
Yep
Yes ws would be better than that or some socket
If ports are open
By the way, thank you both so much for your help!
we do have direct proxy connection feature but that doesnt have auto scale built-in yet
So how will you return the port and ip after the server is up for connection
Direct proxy???
you can get both ip and port in env variables
I meant, to the client or backend that needs to connect to runpod serverless
direct proxy is where serverless workers run rest api, all we do is forward requests to it
I can send you my serverless endpoint id and account email now. Is there any other information you need?
Should I wait until morning to send it?
up to you, ill handle it tomorrow morning
Much appreciated. Once the ports are open I might have a few other questions around how to actually get up and running.
Hopefully I can handle it myself, but if not, I hope you won't mind another question or two 🙂
sure np
@flash-singh Can you hook that up for me as well? I've been building a web socket proxy for this but if I can get a port open that wouldn't be needed any longer.
You can. But you need to manually do an editPodJob graphql call to update the ports field on the endpoint's pod
Wow not the template?
Do you have the graphql to do it on your own?
I’ll get back to you once I’m home to find the code. You can also spin up a pod, open F12 -> network tab, and then edit the pod to expose a random port. Then you will see the graphql request pop up. That’s the one you want
alright thanks
so the pod id is the worker id? im guessing
No, the pod id is different
You can go to your serverless page, open the F12 network tab, click the refresh icon button, and examine the graphql response; you will see a pods field on the myEndpoints field
oh ic
How do you use the progress hook on a serverless worker? I know you only have a webhook param to pass into a job, but I thought that’s only triggered when the job finishes
call the webhook manually inside the handler code (though the thread's author doesn't like that way). There's another way using streams: you just need to poll the /stream endpoint, like the /status one, and adjust the worker accordingly
I see. Already calling it manually
@flash-singh could you give a bit more official guide about how to setup websocket connections on serverless workers? I think we all need this!
or somehow you can update the /status too other than /stream
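Client-side, reading that state back could look like polling RunPod's /status endpoint for the job. A sketch; the terminal-status list and polling cadence are assumptions:

```python
import json
import time
import urllib.request

def poll_job_status(endpoint_id, job_id, api_key, interval=0.5, max_polls=20):
    """Poll the /status endpoint until the job reaches a terminal state.

    Intermediate responses carry whatever progress metadata the handler
    last published (e.g. ip/port for a direct connection).
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(req) as resp:
            status = json.loads(resp.read())
        # Assumed terminal states; stop polling once one is reached.
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
    return status
```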
@briefPeach I was about to say that I would love to have an official guide here in our docs, as I also have a hobby project where I need a websocket connection on serverless workers!
@Encyrption @flash-singh @briefPeach @nerdylive @teddycatsdomino: Please share all the things you already did / know in terms of "websocket on serverless worker" in here or via a DM to me and then I will make sure to put everything together, so that we can cover this use case in an official manner!
You have to open ports, wait official manner? Wdym
Basically everything they say here will be enough ig ask questions maybe we can dig further
I mean like everything that is needed, from implementing this in a worker, to things that need to be done during deployment of the worker to how to test this. So that other people can follow a step by step guide on how to set this up.
Edit pod req
https://api.runpod.io/graphql?api_key=<your_api_key>
{
"operationName": "editPodJob",
"variables": {
"input": {
"podId": "jg9z32zai",
"dockerArgs": "",
"imageName": "weixuanf/runpod-worker-comfy",
"containerDiskInGb": 50,
"volumeInGb": 60,
"volumeMountPath": "/workspace",
"ports": "8080/http,8888/http"
}
},
"query": "mutation editPodJob($input: PodEditJobInput!) {\n podEditJob(input: $input) {\n id\n env\n port\n ports\n dockerArgs\n imageName\n containerDiskInGb\n volumeInGb\n volumeMountPath\n __typename\n }\n}"
}
first find your serverless worker's pod ids, then use this graphql to edit the ports of your pod
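For illustration, issuing that mutation from code might look like this (a sketch; the mutation fields are trimmed relative to the full payload above, and note the warning in this thread that template-level changes can overwrite ports set this way):

```python
import json
import urllib.request

API_URL = "https://api.runpod.io/graphql?api_key={key}"

# Trimmed version of the editPodJob mutation shown above.
EDIT_POD_QUERY = (
    "mutation editPodJob($input: PodEditJobInput!) {\n"
    "  podEditJob(input: $input) { id ports imageName __typename }\n"
    "}"
)

def build_edit_pod_payload(pod_id, image_name, ports):
    """Assemble the editPodJob request body (fields trimmed for brevity)."""
    return {
        "operationName": "editPodJob",
        "variables": {
            "input": {"podId": pod_id, "imageName": image_name, "ports": ports}
        },
        "query": EDIT_POD_QUERY,
    }

def edit_pod_ports(api_key, pod_id, image_name, ports):
    """POST the mutation to the graphql endpoint and return the response."""
    data = json.dumps(build_edit_pod_payload(pod_id, image_name, ports)).encode()
    req = urllib.request.Request(
        API_URL.format(key=api_key),
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```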
Nice
don't do this, it can be overwritten
The best way to expose ports for serverless workers is using the template; currently only we can do this. I've set it for the users that pinged me about it.
Hmm ima test speeds and latency on ws and sockets can you open ports for me too?
i'll create a mock template which i will fill later
Thank you @flash-singh for all your help!
Here's the high level flow for you to follow once template has ports, that also means workers are allowed to expose port specified.
1. Use job to control lifecycle of the ports / service running on the port
2. In handler, when job runs, run the webserver on the port exposed in the template.
3. Read the env variables for the public IP and the port mapping, and publish them via the progress hook
RUNPOD_PUBLIC_IP
RUNPOD_TCP_PORT_8888
if port exposed is 8888
4. On client-side, read the status, it will have all the progress metadata.
5. Do whatever you need while job is in progress to communicate with WS and run any workload you want.
6. Somehow signal the WS or job to mark itself done; the handler should close the websocket server at this point.
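The steps above could be sketched roughly like this. The server start/stop callables stand in for a real WebSocket server (e.g. one built on the `websockets` package), and the progress call appears only as a comment since it depends on the runpod SDK; everything else is an assumption about wiring:

```python
import os
import threading

def run_ws_job(job, serve_forever, stop_server):
    """Lifecycle sketch: the job controls a per-worker WebSocket server.

    `serve_forever(done)` should run the server and may set `done` when the
    client signals it is finished; `stop_server()` shuts the server down.
    Both are placeholders for a real implementation.
    """
    # Steps 2-3: the template exposes port 8888, so these env vars hold the
    # externally reachable address for this worker.
    ip = os.environ["RUNPOD_PUBLIC_IP"]
    port = os.environ["RUNPOD_TCP_PORT_8888"]

    done = threading.Event()

    # Step 2: run the server in the background for the lifetime of the job.
    server_thread = threading.Thread(
        target=serve_forever, args=(done,), daemon=True
    )
    server_thread.start()

    # Steps 3-4: surface connection info as progress metadata so the client
    # can read it from /status without the job finishing, e.g.:
    #   runpod.serverless.progress_update(job, {"ip": ip, "port": port})

    # Steps 5-6: block until something signals completion, then close the
    # server so the handler can return and the job can complete.
    done.wait()
    stop_server()
    server_thread.join()
    return {"ip": ip, "port": port, "status": "closed"}
```

Because the handler only returns after `done` is set, the worker stays alive (and the socket stays open) for exactly as long as the job is in progress.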
https://docs.runpod.io/serverless/workers/handlers/handler-additional-controls
I do not have any RUNPOD_PUBLIC_IP or RUNPOD_TCP_PORT_8888 in my environment variables. @Encyrption suggested that this may be due to my region. @flash-singh do I have to be on a specific region to access these variables?
😨😨oh I see….
No i guess not
are you accessing it from a different ssh instance?
or from a process ran from a docker start cmd / entrypoint?
I've tried both from my handler and from the web console that's accessible through the dashboard.
I get all of the other Runpod environment variables, just not those two.
It shouldn't be accessible from the web console imo
I've removed any potentially sensitive values, but this is the full set of RUNPOD_ prefixed environment variables accessible on my worker. I've tried deinitializing all workers and letting them reinitialize. I've also tried on a region that @Encyrption confirmed worked for him so it doesn't appear to be regional.
Maybe it's the template
What do you mean? Is there something I should configure on the template that I might have missed?
Figured it out. I gave @flash-singh the endpoint id and it looks like he needed the template id.
Idk I thought maybe the port hasn't been exposed yet?
Right..
I think Teddy gave Singh his endpoint ID when he should have provided his template ID. 😦
Yea, Ill try to test my template tmmrw haha
you need to reset workers
Thank you again Singh for your help. I've dmed you as I'm sure you've seen. I still do not seem to be able to get any value for RUNPOD_TCP_PORT_8888 or RUNPOD_PUBLIC_IP. I've tried setting worker counts to zero and then back up (forcing a reset). I've tried a new endpoint. I've tried different regions. I'm not sure what else to try at this point. If anyone has any ideas as to what I might have done wrong and could point me in the right direction, I would be grateful.
will get some direction on this tomorrow
@flash-singh I am getting deeper into the weeds coding the worker to use the new port. Can we use the proxy links (https://podid-8188.proxy.runpod.net/) with this like pods do, or do I need to handle it on my own with dynamic DNS or similar?
you can use proxy, i would have to change port tomorrow for that
Do you mind doing that? Would doing that change anything else?
updated, port is 8888
@teddycatsdomino i see that the ports on the template got reset again, ill check tomorrow; it's possible we have a mechanism in place that resets them when you update the endpoint, will have to fix that
Thank you @flash-singh! I would love to be able to use a proxy link as well.
im not sure how well websockets will work with that, you would have to test
i guess proxy like that is better for http connections right?
I've tried it on a Pod and it worked well.
@flash-singh Since your last update I no longer see these in my ENV variables:
RUNPOD_PUBLIC_IP
RUNPOD_TCP_PORT_8888
I can figure out the proxy URL from the RUNPOD_POD_ID but I will need to know the port to BIND the web socket to. What should I do?
I tried binding websocket to 8888 but that doesn't seem to work.
will check, it might be a bug on our end where the ports get reset on template
Thanks! Please let me know what you find and if I need to do anything on my end.
what happens is that if you update the template in our UI, it will reset the ports, we are planning a fix for it this week
@flash-singh Thank you for looking into it, and it's great to hear that a fix is on the way! Would it be at all possible to get something working today if we don't change the template? Also, what counts as an updated template? A new release? Changing env vars? I wouldn't need to change anything except the docker image version.
any changes to template, so yes env vars are part of it
we are planning release this week to the UI which will allow you to set ports yourself in the template
But overrides and deploying new versions using the serverless dashboard would work?
@flash-singh I understand if we need to wait for the update, but we were hoping to go live yesterday and this is the only blocker for us right now. If it is at all possible to get a port (with or without a proxy link) we should be able to get up and running almost immediately. Let me know if there's anything we can do, and, as always, thank you for your help!
For anyone keeping an eye on this, flash-singh did kindly set up a proxy link for me, but unfortunately deploying a release via the serverless dashboard does count as updating the template. Looks like anyone hoping to use this approach will need to wait for the update and official support.
new change is live for serverless templates, you can add ports now
Thank you @flash-singh! This is working very well.