How do I deploy a Worker to a Pod?
I have deployed a worker as a Serverless deployment. I expected to be able to deploy the exact same image to a Pod and get an endpoint URL where I could make a similar Worker request, but I'm not having any success.
I am currently using the following as the initial entrypoint for handler.py...
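Roughly, it is the standard RunPod handler pattern (simplified sketch here, with my model-specific code stripped out):

```python
# handler.py - simplified sketch of the entrypoint (model-specific code omitted)
import runpod


def handler(job):
    # "input" is whatever JSON the client sends in the request
    job_input = job["input"]
    # ... run inference here ...
    return {"output": job_input}


if __name__ == "__main__":
    # Hands control to the RunPod serverless runtime, which pulls jobs from the queue
    runpod.serverless.start({"handler": handler})
```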
Is there any doc that discusses how to get a Serverless Worker deployed to a Pod?
thx.
12 Replies
If you want to use a serverless image as a pod, you will need to provide an alternative Container Start Command. But without modifications this will not run an API on your pod. Serverless doesn't run an API directly; the RunPod infrastructure handles that part. To run an API on your pod you would need a web server (Flask, etc.) to respond to queries, and your pod Start Command should point to a script that starts that web server.
You can edit the Start Command in your Template on the RunPod web interface.
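For example (just a sketch, with assumed file names, port, and paths), you could wrap the existing handler in a small Flask app and point the Start Command at that script:

```python
# server.py - minimal sketch of exposing a serverless handler on a pod via Flask
# Assumes handler.py defines handler(job) and only calls runpod.serverless.start()
# under `if __name__ == "__main__":`, so importing it here is safe.
from flask import Flask, jsonify, request

from handler import handler  # your existing serverless handler function

app = Flask(__name__)


@app.route("/run", methods=["POST"])
def run():
    # Mimic the serverless job shape so the same handler code can be reused
    job = {"id": "pod-local", "input": request.get_json(force=True)}
    return jsonify({"output": handler(job)})


if __name__ == "__main__":
    # Listen on a port you expose in the pod template's HTTP ports
    app.run(host="0.0.0.0", port=8000)
```

The Start Command would then be something like `python -u /app/server.py` (the path depends on your image), and clients would call the pod through its proxy URL on that port.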
I was assuming that Pods and Serverless could be deployed interchangeably behind a single client API. I was planning on having a core of "always-on" Pods and then autoscaling with Serverless. It sounds like that is not possible?
Is auto-scaling possible with Pods?
Thx.
It is not directly possible, no. You might be better off sticking with serverless and configuring some active servers. Active servers are always on, and RunPod gives you a 30% discount when you use them.
FlashBoot should also help. With it enabled, once your worker finishes a request it will take another if any are waiting in the queue, so as your traffic increases, capacity scales along with it.
My eyeball inspection of pricing suggests that a Community pod is about half the price of a Serverless active worker. Pods also seem to offer more certainty about GPU, network, location, etc.
Oh well, at least I understand the constraints now. Thx.
With serverless you have complete control over GPU, network, location, and CUDA version, on par with pods. You do have more GPU choices on the high end with pods, but serverless is geared towards inference and you shouldn't really need more than 80GB of VRAM for that. When comparing prices, I would suggest comparing pods in Secure Cloud to serverless, since all serverless workers run in Secure Cloud.
You can do what you described, having pods act as always-on servers, but then you will have two endpoints: one for the pod and one for your serverless workers. Beyond building and configuring an API web server for the pod, you would need to run an API proxy in front of both endpoints that decides which one to use on the fly, roughly like the sketch below. To me this is a lot of extra work for a small potential reward, especially when the scaling for serverless workers is already so robust on its own.
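Something like this (just a sketch; the URLs, environment variable names, and the pod_is_busy rule are placeholders you would have to fill in with real load tracking):

```python
# proxy.py - rough sketch of an API proxy that picks the pod or the serverless endpoint
# All URLs, names, and the routing rule are placeholders, not real values.
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

POD_URL = os.environ["POD_URL"]                # e.g. your pod's exposed /run URL
SERVERLESS_URL = os.environ["SERVERLESS_URL"]  # e.g. https://api.runpod.ai/v2/<endpoint_id>/runsync
RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]


def pod_is_busy() -> bool:
    # Placeholder routing rule: you would need to track the pod's load yourself,
    # e.g. count in-flight requests or poll a /health route on the pod.
    return False


@app.route("/run", methods=["POST"])
def run():
    payload = request.get_json(force=True)
    if not pod_is_busy():
        # Send to the always-on pod (which takes the raw input JSON)
        resp = requests.post(POD_URL, json=payload, timeout=300)
    else:
        # Overflow to the serverless endpoint (which expects {"input": ...})
        resp = requests.post(
            SERVERLESS_URL,
            json={"input": payload},
            headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
            timeout=300,
        )
    return jsonify(resp.json()), resp.status_code


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9000)
```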
"When comparing prices, I would suggest comparing pods in Secure Cloud to serverless, since all serverless workers run in Secure Cloud."
Yes, they might be roughly close, but if I'm OK with using Community, that still does not change the fact that a Pod may be considerably cheaper than Serverless (for always-on). Thanks for all your suggestions and feedback, I will take it all into account.
Sure thing, good luck! 🙂
It would be cool if there were a serverless worker option in Community. Maybe suggest that in the Feedback section?
Sure. It would also be nice if the Worker API could be unified across Pod and Serverless.
#🧐|feedback