Active workers or Flex workers? - Stable Diffusion
I'm integrating Stable Diffusion into a mobile application where user prompts are sent to RunPod for image generation, with the results sent back to the app. The usage is highly variable, ranging from 15 to 100 image generations per day, and there may be days with no usage at all.
Given this variability, should I opt for active workers or flex workers in RunPod for the most efficient scaling and cost management?
And in my case, what are flex workers and active workers each suitable for?
Active workers help reduce cold-start time, but they cost money even while idle, even with the 30% discount. With your traffic of at most 100 images/day, I wouldn't recommend active workers unless you need super-fast responses. To handle traffic surges, you can set max workers to 3-5. By the way, where did you see "flex worker"? It's an old term, and we should consider removing it to avoid confusion. 😂
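To put rough numbers on the active vs. non-active trade-off above, here is a back-of-the-envelope daily cost model. All prices and timings are hypothetical placeholders, not RunPod's actual rates. The model also assumes that cold-start time and the idle timeout are billed like job time, and that every request pays a full cold start (a pessimistic case for non-active workers). Check the current pricing page before relying on any of it.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    price_per_second: float      # hypothetical non-active price in USD per GPU-second
    active_discount: float       # e.g. 0.30 for the 30% active-worker discount
    images_per_day: int          # daily generation count (15-100 in this thread)
    seconds_per_image: float     # assumed generation time per image
    cold_start_seconds: float    # assumed billed cold-start overhead per request
    idle_timeout_seconds: float  # idle time billed after each job


def active_worker_daily_cost(c: CostInputs) -> float:
    """One always-on active worker is billed for every second of the day."""
    return 24 * 60 * 60 * c.price_per_second * (1 - c.active_discount)


def flex_worker_daily_cost(c: CostInputs) -> float:
    """Pessimistic: every request pays a cold start, the job, and the idle timeout."""
    billed_per_request = c.cold_start_seconds + c.seconds_per_image + c.idle_timeout_seconds
    return c.images_per_day * billed_per_request * c.price_per_second


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    inputs = CostInputs(
        price_per_second=0.0001,
        active_discount=0.30,
        images_per_day=100,
        seconds_per_image=5.0,
        cold_start_seconds=10.0,
        idle_timeout_seconds=1.0,
    )
    print(f"active worker: ${active_worker_daily_cost(inputs):.2f}/day")
    print(f"non-active:    ${flex_worker_daily_cost(inputs):.2f}/day")
```

On days with zero traffic, the non-active cost drops to zero, while the active worker is still billed for the full day.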
probably here:
https://www.runpod.io/serverless-gpu
What is flex called now?
😂😂 That's just my knowledge of the history 🥲. "Flex worker" pretty much means every other type of worker except an active worker.
Yeah, that's what I understand too.
Ahhh, so it's active workers vs. non-active workers.
In my case, would the non-active workers scale to zero when there's no traffic?
Check this doc and let me know if you have more questions: https://docs.runpod.io/serverless/references/endpoint-configurations#active-min-workers
I'm having trouble understanding the documentation.
[Idle Timeout]
The amount of time in seconds a worker not currently processing a job will remain active until it is put back into standby. During the idle period, your worker is considered running and will incur a charge.
Does that mean there is no option for something like "charge per generation"?
I was initially interested in the fact that this obsolete "flex worker" runs only when required, so if no computation is used, there are no charges.
"You will incur the cost of any active workers you have set regardless if they are working on a job."
Does that mean there is no such thing as, let's say, an API call from my app to RunPod that uses a worker to generate an image, after which the worker shuts down?
Idle timeout is effectively "idle active": it keeps the worker active for x seconds so the model stays warm for faster starts.
Yes, just set it to 1 second then. The worker will shut off after 1 second if you don't have any other active workers.
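A toy simulation of the idle-timeout trade-off described above. It models one worker that serves jobs in arrival order and is billed from the moment it cold-starts until its idle timeout expires after the last job. This is a simplified model under stated assumptions, not RunPod's actual scheduler, and the function name and timings are hypothetical.

```python
from typing import Sequence, Tuple


def simulate_single_worker(
    arrivals: Sequence[float],
    job_seconds: float,
    cold_start_seconds: float,
    idle_timeout_seconds: float,
) -> Tuple[float, int]:
    """Return (billed_seconds, cold_starts) for one worker under a simplified model."""
    billed = 0.0
    cold_starts = 0
    session_start = None  # when the current billed session began (None = worker off)
    busy_until = 0.0      # when the worker finishes its queued jobs

    for t in sorted(arrivals):
        shutdown_at = busy_until + idle_timeout_seconds
        if session_start is None or t > shutdown_at:
            # Worker already shut off: bill the finished session, then cold start.
            if session_start is not None:
                billed += shutdown_at - session_start
            cold_starts += 1
            session_start = t
            busy_until = t + cold_start_seconds + job_seconds
        else:
            # Worker still warm (busy or idle): no cold start, job queues behind.
            busy_until = max(t, busy_until) + job_seconds

    if session_start is not None:
        # Bill the last session up to its idle-timeout shutdown.
        billed += busy_until + idle_timeout_seconds - session_start
    return billed, cold_starts


if __name__ == "__main__":
    requests = [0, 30, 600]  # hypothetical request times in seconds
    for timeout in (1, 60):
        billed, colds = simulate_single_worker(requests, 5, 10, timeout)
        print(f"idle timeout {timeout}s: {billed:.0f} billed s, {colds} cold starts")
```

With these made-up numbers (5 s jobs, 10 s cold start), a 1 s idle timeout bills 48 seconds with 3 cold starts, while a 60 s timeout bills 170 seconds but avoids one cold start. That is the trade-off between a short and a long idle timeout.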
A serverless worker runs per job, and once it’s done, it shuts off to save costs. The idle timeout keeps the worker running for a few extra seconds in case another request comes in right after, so you can avoid a cold start.
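For the app-to-RunPod flow asked about above, here is a minimal sketch that uses only the Python standard library and RunPod's synchronous `/runsync` endpoint. The endpoint ID and API key are placeholders. The `prompt` field is an assumption: the keys inside `input` are whatever your worker's handler expects. The helper names are hypothetical.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the endpoint's /runsync route (waits for the job to finish)."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    # Assumption: the worker's handler reads the prompt from input["prompt"].
    payload = {"input": {"prompt": prompt}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate_image(endpoint_id: str, api_key: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send one generation request and return the parsed JSON response."""
    request = build_runsync_request(endpoint_id, api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each call starts (or reuses) a worker for one job. With no active workers configured, nothing is billed between calls once the idle timeout has expired. `/runsync` blocks until the result is ready. For longer generations, RunPod also offers the asynchronous `/run` route paired with `/status/{job_id}`.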
Ahhh, that's the answer I was hoping for. Thank you! @nerdylive also helped explain the timeouts, which will be amazing for my task. Thank you!