RunPod · 3mo ago
GAB

Active workers or Flex workers? - Stable Diffusion

I'm integrating Stable Diffusion into a mobile application: user prompts are sent to RunPod for image generation, and the results are sent back to the app. Usage is highly variable, ranging from 15 to 100 image generations per day, and there may be days with no usage at all. Given this variability, should I opt for active workers or flex workers in RunPod for the most efficient scaling and cost management? And what kind of workload is each worker type suitable for?
11 Replies
yhlong00000 · 3mo ago
Active workers help reduce cold start time but come with a cost, even with a 30% discount when idle. Based on your traffic of 100 images/day, I wouldn’t recommend using active workers unless you need super-fast responses. You can configure 3-5 max workers to handle traffic surges for scaling. By the way, where did you see ‘flex worker’? It’s an old term, and we should consider removing it to avoid confusion.😂
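A rough way to see the trade-off described above is to compare the daily cost of one always-on active worker with the cost of on-demand workers that bill only while running. This is a minimal sketch, not RunPod's billing logic: the per-second rate is a made-up placeholder, the 30% discount is taken from the reply above and assumed to apply to the whole day, and whether cold-start seconds are billed is left as an assumption via a parameter that defaults to zero.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def active_worker_daily_cost(rate_per_second: float, discount: float = 0.30) -> float:
    """Cost of keeping one active worker up all day at a discounted rate.

    Assumption: the discount applies to every second of the day.
    """
    return rate_per_second * (1.0 - discount) * SECONDS_PER_DAY


def on_demand_daily_cost(
    jobs: int,
    seconds_per_job: float,
    idle_timeout: float,
    rate_per_second: float,
    cold_start_seconds: float = 0.0,
) -> float:
    """Worst-case cost of on-demand workers for one day.

    Assumption: every job arrives on its own, so each one pays for a full
    idle-timeout window (and, optionally, a cold start) on top of its run time.
    """
    billed_seconds = jobs * (cold_start_seconds + seconds_per_job + idle_timeout)
    return billed_seconds * rate_per_second


if __name__ == "__main__":
    rate = 0.0004  # hypothetical price in dollars per second, not a real quote
    print("active:   ", round(active_worker_daily_cost(rate), 3))
    print("on-demand:", round(on_demand_daily_cost(100, 5, 5, rate, 10), 3))
```

With these placeholder numbers the on-demand bill for 100 isolated jobs is far below the cost of a single active worker, and a day with no jobs costs nothing. Plug in the real rate for your GPU type before drawing conclusions.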
nerdylive · 3mo ago
Serverless GPU Endpoints for AI Inference
Run machine learning inference at scale with RunPod Serverless GPU endpoints.
nerdylive · 3mo ago
What is flex called now?
yhlong00000 · 3mo ago
😂😂 That’s my lack of history knowledge 🥲. “Flex worker” pretty much means every type of worker other than an active worker.
nerdylive · 3mo ago
yea thats what i understand too
GAB (OP) · 3mo ago
Ahhhh, so it's active workers vs. non-active workers. In my case, would the non-active workers scale to 0 when there is no traffic?
yhlong00000 · 3mo ago
Check this doc and let me know if you have more questions: https://docs.runpod.io/serverless/references/endpoint-configurations#active-min-workers
Endpoint configurations | RunPod Documentation
Configure your Endpoint settings to optimize performance and cost, including GPU selection, worker count, idle timeout, and advanced options like data centers, network volumes, and scaling strategies.
GAB (OP) · 3mo ago
I'm having trouble understanding the documentation.

[Idle Timeout] "The amount of time in seconds a worker not currently processing a job will remain active until it is put back into standby. During the idle period, your worker is considered running and will incur a charge."

Does that mean there is no option to do something like "charge per generation"? I was initially interested in the obsolete "flex worker" because it runs only when required, so if no computation is used, there are no charges.

"You will incur the cost of any active workers you have set regardless if they are working on a job."

Does that mean there is no such thing as, let's say, an API call from my app to RunPod that uses a worker to generate an image, after which the worker shuts down once the image is generated?
nerdylive · 3mo ago
idle timeout = "idle active": it keeps the worker active for x seconds after a job so the model stays warm for faster starts. And yes, just set it to 1 second then; the worker will shut off after 1 second if you have no other active workers.
yhlong00000 · 3mo ago
A serverless worker runs per job, and once it’s done, it shuts off to save costs. The idle timeout keeps the worker running for a few extra seconds in case another request comes in right after, so you can avoid a cold start.
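To make the idle-timeout behaviour concrete, here is a small simulation of a single on-demand worker. It is only a sketch under stated assumptions: `simulate_worker` is a hypothetical helper, not part of any RunPod SDK; every job takes the same fixed time; cold-start duration is ignored; and the worker is billed from the moment it starts until the idle timeout after its last job expires.

```python
def simulate_worker(arrivals, job_seconds, idle_timeout):
    """Return (billed_seconds, cold_starts) for one on-demand worker.

    arrivals:     request times in seconds
    job_seconds:  assumed fixed processing time per job
    idle_timeout: seconds the worker stays warm after finishing a job
    """
    billed = 0.0
    cold_starts = 0
    window_start = None  # when the current "worker on" window began
    warm_until = None    # when the worker will shut off if nothing arrives

    for t in sorted(arrivals):
        if warm_until is None or t > warm_until:
            # Worker is off: close the previous window and start cold.
            if warm_until is not None:
                billed += warm_until - window_start
            cold_starts += 1
            window_start = t
            start = t
        else:
            # Worker is still on: run now, or right after the current job ends.
            previous_finish = warm_until - idle_timeout
            start = max(t, previous_finish)
        warm_until = start + job_seconds + idle_timeout

    if warm_until is not None:
        billed += warm_until - window_start
    return billed, cold_starts


if __name__ == "__main__":
    requests = [0, 8, 100]  # made-up arrival times in seconds
    print(simulate_worker(requests, job_seconds=5, idle_timeout=5))
    print(simulate_worker(requests, job_seconds=5, idle_timeout=1))
```

In this toy example the 5-second timeout bills more seconds but lets the second request reuse the warm worker, while the 1-second timeout bills fewer seconds at the price of an extra cold start. With no requests at all, nothing is billed.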
GAB (OP) · 3mo ago
Ahhh, that's the answer I was hoping for. Thank you! @nerdylive also helped explain the timeouts, which will be amazing for my task. Thank you!