blistick
RunPod
•Created by blistick on 1/5/2024 in #⚡|serverless
What does "throttled" mean?
Yes, official guidance would be good. Like you, I don't want to incur the cost of always active.
Constant throttling is rather scary from a production standpoint.
@justin @ashleyk Thank you both very much for this advice.
To summarize, it seems I should (a) set max workers to at least 2, and (b) enable as many regions as possible for my endpoint.
(@justin I followed your previous advice about improving worker startup time by NOT using a network drive (which really helped, btw) but I forgot to edit my endpoint to allow more regions.)
12 replies
RunPod
•Created by blistick on 12/26/2023 in #⚡|serverless
Slow model loading
Haha. Well, if you have ZERO clue that must mean I have NEGATIVE clue.
I'll check out those resources.
22 replies
RRunPod
•Created by blistick on 12/26/2023 in #⚡|serverless
Slow model loading
@justin This all makes sense. I've got some work to do, but at least there's a path forward to better performance. Thanks again! (And Happy Holidays!)
@justin Thanks very much for this useful information. Lots to try!
I'm confused about your suggestion to keep the models in memory. I already load them outside of my handler function, but even for an "active" worker, that code isn't executed until a request comes in, at least from what I can tell. Maybe active workers will only execute that code once, so they do stay in memory. I'll do more tests to see.
Of course, for cost reasons I'd rather not need to keep an active worker.
22 replies
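For anyone following along, here's a rough sketch of the "load outside the handler" pattern discussed above. It doesn't use the RunPod SDK at all: `load_model`, `LOAD_COUNT`, and the dummy model are made up for illustration, and the `job["input"]` shape is an assumption about what the handler receives. The point is only that module-level code runs once per Python process, so if a worker process stays alive between requests, the model should stay in memory. Whether a given worker is actually kept alive is a platform question this sketch can't answer.

```python
# Hypothetical stand-ins: nothing here is part of the RunPod SDK.
LOAD_COUNT = 0  # counts how many times the pretend model gets loaded


def load_model():
    """Stand-in for an expensive model load (illustrative helper)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "dummy-model"}


# Module scope: executed once when the process imports this file,
# not once per request.
MODEL = load_model()


def handler(job):
    """Per-request function; reuses the MODEL loaded above."""
    prompt = job["input"]["prompt"]  # assumed job shape
    return {"model": MODEL["name"], "echo": prompt}


if __name__ == "__main__":
    # Simulate two requests hitting the same worker process.
    for text in ("first", "second"):
        print(handler({"input": {"prompt": text}}))
    print("model loads:", LOAD_COUNT)
    # In a real worker, the SDK's serverless entry point would be
    # started here with this handler instead of the loop above.
```

Logging a counter like this (or a timestamp) from a real worker is one way to test whether the load code runs once or on every request.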