RunPod · 10mo ago
Martin

How to load model into memory before the first run of a pod?

In the template worker's handler file, it says:
# If your handler runs inference on a model, load the model here.
# You will want models to be loaded into memory before starting serverless.
I am loading my model there. But when a new pod starts in my endpoint, its first run systematically takes more than 10 s because it is still loading the model. This results in some requests taking more than 10x longer than the expected latency. Is there a way to load the model as soon as the new pod becomes "active"? Thanks.
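For context, this is roughly what I am doing — a minimal sketch assuming the official runpod Python SDK from the template worker; load_model and run_inference stand in for my actual code:
```python
import runpod

def load_model():
    # Placeholder: replace with the real loading code,
    # e.g. torch.load(...) or a diffusers pipeline constructor.
    ...

# Loaded at module import time, i.e. when the worker boots and before
# runpod.serverless.start() begins polling for jobs, so the load cost
# is not paid inside the first request's handler call.
MODEL = load_model()

def run_inference(model, payload):
    # Placeholder: replace with the model's actual inference call.
    ...

def handler(job):
    # job["input"] carries the payload sent to the endpoint.
    return run_inference(MODEL, job["input"])

runpod.serverless.start({"handler": handler})
```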
6 Replies
ashleyk · 10mo ago
Enable FlashBoot, but it's only effective if you have a constant flow of requests. By the way, serverless and pods are two completely different things; there are no pods in serverless, only workers.
Martin (OP) · 10mo ago
What does FlashBoot do? Does it run that part ahead of time? Why does it not run it when my flow of requests is not constant?
ashleyk · 10mo ago
Because workers are shared between customers. You can also set active workers, but they run constantly and are pretty expensive.
Madiator2011 · 10mo ago
That part of the template is for loading the model into VRAM. Say you have an SD model: it is loaded on the first boot, and after a job is done it stays in VRAM, so it does not need to be loaded again. This mostly applies to active workers; a normal worker goes down after its job is done.
Martin (OP) · 10mo ago
Then how do you explain that the first request hitting a worker takes much more time than the following ones, even after the worker has been down for a while? What I would expect on worker boot is:
- the image is loaded
- the first part of the handler runs (loading my model)
So when a request hits the worker for the first time, it would be as quick as the next ones.
ashleyk · 10mo ago
FlashBoot, as I said. And as I also said, FlashBoot is not guaranteed; it depends on your flow of requests.