Created by Martin on 3/18/2024 in #⚡|serverless
How to load a model into memory before the first run of a pod?
In the template worker's handler file, it says:
# If your handler runs inference on a model, load the model here.
# You will want models to be loaded into memory before starting serverless.
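For context, here is a minimal sketch of that pattern, assuming a Hugging Face Transformers pipeline (the model and input field names below are placeholders, not my actual setup):

```python
import runpod
from transformers import pipeline

# Loaded at module scope, so this runs once when the worker
# process starts, before any job is handled.
model = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def handler(job):
    # Inference reuses the already-loaded model.
    return model(job["input"]["text"])

runpod.serverless.start({"handler": handler})
```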
I am loading my model there. But when a new pod starts in my endpoint, its first run systematically takes more than 10s because it is loading the model. This results in some requests taking more than 10x longer than the expected latency. Is there a way to load the model as soon as the new pod is "active"? Thanks.