Martin
RunPod
•Created by Martin on 3/18/2024 in #⚡|serverless
How to load model into memory before the first run of a pod?
Then how do you explain that the first request hitting the worker takes much longer than the following ones, even after the worker has been down for some time?
What I would expect on worker boot is:
- the image is loaded
- the first part of the handler runs (loading my model)
So when a request hits the worker for the first time, it should be as quick as the following ones.
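A minimal sketch of the layout described above, assuming a Python handler. The names `load_model`, `MODEL`, and `LOAD_COUNT` are hypothetical, and the "model" is a toy stand-in for a real one. The point is only that the load sits at module scope, so it runs once when the worker process starts rather than on each request:

```python
import time

LOAD_COUNT = 0  # hypothetical counter: how many times the simulated load has run


def load_model():
    """Hypothetical stand-in for an expensive model load."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.05)  # simulate slow weight loading
    return lambda text: text.upper()  # toy "model"


# Module scope: runs once when the worker process imports this file,
# not once per request.
MODEL = load_model()


def handler(job):
    """Per-request entry point; reuses the already loaded MODEL."""
    prompt = job["input"]["prompt"]
    return {"output": MODEL(prompt)}


# On RunPod the handler would then be registered with the SDK, e.g.:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

With this layout, any extra latency on the first request comes from the worker process itself starting (and running the module-level load), not from code inside `handler`.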
11 replies
RunPod
•Created by Martin on 3/18/2024 in #⚡|serverless
How to load model into memory before the first run of a pod?
What is FlashBoot doing? Is it running this part ahead of time?
Why does it not run it when my request flow is not constant?
11 replies