Martin
RunPod
Created by Martin on 3/18/2024 in #⚡|serverless
How to load model into memory before the first run of a pod?
Then how do you explain that the first request to hit the worker takes much longer than the following ones, even after the worker has been down for some time? What I would expect on worker boot is:
- the image is loaded
- the first part of the handler runs (loading my model)
So when a request hits the worker for the first time, it should be as quick as the subsequent ones.
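To make the expectation concrete, here is a minimal sketch of the pattern being described: the model is loaded at module scope, so the load happens once when the worker process starts, and the handler only reuses it. `load_model`, `MODEL`, and `LOAD_CALLS` are hypothetical names for illustration, the "model" is a stand-in function, and the `{"input": {...}}` event shape is an assumption about the request payload. It does not show what FlashBoot does internally.

```python
LOAD_CALLS = 0  # counts how often the expensive load actually runs


def load_model():
    """Hypothetical stand-in for an expensive model load (e.g. reading weights)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    # A real worker would load weights onto the GPU here.
    return lambda text: text.upper()


# Module scope: executed once when the worker process starts,
# before any request reaches the handler.
MODEL = load_model()


def handler(event):
    """Per-request entry point; reuses the already-loaded MODEL."""
    prompt = event["input"]["prompt"]  # assumed payload shape
    return {"output": MODEL(prompt)}


# In a real worker the handler would then be registered, for example:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

With this layout, any slowness on the first request would come from the worker process itself being started (and this module being imported) at request time, not from the handler body.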
What is FlashBoot doing? Is it running this part ahead of time? Why does it not run it when my request flow is not constant?
11 replies