Noah Yoshida
RunPod
Created by Charixfox on 5/21/2024 in #⚡|serverless
Speed up cold start on large models
Sometimes it takes 0.1 s of delay and then 160 seconds of execution time; sometimes it takes 40 s of delay and 200 s of execution. As long as the pod is not idle, it takes about 1 s of delay and 20-40 s of execution time.
This was a common problem when people tried to bake large files into AWS AMIs. AWS would start the EC2 instance quickly but lazy-load the bytes the application needed, so while the app itself was quick to start, it still had to pull in the entire data file before serving a request, and the initial request would take forever.

There are techniques to do this with containers as well: load only the layers needed for initial application startup, then fetch the remaining layers on demand as the running application touches their files. I would bet something similar is happening here.

It's generally a good way to improve startup times for containers that don't need massive files/libraries to run their main application, but it's a pretty sneaky anti-pattern for the use case of storing LLM weights in containers. You think the container is starting up fast, but really it's going to take an equal or longer amount of time to download the LLM weights behind the scenes :/
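If that's what's happening, one workaround is to touch the weight files up front, during container startup, so the lazy loader pulls everything in before the first request arrives instead of mid-request. A minimal sketch of that warm-up step (the `/models` path and the chunk size here are assumptions for illustration, not anything RunPod-specific):

```python
import os

def warm_files(root: str, chunk_size: int = 1 << 20) -> int:
    """Sequentially read every file under `root` so any lazy-loading
    layer (or the OS page cache) has the bytes resident before the
    first request. Returns the total number of bytes touched."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # Read in fixed-size chunks to avoid holding a whole
                # multi-GB weight file in memory at once.
                while chunk := f.read(chunk_size):
                    total += len(chunk)
    return total

if __name__ == "__main__":
    # Hypothetical weights directory; adjust to wherever the
    # container actually stores the model.
    warm_files("/models")
```

Calling this once at startup trades a longer, predictable cold start for fast first requests, which at least makes the delay show up where you expect it.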
17 replies