How to reduce cold start & execution time?
Hi, I have a serverless endpoint with roughly a 70 sec cold start and 50 sec execution time. I was trying out different GPUs and something happened: it suddenly started working really fast, around 500 ms cold starts and 10 sec execution time, and the output was still fine.
How did that happen? Do you guys have any idea how I can achieve that again?
(Now it's broken again, 70 sec + 50 sec.) I don't think it's about the GPUs; I'm on an 80GB GPU and it still takes 50 secs. I don't know how that happened. FlashBoot is enabled, but it doesn't seem to be working right now.
7 Replies
FlashBoot only really provides a benefit if you have a constant flow of requests.
Does it also help reduce execution time?
It's fast again, I don't even wait in the queue, it's basically instant.
Currently -> "executionTime": 12128; it was around 70000 before.
Solution
Some resources that might explain FlashBoot better
Is this for video? Audio? Images?
It would be nice to understand what FlashBoot is.
Yeah, I wish so too haha. But at least from the thread I linked, it basically seems to be some sort of caching procedure, which is why having more workers (or active workers) helps reduce that cold start.
FlashBoot basically keeps a worker on standby to accept new requests if you have a constant flow of requests, so you don't need to wait for the worker to start up and load everything. It doesn't really provide any benefit unless you have a constant flow of requests.
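For the execution-time side of the question, the usual pattern is to do the expensive model load once at module import so a warm (or FlashBoot-resumed) worker only runs inference per request. Below is a minimal sketch assuming a RunPod Python serverless handler; `load_model` and `run_inference` are hypothetical placeholders for whatever your endpoint actually loads and runs.

```python
# Minimal RunPod serverless handler sketch.
# Key idea: do the expensive model load ONCE at module import, not inside
# the handler, so warm / FlashBoot-resumed workers skip it entirely.

import runpod


def load_model():
    # Hypothetical placeholder: in a real endpoint this would load weights,
    # move them to the GPU, warm up, etc. It runs once per worker start,
    # so it only contributes to cold start, not per-request execution time.
    return {"name": "dummy-model"}


# Module-level load: executed when the worker process starts, not per request.
MODEL = load_model()


def run_inference(model, job_input):
    # Hypothetical placeholder for the actual per-request work.
    return {"model": model["name"], "echo": job_input}


def handler(job):
    # Only lightweight per-request work happens here, so execution time
    # stays small once the worker is warm.
    return run_inference(MODEL, job["input"])


runpod.serverless.start({"handler": handler})
```

If the model load currently happens inside the handler, moving it to module scope like this is often what turns a 50 sec execution time into a ~10 sec one on warm workers, independent of which GPU is selected.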