How to reduce cold start & execution time?

Hi, I have a serverless endpoint with roughly a 70 sec cold start and a 50 sec execution time. While I was trying different GPUs, something happened and it started running much faster: about 500 ms cold starts and 10 sec execution time, and the output was fine. How did that happen, do you guys have any idea? How can I achieve that again? (Now it's back to 70 sec + 50 sec.) I don't think it's about the GPU: I'm on an 80GB GPU and it still takes 50 secs. I don't know how that happened. Flash boot is enabled, but I guess it's not working right now.
7 Replies
ashleyk • 12mo ago
Flash boot only really provides a benefit if you have a constant flow of requests.
🦄🦄 (OP) • 12mo ago
Does it also help reduce execution time? It's fast again, and I don't even wait in the queue, it's basically instant right now -> "executionTime": 12128. It was around 70000 before.
justin • 12mo ago
Some resources that may explain flashboot better. Is this for video? Audio? Image?
ssssteven • 12mo ago
It would be nice to understand what flashboot is
justin • 12mo ago
Yeah, I wish so too, haha. But at least from the thread I linked, it basically seems like some sort of caching procedure, which is why more workers or active workers help reduce the cold start.
ashleyk • 12mo ago
Flash boot basically keeps a worker on standby to accept new requests when you have a constant flow of requests, so you don't need to wait for the worker to start up and load everything. It doesn't really provide any benefit unless you have a constant flow of requests.
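To make the "constant flow of requests" point concrete, here is a rough toy model in Python. It is only a sketch and not how flash boot is actually implemented: it assumes a worker stays ready for a fixed `idle_timeout_s` after it last picked up a request, and that value (120 s) is made up for illustration. The 70 s and 0.5 s startup figures are the ones reported earlier in this thread. `Timing` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    cold_start_s: float    # time to start a worker and load everything
    warm_start_s: float    # time to pick up a request on a ready worker
    idle_timeout_s: float  # ASSUMPTION: how long a worker stays ready after a request


def simulate(arrivals_s, timing):
    """Return the startup delay (seconds) seen by each request.

    Toy model: execution time and queueing are ignored, and a single
    worker is either "ready" (warm) or not (cold).
    """
    delays = []
    ready_until = None  # None means no worker has been started yet
    for t in sorted(arrivals_s):
        if ready_until is not None and t <= ready_until:
            delay = timing.warm_start_s   # worker was still on standby
        else:
            delay = timing.cold_start_s   # worker had gone away, start from scratch
        delays.append(delay)
        # The worker stays ready for idle_timeout_s after it has started this request.
        ready_until = t + delay + timing.idle_timeout_s
    return delays


timing = Timing(cold_start_s=70.0, warm_start_s=0.5, idle_timeout_s=120.0)

# Steady traffic: one request every 100 s keeps the worker warm after the first hit.
print(simulate([0, 100, 200, 300], timing))     # [70.0, 0.5, 0.5, 0.5]

# Sparse traffic: one request every 10 minutes pays the cold start every time.
print(simulate([0, 600, 1200, 1800], timing))   # [70.0, 70.0, 70.0, 70.0]
```

Under these assumptions, the fast runs in the original post would match the steady case (requests arriving close together while testing), and the slow ones would match the sparse case.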