Cold Start Time is too long
When I run a test HelloWorld project, it takes too much time. The worker configuration is in the attachment. I have enabled FlashBoot, which is said to reduce cold start time to 2 s. In the documentation I see: "The Delay Time should be extremely minimal, unless the API process was spun up from a cold start, then a sizable delay is expected for the first request sent." Does "a sizable delay" mean that a request from a cold start may take 12 s? Is there anything I misunderstand? Please let me know.
8 Replies
Delay time will be extremely high when you're using a GPU type that has "Low availability". I suggest creating a new network volume and a new endpoint in a different region that has higher availability. FlashBoot can't offer a 2 s cold start time if your application itself takes longer than 2 s to load models etc. You also only benefit from FlashBoot if you send a constant flow of requests, not if you only make occasional requests to the endpoint.
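To check whether your own startup work already exceeds 2 s, you can time it in isolation. This is only a rough sketch: `fake_load_model` is a hypothetical stand-in for whatever your worker does at startup (loading models, reading weights from the network volume), and `time_startup` is a helper name made up for this example.

```python
import time


def time_startup(load_fn):
    """Call load_fn once and return (its result, elapsed seconds)."""
    start = time.perf_counter()
    result = load_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_load_model():
    # Hypothetical placeholder: replace with your real model loading code.
    time.sleep(0.05)
    return "model"


model, startup_seconds = time_startup(fake_load_model)
print(f"startup work took {startup_seconds:.2f} s")
```

If the number you measure for your real loading code is well above 2 s, the cold start cannot be shorter than that, with or without FlashBoot.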
Thanks. Another question: does the delay time or cold start time count towards the cost, or does the fee only cover the execution time?
Cold start time is part of delay time. You are charged for the cold start part of delay time but not for the part of delay time where your request is in the queue. You are basically charged for the entire duration that the worker runs, including cold start and the idle time that you configure.
Not just execution time.
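As a rough illustration of that rule, here is a small sketch of the arithmetic. The function name, the example durations, and the per-second price are all made up for illustration and are not real pricing; it also assumes the worker stays idle for the full configured idle timeout after the request finishes.

```python
def estimate_request(queue_s, cold_start_s, execution_s, idle_timeout_s, price_per_s):
    """Split one request into delay time and billed time.

    Delay time = time in queue + cold start.
    Billed time = cold start + execution + idle timeout (queue time is free).
    """
    delay_s = queue_s + cold_start_s
    billed_s = cold_start_s + execution_s + idle_timeout_s
    return {
        "delay_s": delay_s,
        "billed_s": billed_s,
        "cost": billed_s * price_per_s,
    }


# Hypothetical numbers: 2 s in queue, 10 s cold start, 1 s execution,
# 5 s idle timeout, and a made-up price of 0.0002 per second.
estimate = estimate_request(2, 10, 1, 5, 0.0002)
print(estimate["delay_s"], estimate["billed_s"])
```

With these numbers the delay time is 12 s but the billed time is 16 s: the 2 s in the queue is not charged, while the cold start and the idle time are.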
OK, I see. I tested a high-availability GPU in the same region. The delay time is 10 s, which is better than before but still not good enough.
That is probably mostly cold start time if there is high availability for the GPU tier.
The documentation says: "Network Volume: This will limit the availability of cards, as your endpoint workers will be locked to the datacenter that houses your network volume." If I create a new network volume in a different region that has higher availability, does this mean that the workers will only use GPUs from the data center in that region and cannot use GPUs from other regions?
Yes, that is correct.
Understood, thank you.