RunPod, 10mo ago
fanbing

Cold Start Time is too long

When I test a HelloWorld project, a run takes too much time. The worker configuration is attached. I have enabled FlashBoot, which is said to reduce cold start time to 2 s. In the documentation, I see: "The Delay Time should be extremely minimal, unless the API process was spun up from a cold start, then a sizable delay is expected for the first request sent." Does "a sizable delay" mean that a request from a cold start may take 12 s? Is there anything I misunderstand? Please let me know.
8 Replies
ashleyk, 10mo ago
Delay time will be extremely high when you're using a GPU type that has "Low availability". I suggest creating a new network volume and a new endpoint in a different region that has higher availability. FlashBoot can't offer a 2s cold start time if your application takes longer than 2s to load models etc. You also only benefit from FlashBoot if you send a constant flow of requests, not if you only make occasional requests to the endpoint.
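To check whether a slow request was a cold start or a warm one, it can help to compare delay times across several requests. A minimal sketch, assuming each response is a dict with `delayTime` and `executionTime` fields in milliseconds; the field names, the 2000 ms threshold, and the sample values are assumptions for illustration, not something confirmed in this thread:

```python
from statistics import median

def summarize_delays(responses, cold_threshold_ms=2000):
    """Split requests into likely-cold and likely-warm by delay time.

    `responses` is a list of dicts with a `delayTime` key (milliseconds).
    The threshold is a hypothetical cut-off, not an official value.
    """
    delays = [r["delayTime"] for r in responses]
    cold = [d for d in delays if d > cold_threshold_ms]
    warm = [d for d in delays if d <= cold_threshold_ms]
    return {
        "requests": len(delays),
        "likely_cold": len(cold),
        "median_warm_ms": median(warm) if warm else None,
        "max_delay_ms": max(delays) if delays else None,
    }

# Hypothetical sample: one cold request followed by two warm ones.
sample = [
    {"delayTime": 12000, "executionTime": 150},
    {"delayTime": 300, "executionTime": 140},
    {"delayTime": 250, "executionTime": 145},
]
summary = summarize_delays(sample)
print(summary)
```

If most requests land in the "likely cold" bucket, the endpoint is probably not receiving the steady flow of requests that FlashBoot needs.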
fanbing (OP), 10mo ago
Thanks. Another question: does the delay time or cold start time count towards the cost, or does the fee only include the execution time?
ashleyk, 10mo ago
Cold start time is part of delay time. You are charged for the cold start part of delay time but not for the part of delay time where your request is in the queue. You are basically charged for the entire duration that the worker runs, including cold start and the idle time that you configure. Not just execution time.
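The billing rule above can be sketched as simple arithmetic. This is only an illustration of the description in this reply, assuming the worker stays up for the full configured idle timeout after the request; the function names, the sample durations, and the per-second price are hypothetical:

```python
def billable_seconds(queue_s, cold_start_s, execution_s, idle_timeout_s):
    """Return the seconds the worker is running, which is what is billed.

    Queue time is accepted as an argument only to make explicit that it
    is excluded: no worker is running for the request during that time.
    """
    return cold_start_s + execution_s + idle_timeout_s

def estimated_cost(billable_s, price_per_second):
    """Multiply billable time by a (hypothetical) per-second GPU price."""
    return billable_s * price_per_second

# Hypothetical request: 3 s in queue, 10 s cold start, 0.5 s execution,
# and a 5 s idle timeout configured on the endpoint.
total = billable_seconds(queue_s=3.0, cold_start_s=10.0,
                         execution_s=0.5, idle_timeout_s=5.0)
cost = estimated_cost(total, price_per_second=0.0002)
print(total)  # 15.5
```

In this example the delay time seen by the client is 13 s (queue plus cold start), but only 10 s of it is billed.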
fanbing (OP), 10mo ago
OK, I see. I tested a high-availability GPU in the same region. The delay time is 10s, which is better than last time but still not good enough.
ashleyk, 10mo ago
It's probably mostly cold start time if there is high availability for the GPU tier.
fanbing (OP), 10mo ago
The documentation says: "Network Volume: This will limit the availability of cards, as your endpoint workers will be locked to the datacenter that houses your network volume." If I create a new network volume in a different region that has higher availability, does this mean that the workers will only use GPUs from the data center in that region and cannot use GPUs from other regions?
ashleyk, 10mo ago
Yes, that is correct.
fanbing (OP), 10mo ago
Understood, thank you.