queue delay times
Hi, I'm seeing really long delay times, even though there's nothing in the queue, and this is a really small CPU serverless endpoint. Any idea what causes this?

I'm having the same issue on 16GB/24GB: my requests stay in the queue for a very long time, and these are the only two items in it

cc @PRB - any issue on the status page?
@sahir which datacenter are you running this in?
@youssef can you open a support ticket? We are already looking at this, but it will be nice to keep track and get back to you
all locations were selected, so it's created workers here

This is happening to my other endpoints too now
Same here, almost 2-minute cold starts every time
but once every few requests it goes back under 5 seconds

are you guys on CPU endpoints or GPU endpoints?
@Kays please reply so I can resolve this faster
GPU endpoints
It's mostly A100 for me
I mean h100*
the endpoint ID will help
pury32p7r6r4wf
I can give you an example test request if you like
Your cold starts are high. Are you loading the model from a network volume, or is the model just too big?
I'm not using network volumes; the model is flux-dev (24GB)
But what's weird is that the cold start is sometimes extremely quick, like under 5 seconds
Hey there, any updates on this? Is it just the model being too big? @PRB @flash-singh thanks!
seems to be fixed now somehow
That's just FlashBoot. Anything over 10s should be your ideal cold start. Is your model baked into the container image?
yes, it is in the container
right now I'm getting FlashBoot on around 50/50 of requests
That's about right; it depends on workload and capacity. For H100s that's really good if your p50 is hitting FlashBoot.
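To sanity-check a rate like this from your own request logs, here is a minimal sketch. The timings are made up for illustration, and the "<10s means FlashBoot" cutoff is an assumption based on the thresholds discussed above, not an official definition:

```python
# Assumption (per the thread above): cold starts under ~10s indicate a
# FlashBoot (warm) start; longer ones are full cold starts.
FLASHBOOT_THRESHOLD_S = 10.0

def flashboot_stats(cold_starts):
    """Return (flashboot hit rate, p50 cold start in seconds)."""
    hits = sum(1 for t in cold_starts if t < FLASHBOOT_THRESHOLD_S)
    ordered = sorted(cold_starts)
    p50 = ordered[len(ordered) // 2]  # simple median (upper value on even length)
    return hits / len(cold_starts), p50

# Hypothetical sample: a roughly 50/50 mix of FlashBoot (<5s)
# and full (~2 min) cold starts, like the behavior described above.
timings = [3.2, 118.0, 4.1, 95.5, 2.8, 121.3, 110.0, 4.9]
rate, p50 = flashboot_stats(timings)
print(f"flashboot rate: {rate:.0%}, p50 cold start: {p50:.1f}s")
# → flashboot rate: 50%, p50 cold start: 95.5s
```

If the p50 lands under the threshold, at least half of your requests are getting FlashBoot, which matches the "p50 is hitting flashboot" framing above.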
Cool, yes, I'm happy with that rate. Yesterday it was more like 90/10, which is why I mentioned it.