Cold start time
Does anyone know the cold start time of a model hosted on RunPod serverless? The Kandinsky model.
that is from last week
thanks man. Are you using the RunPod Kandinsky endpoint or hosting it yourself on serverless?
that is the runpod endpoint
but is it Kandinsky 3.0?
v2
hummmm.. I need 3.0
we don't have v3
will the cold start time be higher if I host v3 myself on serverless?
it's about the same, depends on workload
workload meaning the number of requests?
My web app would only have a few requests per day... like 5 or 10
can the cold start time get higher than 3-4 seconds?
yes
true cold start, yes. You would have to measure the worst case for v3; for v2 it seems to be around 12s
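A minimal sketch of how that measurement could look, assuming a RunPod serverless endpoint reached through the standard `/runsync` HTTP route; the endpoint ID, API key, and the `prompt` input field are placeholders and depend on your own handler:

```python
import time
import requests  # assumes the `requests` package is installed

# Placeholders - swap in your own endpoint ID and RunPod API key.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = "YOUR_RUNPOD_API_KEY"

def timed_runsync(payload: dict) -> float:
    """Send one synchronous job and return the wall-clock latency in seconds."""
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    start = time.time()
    resp = requests.post(url, json={"input": payload}, headers=headers, timeout=300)
    resp.raise_for_status()
    return time.time() - start

# Let the endpoint scale back to zero first (no requests for longer than the
# idle timeout), then time one request followed by a second one while the
# worker is still warm - the difference approximates the true cold start.
cold = timed_runsync({"prompt": "a lighthouse at dusk"})
warm = timed_runsync({"prompt": "a lighthouse at dusk"})
print(f"cold: {cold:.1f}s, warm: {warm:.1f}s, approx cold start: {cold - warm:.1f}s")
```

Repeating this a few times across the day gives a rough worst-case figure rather than a single lucky sample.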
12 seconds is a lot
For something like this
why not just use ChatGPT, curious?
But if u know a worker is potentially about to be pinged u could try to pre-start the worker with a warm up request
and set idle time to 2 mins
like if u see someone typing or opening a chat on ur web app
send a prewarm request so the worker is on and set it to idle for 3 mins for potential incoming requests
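A rough sketch of that prewarm ping, again with a placeholder endpoint ID and API key and using the async `/run` route; what counts as a cheap warm-up payload depends entirely on your handler (the `warmup` flag below is made up):

```python
import requests  # assumes the `requests` package is installed

# Placeholders - use your own endpoint ID and RunPod API key.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = "YOUR_RUNPOD_API_KEY"

def prewarm() -> str:
    """Fire a lightweight async job so a worker spins up before the real request.

    /run queues the job and returns immediately; keep the payload cheap,
    e.g. a no-op flag your handler recognizes, so the warm-up itself is fast.
    """
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    resp = requests.post(url, json={"input": {"warmup": True}}, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json().get("id")  # job ID, in case you want to poll its status later

# Call prewarm() when the user opens the chat or starts typing in your web app;
# combined with an idle timeout of a few minutes in the endpoint settings,
# the worker should still be warm when the real request arrives.
```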
Personally tho, for the thing ur talking about, i use runpod for heavier transcriptions / image gen etc.
but if u need a fast llm response i found chatgpt is still the best at low-volume workstreams before it makes sense to host ur own