rafael21@•12mo ago

Cold start time

Does anyone know the cold start time of a model hosted on RunPod serverless? The Kandinsky model. 🙂
flash-singhβ€’12mo ago
That is from last week:
[image attachment]
rafael21@OPβ€’12mo ago
Thanks, man. Are you using the RunPod Kandinsky endpoint, or hosting it yourself on serverless?
flash-singhβ€’12mo ago
That is the RunPod endpoint.
rafael21@OPβ€’12mo ago
but is it Kandinsky 3.0?
flash-singhβ€’12mo ago
v2
rafael21@OPβ€’12mo ago
Hmmm... I need 3.0.
flash-singhβ€’12mo ago
we don't have v3
rafael21@OPβ€’12mo ago
Will the cold start time be higher if I host v3 myself on serverless?
flash-singhβ€’12mo ago
It's about the same; it depends on the workload.
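For context, self-hosting on serverless means writing a small worker where the model load happens once per worker, during cold start, rather than per request. A minimal sketch, assuming the `runpod` SDK and the Kandinsky 3 weights via diffusers (the model id and payload shape are illustrative, not from this thread):
```python
# Minimal RunPod serverless worker sketch. Assumes the `runpod`,
# `torch`, and `diffusers` packages are installed in the image.
import runpod
import torch
from diffusers import AutoPipelineForText2Image

# Loading the pipeline at import time means the heavy model load
# happens during cold start, once per worker, not once per request.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",  # assumed model id
    torch_dtype=torch.float16,
).to("cuda")

def handler(event):
    # event["input"] carries the JSON payload sent to the endpoint.
    prompt = event["input"]["prompt"]
    image = pipe(prompt).images[0]
    path = "/tmp/out.png"
    image.save(path)
    return {"image_path": path}

runpod.serverless.start({"handler": handler})
```
With this layout, a request that lands on a warm worker skips straight to the `handler` call; only a scaled-from-zero request pays the import-time load.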
rafael21@OPβ€’12mo ago
Does workload mean the number of requests? My web app would get just a few requests per day... like 5 or 10. Could the cold start time get higher than 3-4 seconds?
flash-singhβ€’12mo ago
For a true cold start, yes; you would have to measure the worst case for v3 yourself. For v2 it seems to be around 12s.
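One way to measure that worst case: let the endpoint scale to zero, then time a single synchronous request end to end. A sketch using the standard RunPod `/runsync` HTTP API; the endpoint id and prompt are placeholders:
```python
# Rough cold-start measurement: with zero warm workers, time one
# synchronous request against the endpoint.
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # hypothetical placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

start = time.monotonic()
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a lighthouse at dawn"}},
    timeout=300,
)
elapsed = time.monotonic() - start
print(f"status={resp.status_code} round trip={elapsed:.1f}s")
# Send a second request immediately after and subtract its (warm)
# latency to estimate the cold-start portion of the first call.
```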
rafael21@OPβ€’12mo ago
😧 12 seconds is a lot
justinβ€’12mo ago
For something like this, curious, why not just ChatGPT? But if you know a worker is potentially about to be pinged, you could try to pre-start it with a warm-up request and set the idle time to 2 mins: e.g., if you see someone typing or opening a chat on your web app, send a pre-warm request so the worker is on, and set it to idle for 3 mins for potential incoming requests (see the sketch below).

Personally though, for the thing you're talking about, I use RunPod for heavier transcription / image gen etc., but if you need a fast LLM response, I found ChatGPT is still the best at low-volume workloads before it makes sense to host your own.
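The pre-warm idea could look roughly like this, assuming a placeholder endpoint id and a `{"warmup": true}` input convention that your handler would need to short-circuit on (both are assumptions, not part of any RunPod API):
```python
# Sketch of the pre-warm trick: fire a cheap no-op request when a user
# opens a chat, so a worker spins up before the real request arrives.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # hypothetical placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

def prewarm():
    # /run is asynchronous: it returns a job id immediately while the
    # worker boots in the background.
    requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"warmup": True}},  # handler must recognize this
        timeout=10,
    )

# Call prewarm() from the "user opened a chat" event in your web app;
# the endpoint's idle timeout (set in its config) then keeps the
# worker warm for the follow-up request.
```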