Is there an equivalent of flash boot for CPU-only serverless?
I was trying to figure out if there was a way to have a CPU job only fire up when it was needed, so it wouldn't accrue charges when idle (like FlashBoot for GPU serverless). Thanks!
11 Replies
CPU serverless does this
What do you mean? If I'm not wrong, you asked this again after I answered in general. Do you mind explaining more about that, or about the difference with CPU serverless?
Oh apologies, I did not see the reply for some reason
yeah, I thought I tried it and it ran continuously, but let me look into it
Then there is probably something wrong in your code; that's still running in the main thread
OK, it's working fine. My question is: for GPU + FlashBoot the initial delay is less than one second, but for CPU it is more than 6 seconds! Is there a way to reduce the CPU initial wait time? (Same container, same request; the container has no CUDA, so it runs fine on CPU, it just has that long initial delay.)
@Stone Johnson what's your workload?
low so far, but how come a huge LLM package can FlashBoot on a GPU in 0.5 sec, while a dinky 8 GB container takes 6 sec of delay on a CPU?
for my app turnaround time is key
FlashBoot on GPU cloud is truly incredible
warm start on CPU cloud is inexplicably 10x slower
Sorry, I more so meant how are you using the CPUs so we have a better understanding of why it takes six seconds to cold boot
ok, same container as for GPU. It uses Jason's simple py handler for API calls, on CPU cloud
way simple
container is 8 GB, py is very close to https://blog.runpod.io/serverless-create-a-basic-api/
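for reference, a minimal sketch of that handler pattern, assuming the runpod Python SDK as in the blog post (the input field and return value here are just placeholders):
```python
import runpod

def handler(job):
    # job["input"] carries the JSON payload sent to the endpoint
    job_input = job["input"]
    name = job_input.get("name", "world")
    return {"greeting": f"hello {name}"}

# start the worker loop; on serverless the worker scales to zero when idle
runpod.serverless.start({"handler": handler})
```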
running a bash script with a couple of executables, very boring (sox, lame and piper, voice processing stuff)
a typical job takes 2 sec of exec time on either GPU or CPU cloud (I'm not using the GPU at all as far as I can tell, no CUDA in environment)
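in sketch form, the handler just shells out to the script; a hypothetical version (script name, input field, and output path are made up, not my actual pipeline), which plugs into the same `runpod.serverless.start` loop as above:
```python
import subprocess

def handler(job):
    text = job["input"]["text"]
    # process.sh is a placeholder: assume it chains piper -> sox -> lame
    # and writes /tmp/out.mp3
    result = subprocess.run(
        ["bash", "process.sh", text],
        capture_output=True, text=True, timeout=120,
    )
    if result.returncode != 0:
        return {"error": result.stderr}
    return {"output_file": "/tmp/out.mp3"}
```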
any ideas?
So you're getting different performance on serverless with your CPU pod?
Depending on the package you use, it can be cached to RAM, but FlashBoot for CPU serverless shouldn't be available yet, I guess
Also, what may cause the difference is FlashBoot itself: the model can be loaded into VRAM, which is most likely faster than RAM on CPU instances
Or limited disk speeds on RunPod's machines (might be)