Is there an equivalent of flash boot for CPU-only serverless?

I was trying to figure out whether there is a way to have a CPU job fire up only when it is needed, so that it does not accrue charges when idle (like FlashBoot for GPU serverless). Thanks!
11 Replies
nerdylive (7mo ago)
CPU serverless does this. What do you mean? If I'm not wrong, you have asked this again after I answered in general. Do you mind explaining more about it, or how what you want differs from CPU serverless?
Stone Johnson (OP, 7mo ago)
Oh, apologies, I did not see the reply for some reason. Yeah, I thought I tried it and it ran continuously, but let me look into it.
nerdylive (7mo ago)
Then there is probably something wrong in your code that is still running in the main thread.
Stone Johnson (OP, 7mo ago)
OK, it's working fine. My question is: for GPU + FlashBoot the initial delay time is less than one second, but for CPU it is more than 6 seconds! Is there a way to reduce the CPU initial wait time? (Same container, same request; the container has no CUDA so it runs fine on CPU, it just has that long initial delay.)
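To compare the two endpoints on more than a single request, the queue delay can be separated from the execution time over several jobs. The following is a minimal sketch, assuming each job-status response is a dict with `delayTime` and `executionTime` fields in milliseconds (worth checking against a real response from your own endpoint). The sample values are illustrative only, shaped after the numbers quoted in this thread, and are not measurements.

```python
from statistics import median


def summarize(status_responses):
    """Return the median delay and execution time, in seconds.

    Assumes each response carries `delayTime` and `executionTime`
    in milliseconds; adjust the keys if your responses differ.
    """
    delays = [r["delayTime"] / 1000 for r in status_responses]
    execs = [r["executionTime"] / 1000 for r in status_responses]
    return {"delay_s": median(delays), "exec_s": median(execs)}


# Illustrative values only (not real measurements).
gpu_jobs = [
    {"delayTime": 500, "executionTime": 2000},
    {"delayTime": 700, "executionTime": 2100},
    {"delayTime": 600, "executionTime": 1900},
]
cpu_jobs = [
    {"delayTime": 6000, "executionTime": 2000},
    {"delayTime": 6400, "executionTime": 2200},
    {"delayTime": 6200, "executionTime": 2100},
]

gpu = summarize(gpu_jobs)
cpu = summarize(cpu_jobs)
print(f"GPU: delay {gpu['delay_s']:.1f}s, exec {gpu['exec_s']:.1f}s")
print(f"CPU: delay {cpu['delay_s']:.1f}s, exec {cpu['exec_s']:.1f}s")
```

If the execution times match and only the delay differs, the gap is in worker start-up rather than in the job itself.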
haris (7mo ago)
@Stone Johnson what's your workload?
Stone Johnson (OP, 7mo ago)
Low so far, but how come a huge LLM package can flash boot to GPU in 0.5 sec, while a dinky 8 GB container takes 6 sec of delay on a CPU? For my app, turnaround time is key. FlashBoot on GPU cloud is truly incredible; warm start on CPU cloud is inexplicably 10x slower.
haris (7mo ago)
Sorry, I meant more: how are you using the CPUs? That would give us a better understanding of why it takes six seconds to cold boot.
Stone Johnson (OP, 7mo ago)
OK, same container as for GPU. It uses Jason's simple Python handler for API calls; on CPU cloud it is very simple.
Stone Johnson (OP, 7mo ago)
The container is 8 GB, and the Python handler is very close to https://blog.runpod.io/serverless-create-a-basic-api/
Stone Johnson (OP, 7mo ago)
It runs a bash script with a couple of executables, very boring (sox, lame and piper; voice processing stuff). A typical job takes 2 sec of exec time on either GPU or CPU cloud. (I'm not using the GPU at all as far as I can tell; there is no CUDA in the environment.) Any ideas?
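For reference, a worker of this shape can be sketched roughly as below. This is a minimal sketch and not the actual handler from this thread: the script path `/app/process_voice.sh` and the `text` input key are hypothetical, and only `runpod.serverless.start({"handler": ...})` comes from the RunPod Python SDK. The point it illustrates is that the handler blocks on the script and then returns, so nothing keeps running once the job is done.

```python
import subprocess
import sys
import time

# Hypothetical path to the bash script that chains sox, lame and piper.
PIPELINE_SCRIPT = "/app/process_voice.sh"


def run_step(cmd, timeout_s=60):
    """Run one command to completion; return exit code, output and wall time."""
    started = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "seconds": time.perf_counter() - started,
    }


def handler(job):
    """Process one job, then return so the worker is free to go idle."""
    # "text" is a hypothetical input field used for illustration.
    text = job["input"].get("text", "")
    result = run_step(["bash", PIPELINE_SCRIPT, text])
    if result["returncode"] != 0:
        return {"error": result["stderr"]}
    return {"output": result["stdout"], "seconds": result["seconds"]}


def main():
    # Imported here so the helpers above can be tried without the SDK installed.
    import runpod

    runpod.serverless.start({"handler": handler})


# Local check that needs neither the SDK nor the audio tools.
check = run_step([sys.executable, "-c", "print('ok')"])
```

In the container, the entry point would call `main()`. Returning the measured `seconds` alongside the output makes it easy to see how much of the turnaround is the script itself and how much is start-up delay.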
nerdylive (6mo ago)
So you're getting different performance on serverless compared with your CPU pod? Depending on the package you use, it can be cached to RAM, but FlashBoot for CPU serverless shouldn't be available yet, I guess. Another possible cause of the difference is FlashBoot itself: the model can be loaded into VRAM, which is most likely faster than RAM on CPU instances. Or it could be limited disk speeds on RunPod's machines (might be).