Is there an equivalent of flash boot for CPU-only serverless?
I was trying to figure out if there was a way to have a CPU job only fire up when it was needed, so it wouldn't accrue charges when idle (like FlashBoot for GPU serverless). Thanks!
11 Replies
CPU serverless does this
What do you mean? If I'm not wrong, you asked this again after I answered in general. Do you mind explaining more about that, or about the difference with CPU serverless?
Oh apologies, I did not see the reply for some reason
yeah, I thought I tried it and it ran continuously, but let me look into it
Then there is probably something wrong in your code; that's still running in the main thread
OK, it's working fine. My question is: for GPU + FlashBoot the initial delay is less than one second, but for CPU it is more than 6 seconds! Is there a way to reduce the CPU initial wait time? (Same container, same request; the container has no CUDA, so it runs fine on CPU, it just has that long initial delay.)
@Stone Johnson what's your workload?
low so far, but how come a huge LLM package can FlashBoot on a GPU in 0.5 sec, while a dinky 8 GB container takes 6 sec of delay on a CPU?
for my app turnaround time is key
FlashBoot on GPU cloud is truly incredible
warm start on CPU cloud is inexplicably 10x slower
Sorry, I more so meant how are you using the CPUs so we have a better understanding of why it takes six seconds to cold boot
ok, same container as for GPU. It uses Jason's simple py handler for API calls, on CPU cloud
way simple
container is 8 GB, py is very close to https://blog.runpod.io/serverless-create-a-basic-api/
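for reference, a minimal sketch of that handler pattern, assuming the runpod Python SDK as in the blog post (the input field and return value here are just placeholders):
```python
import runpod

def handler(job):
    # job["input"] carries the JSON payload sent to the endpoint
    job_input = job["input"]
    name = job_input.get("name", "world")
    return {"greeting": f"hello {name}"}

# start the worker loop; on serverless the worker scales to zero when idle
runpod.serverless.start({"handler": handler})
```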
running a bash script with a couple of executables, very boring (sox, lame and piper, voice processing stuff)
a typical job takes 2 sec of exec time on either GPU or CPU cloud (I'm not using the GPU at all as far as I can tell, no CUDA in environment)
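in sketch form, the handler just shells out to the script; a hypothetical version (script name, input field, and output path are made up, not my actual pipeline), which plugs into the same `runpod.serverless.start` loop as above:
```python
import subprocess

def handler(job):
    text = job["input"]["text"]
    # process.sh is a placeholder: assume it chains piper -> sox -> lame
    # and writes /tmp/out.mp3
    result = subprocess.run(
        ["bash", "process.sh", text],
        capture_output=True, text=True, timeout=120,
    )
    if result.returncode != 0:
        return {"error": result.stderr}
    return {"output_file": "/tmp/out.mp3"}
```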
any ideas?
So you're getting different performance on serverless with your CPU pod?
Depending on the package you use, it can be cached to RAM, but FlashBoot for CPU serverless shouldn't be available yet, I guess
Also, what may cause the difference is FlashBoot itself: the model can be loaded into VRAM, which is most likely faster than RAM on CPU instances
Or limited disk speeds on RunPod's machines (might be)