RunPod · #⚡|serverless · Created by Stone Johnson on 6/23/2024 · 23 replies
Is there an equivalent of flash boot for CPU-only serverless?
any ideas?
a typical job takes 2 sec of exec time on either the GPU or CPU cloud (as far as I can tell I'm not using the GPU at all; there's no CUDA in the environment)
running a bash script with a couple of executables, very boring (sox, lame and piper, voice processing stuff)
the container is 8 GB; the py handler is very close to https://blog.runpod.io/serverless-create-a-basic-api/
way simple
OK, same container as for GPU; it uses Jason's simple py handler for API calls, running on the CPU cloud
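For context, a handler in the spirit of the linked blog post that shells out to a bash script might look roughly like this. This is only a sketch: the script name process.sh, the input field text, and the output shape are placeholders of mine, not the actual ones from this deployment.

```python
import subprocess

import runpod


def handler(job):
    # Hypothetical input field; the real handler's schema may differ.
    text = job["input"].get("text", "")

    # The real work happens in external executables (e.g. piper -> sox -> lame),
    # driven by a bash script; Python just orchestrates the call.
    result = subprocess.run(
        ["bash", "process.sh", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return {"output": result.stdout.strip()}


# Standard RunPod serverless entry point.
runpod.serverless.start({"handler": handler})
```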
warm start on the CPU cloud is inexplicably 10x slower
flash boot on the GPU cloud is truly incredible
for my app, turnaround time is key
Low so far, but how come a huge LLM package can flash boot to a GPU in 0.5 sec, while a dinky 8 GB container takes a 6 sec delay on a CPU?
OK, it's working fine. My question is: with GPU + FlashBoot the initial delay is less than one second, but on CPU it is more than 6 seconds! Is there a way to reduce the CPU initial wait time? (Same container, same request; the container has no CUDA so it runs fine on CPU, it just has that long initial delay.)
yeah, I thought I tried it and it ran continuously, but let me look into it
Oh, apologies, I did not see the reply for some reason

RunPod · #⚡|serverless · Created by Stone Johnson on 12/21/2023 · 6 replies
Best Mixtral/LLaMA2 LLM for code-writing, inference, 24 to 48 GB?
Gonna try it! Do I understand correctly that it is 7B in size, so it probably runs in 16 GB?
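As a rough sanity check on the 16 GB question (my own arithmetic, not from the thread): a 7B model in fp16 needs about 2 bytes per parameter for the weights alone.

```python
# Back-of-the-envelope VRAM estimate for a 7B model, fp16 weights only
# (KV cache and activations add more on top of this).
params = 7e9
bytes_per_param = 2  # fp16/bf16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~14 GB, so 16 GB is tight
```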
Will try it out
Thanks a ton!