RunPod · #⚡|serverless · Created by Stone Johnson on 6/23/2024 · 23 replies
Is there an equivalent of flash boot for CPU-only serverless?
any ideas?
a typical job takes 2 sec of exec time on either the GPU or CPU cloud (as far as I can tell I'm not using the GPU at all; there's no CUDA in the environment)
running a bash script with a couple of executables, very boring (sox, lame and piper, voice processing stuff)
the container is 8 GB; the py handler is very close to https://blog.runpod.io/serverless-create-a-basic-api/
way simple
OK, same container as for GPU; it uses Jason's simple py handler for API calls, running on the CPU cloud
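For context, a handler in the spirit of the linked blog post that shells out to a bash script might look roughly like this. This is only a sketch: the script name process.sh, the input field text, and the output shape are placeholders of mine, not the actual ones from this deployment.

```python
import subprocess

import runpod


def handler(job):
    # Hypothetical input field; the real handler's schema may differ.
    text = job["input"].get("text", "")

    # The real work happens in external executables (e.g. piper -> sox -> lame),
    # driven by a bash script; Python just orchestrates the call.
    result = subprocess.run(
        ["bash", "process.sh", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return {"output": result.stdout.strip()}


# Standard RunPod serverless entry point.
runpod.serverless.start({"handler": handler})
```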
warm start on the CPU cloud is inexplicably 10x slower
flash boot on the GPU cloud is truly incredible
for my app, turnaround time is key
Low so far, but how come a huge LLM package can flash boot to a GPU in 0.5 sec, while a dinky 8 GB container takes a 6 sec delay on a CPU?
OK, it's working fine. My question is: with GPU + FlashBoot the initial delay is less than one second, but on CPU it is more than 6 seconds! Is there a way to reduce the CPU initial wait time? (Same container, same request; the container has no CUDA so it runs fine on CPU, it just has that long initial delay.)
yeah, I thought I tried it and it ran continuously, but let me look into it
Oh, apologies, I did not see the reply for some reason

RunPod · #⚡|serverless · Created by Stone Johnson on 12/21/2023 · 6 replies
Best Mixtral/LLaMA2 LLM for code-writing, inference, 24 to 48 GB?
Gonna try it! Do I understand correctly that it is 7B in size, so it probably runs in 16 GB?
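As a rough sanity check on the 16 GB question (my own arithmetic, not from the thread): a 7B model in fp16 needs about 2 bytes per parameter for the weights alone.

```python
# Back-of-the-envelope VRAM estimate for a 7B model, fp16 weights only
# (KV cache and activations add more on top of this).
params = 7e9
bytes_per_param = 2  # fp16/bf16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~14 GB, so 16 GB is tight
```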
Will try it out
Thanks a ton!