Stone Johnson Posts - Answer Overflow

Stone Johnson

•Created by Stone Johnson on 6/23/2024 in #⚡｜serverless

Is there an equivalent of flash boot for CPU-only serverless?

I was trying to figure out whether there is a way to have a CPU job fire up only when it is needed, so it does not accrue charges while idle (like flash boot for GPU serverless). Thanks!

23 replies

RunPod

•Created by Stone Johnson on 12/21/2023 in #⚡｜serverless

Best Mixtral/LLaMA2 LLM for code-writing, inference, 24 to 48 GB?

Good evening, all you experts! I'm past the pain and suffering stage and into the finesse and finishing stage. What is the best class of models for doing basic inference, and in particular for formulating simple commands based on a set of simple rules, that will fit into a 24 GB (or 48 GB, if much better) RunPod?

6 replies

RunPod

•Created by Stone Johnson on 12/17/2023 in #⚡｜serverless

Can worker-vllm work with https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ

I have customers who want to use this, and I think worker-vllm is the way to go with it. Hope it can work with worker-vllm! The link is https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ and I guess it works with text-generation-webui, so why not worker-vllm?

1 reply
