Stone Johnson
RunPod
Created by Stone Johnson on 6/23/2024 in #⚡|serverless
Is there an equivalent of flash boot for CPU-only serverless?
I'm trying to figure out whether there is a way to have a CPU job start only when it's needed, so it doesn't accrue charges while idle (like flash boot for GPU serverless). Thanks!
23 replies
RunPod
Created by Stone Johnson on 12/21/2023 in #⚡|serverless
Best Mixtral/LLaMA2 LLM for code-writing, inference, 24 to 48 GB?
Good evening, all you experts! I'm past the pain-and-suffering stage and into the finesse-and-finishing stage. What is the best class of models for basic inference, and in particular for formulating simple commands based on a set of simple rules, that will fit into a 24 GB (or 48 GB, if much better) RunPod instance?
6 replies
RunPod
Created by Stone Johnson on 12/17/2023 in #⚡|serverless
Can worker-vllm work with https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
I have customers who want to use this model, and I think worker-vllm is the way to go with it. I hope it can work with worker-vllm! The link is https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ and I gather it works with text-generation-webui, so why not worker-vllm?
1 reply