Sal ✨
RunPod
Created by Sal ✨ on 10/7/2024 in #⛅|pods
Runpod VLLM - How to use GGUF with VLLM
I have this repo, mradermacher/Llama-3.1-8B-Stheno-v3.4-i1-GGUF, and I launch it with "--host 0.0.0.0 --port 8000 --max-model-len 37472 --model mradermacher/Llama-3.1-8B-Stheno-v3.4-i1-GGUF --dtype bfloat16 --gpu-memory-utilization 0.95 --quantization gguf", but it doesn't work. It fails with:

2024-10-07T20:39:24.964316283Z ValueError: No supported config format found in mradermacher/Llama-3.1-8B-Stheno-v3.4-i1-GGUF

I don't have this problem with normal models, only with quantized ones.
11 replies
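For reference: at the time of writing, vLLM's GGUF support generally expects --model to point at a single local .gguf file rather than a Hugging Face repo full of quants (which is why it reports no supported config format in the repo), and it usually needs --tokenizer pointing at the original, unquantized model so it can load a tokenizer/config. A minimal sketch of that workflow is below; the exact .gguf filename and the base-model repo (Sao10K/Llama-3.1-8B-Stheno-v3.4) are assumptions, so check the repo's file list before using it.

```python
# Minimal sketch: download one GGUF file, then point vLLM at the local path.
# The filename and the --tokenizer repo below are assumptions, not confirmed
# from this thread -- pick the quant you actually want from the repo's files.
import subprocess

from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="mradermacher/Llama-3.1-8B-Stheno-v3.4-i1-GGUF",
    filename="Llama-3.1-8B-Stheno-v3.4.i1-Q4_K_M.gguf",  # hypothetical filename
)

# Launch the OpenAI-compatible server against the local GGUF file, using the
# original (unquantized) repo for the tokenizer/config.
subprocess.run(
    [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--host", "0.0.0.0",
        "--port", "8000",
        "--model", gguf_path,
        "--tokenizer", "Sao10K/Llama-3.1-8B-Stheno-v3.4",  # assumed base repo
        "--max-model-len", "37472",
        "--quantization", "gguf",
        "--gpu-memory-utilization", "0.95",
    ],
    check=True,
)
```

On a RunPod vLLM template the same idea applies: the container arguments would need --model to be a path to the downloaded .gguf file (e.g. on a network volume) instead of the quant repo name.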