artbred
RunPod
•Created by artbred on 9/15/2024 in #⚡|serverless
GGUF vllm
It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work in RunPod serverless? It seems like some custom ENV vars need to be set. Or does anyone know a way to convert GGUF back to safetensors?
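For context, this is roughly what loading a GGUF file looks like in plain vLLM, outside RunPod: the `.gguf` file path goes in as the model, and the tokenizer of the original base model is passed separately with `--tokenizer`. Below is a minimal sketch that only assembles that command. The helper name `build_vllm_serve_command`, the file name, and the `org/base-model` tokenizer repo are placeholders for illustration, and how this maps onto RunPod serverless ENV vars is still the open question.

```python
import shlex


def build_vllm_serve_command(gguf_path: str, tokenizer: str) -> list[str]:
    """Assemble a `vllm serve` command for a single-file GGUF model.

    Hypothetical helper, not part of vLLM or RunPod. It only builds the
    argument list; it does not start a server.
    """
    if not gguf_path.endswith(".gguf"):
        raise ValueError(f"expected a .gguf file, got: {gguf_path}")
    # The GGUF path is the positional model argument; the tokenizer comes
    # from the original (unquantized) base model.
    return ["vllm", "serve", gguf_path, "--tokenizer", tokenizer]


# Placeholder path and tokenizer repo, for illustration only.
command = build_vllm_serve_command("./model.Q4_K_M.gguf", "org/base-model")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` on a machine with vLLM installed, or used as the start command of a custom container.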
3 replies