•Created by Bj9000 on 1/27/2025 in #⚡|serverless
Serverless quants
I have the same question. Now that vLLM supports quantized models, I'm wondering whether there's a way to specify the quantization method through an environment variable. Also, I'm not sure what format to use for the tokenizer path: is it a full filesystem path, or just the top-level HF repo id of the original model?
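For reference, a minimal sketch of how a serverless worker might map such environment variables into vLLM engine arguments. The variable names (`MODEL_NAME`, `QUANTIZATION`, `TOKENIZER`) are assumptions, not confirmed names; check the README of the worker image you deploy for the exact ones it supports. vLLM itself accepts a `quantization` argument (e.g. `"awq"`, `"gptq"`) and a `tokenizer` argument that takes either an HF repo id or a local path:

```python
import os

# Hedged sketch: mapping env vars into vLLM engine args.
# The env var names below are assumptions for illustration --
# verify them against your worker image's documentation.
engine_args = {
    # HF repo id of the (possibly quantized) model to serve
    "model": os.environ.get("MODEL_NAME", "facebook/opt-125m"),
    # vLLM's `quantization` arg selects the method, e.g. "awq" or "gptq";
    # None lets vLLM auto-detect or run unquantized
    "quantization": os.environ.get("QUANTIZATION"),
    # `tokenizer` accepts an HF repo id (e.g. the original model's repo)
    # or a local path; when unset, vLLM defaults to the model path
    "tokenizer": os.environ.get("TOKENIZER"),
}
```

These keys would then be passed to vLLM's engine (e.g. `AsyncEngineArgs(**engine_args)`), so the tokenizer value can be just the top-level HF repo, the same way you'd pass it to vLLM directly.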