RunPod
Created by codyman4488 on 3/4/2025 in #⚡|serverless
how to run a quantized model on serverless? I'd like to run the 4/8-bit version of this model:
or do we need to set env vars to use a quantized model like this ?
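For reference, a minimal sketch of what loading this checkpoint with vLLM directly looks like, assuming the endpoint image ships vLLM. Only the model ID comes from this thread; the explicit quantization argument is illustrative, since vLLM can usually infer the scheme from the checkpoint's config:

```python
# Sketch: loading the FP8-dynamic checkpoint with vLLM.
# Only the model ID comes from this thread; the rest is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic",
    quantization="fp8",  # often auto-detected from the checkpoint config
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```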
understood, so this should run out of the box?
https://huggingface.co/neuralmagic/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic
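And a hedged sketch of querying the endpoint once it is deployed, assuming the standard RunPod /runsync route and a vLLM-worker-style `{"input": {"prompt": ...}}` schema; the endpoint ID and sampling parameters are placeholders:

```python
# Sketch: calling a RunPod serverless endpoint running the quantized model.
# ENDPOINT_ID is a placeholder; the input schema assumes a vLLM-based worker.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "Hello", "sampling_params": {"max_tokens": 32}}},
    timeout=120,
)
print(resp.json())
```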