RunPod • 4w ago
Arad

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

Hey everyone 👋 Looking for tips from anyone who's worked with bitsandbytes-quantized models on RunPod's serverless setup. bitsandbytes quantization isn't available out of the box with the vLLM worker, and I was wondering if anyone's got it working? Saw a post in the serverless forum suggesting a custom Docker image for this. For context: I've fine-tuned LLaMA-3.1 70B-Instruct with the unsloth library (which uses bitsandbytes for quantization) and am looking to deploy it. Any insights would be greatly appreciated! 🙏
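
For the custom-image route, here's a minimal sketch of what the worker script could look like. This is an assumption, not a known-working RunPod recipe: the model path `/models/llama-3.1-70b-instruct-ft` is a placeholder for wherever you bake the weights into the image (or mount a network volume), and the `BitsAndBytesConfig` values should mirror how unsloth quantized your model. It sidesteps vLLM entirely and uses the `runpod` Python SDK's serverless handler with a plain `transformers` + `bitsandbytes` load:

```python
import torch
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder: path where the fine-tuned weights are baked into the image.
MODEL_DIR = "/models/llama-3.1-70b-instruct-ft"

# 4-bit load config; these values should match how unsloth quantized the model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load once at container start so warm requests skip the (slow) model load.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    quantization_config=bnb_config,
    device_map="auto",  # shard across whatever GPUs the worker has
)

def handler(job):
    """RunPod serverless handler: expects {"input": {"prompt": ...}}."""
    prompt = job["input"]["prompt"]
    max_new_tokens = job["input"].get("max_new_tokens", 256)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

runpod.serverless.start({"handler": handler})
```

The Dockerfile would then just install `runpod`, `transformers`, `accelerate`, and `bitsandbytes`, copy in the weights, and run this script as the container's CMD. Note that even in 4-bit, a 70B model needs on the order of 40 GB of VRAM, so pick the endpoint's GPU type accordingly.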
1 Reply
nerdylive • 4w ago
I'm not sure if there's a built-in way, but maybe you can dequantize it somehow? Or convert it to another format that vLLM supports.
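
For the conversion route, a sketch of what that could look like, assuming the unsloth fine-tune produced LoRA adapters on top of the 4-bit base (the usual unsloth workflow). It uses peft's merge-and-unload to write plain 16-bit weights that a stock vLLM worker can serve; the directory names are placeholders:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

ADAPTER_DIR = "my-llama3.1-70b-adapters"  # placeholder: your LoRA checkpoint
OUTPUT_DIR = "llama3.1-70b-merged-bf16"   # placeholder: merged output

# Load the base model referenced by the adapter config in bf16 (not 4-bit),
# fold the LoRA deltas into the base weights, and save a plain checkpoint.
model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER_DIR, torch_dtype=torch.bfloat16
)
model = model.merge_and_unload()
model.save_pretrained(OUTPUT_DIR, safe_serialization=True)

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```

Merging a 70B model this way needs enough memory to hold the full bf16 weights (roughly 140 GB), so it's usually done once offline on a big CPU-RAM machine, not inside the serverless worker; the merged checkpoint can then be served by the standard vLLM endpoint.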