Mohamed Nagy
RunPod
Created by Arad on 10/29/2024 in #⚡|serverless
Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image
Yeah, [https://github.com/mohamednaji7/worker-vllm/tree/main]. I added a few lines to complete the option of using bitsandbytes, and this is my merge request: [https://github.com/runpod-workers/worker-vllm/pull/146]
The param LOAD_FORMAT accepts "bitsandbytes", and if it is set to "bitsandbytes" then QUANTIZATION must also be "bitsandbytes" ("None" will not work). The QUANTIZATION options are "None", "AWQ", "SqueezeLLM", "GPTQ". Here is the error:
engine.py:115 2025-01-21 11:18:49,916 Error initializing vLLM engine: BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None
https://github.com/runpod-workers/worker-vllm/issues/99
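For context, a paraphrased sketch of the consistency check vLLM runs at engine init (assumed shape, not the exact source), which produces the error above:

# Paraphrased sketch, not vLLM's actual source: the engine-init consistency check.
def verify_bnb_config(load_format: str, quantization: str | None) -> None:
    # The bitsandbytes load format only works with the bitsandbytes quantizer,
    # so QUANTIZATION="None" fails even when LOAD_FORMAT is set correctly.
    if load_format == "bitsandbytes" and quantization != "bitsandbytes":
        raise ValueError(
            "BitsAndBytes load format and QLoRA adapter only support "
            f"'bitsandbytes' quantization, but got {quantization}")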
I got the expected error
I will inform you
yes
This may work. I am going to test runpod-vllm-worker with LOAD_FORMAT; it supports bitsandbytes. I hope src/engine will load it, but I think it will not, because in the GitHub repo they do not handle it fully the way it is done in https://docs.vllm.ai/en/stable/quantization/bnb.html
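For reference, the linked vLLM docs load a bitsandbytes checkpoint roughly like this (the model name here is just an illustrative bnb checkpoint, not one from this thread):

from vllm import LLM

# Per the vLLM bitsandbytes docs: set both quantization and load_format
# to "bitsandbytes" so the weights are loaded through the bnb path.
llm = LLM(
    model="unsloth/tinyllama-bnb-4bit",  # illustrative pre-quantized model
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)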
In this worker-config.json:
"QUANTIZATION": {
    "env_var_name": "QUANTIZATION",
    "value": "",
    "title": "Quantization",
    "description": "Method used to quantize the weights.",
    "required": false,
    "type": "select",
    "options": [
        { "value": "None", "label": "None" },
        { "value": "awq", "label": "AWQ" },
        { "value": "squeezellm", "label": "SqueezeLLM" },
        { "value": "gptq", "label": "GPTQ" }
    ]
},
"QUANTIZATION": {
"env_var_name": "QUANTIZATION",
"value": "",
"title": "Quantization",
"description": "Method used to quantize the weights.",
"required": false,
"type": "select",
"options": [
{ "value": "None", "label": "None" },
{ "value": "awq", "label": "AWQ" },
{ "value": "squeezellm", "label": "SqueezeLLM" },
{ "value": "gptq", "label": "GPTQ" }
]
},
it does not have bitsandbytes
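Presumably the fork just extends the options list with an entry like this (a sketch; the exact value and label strings are assumptions), plus matching handling in the engine code:

{ "value": "bitsandbytes", "label": "bitsandbytes" }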
yes
I forked the vllm-worker and changed it to accept bitsandbytes.
I tried this, but the vllm-worker rejects the variable if it is not one of the defined choices.
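A paraphrased sketch of the kind of choice check the worker applies (assumed shape, not the actual source):

# Assumed shape of worker-vllm's env-var validation, paraphrased for illustration.
ALLOWED_QUANTIZATION = {"None", "awq", "squeezellm", "gptq"}

def validate_quantization(value: str) -> str:
    # Anything outside the defined choices is rejected, which is why
    # QUANTIZATION=bitsandbytes fails on the unmodified worker.
    if value not in ALLOWED_QUANTIZATION:
        raise ValueError(f"Unsupported quantization: {value}")
    return value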
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
How do we deploy this on a serverless endpoint?
RunPod
Created by Arad on 10/29/2024 in #⚡|serverless
Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image
Any updates? I want to do the same thing with the 3.3 version.