Deploying a model which is quantised with bitsandbytes (model config).
I have finetuned a 7B model on my custom dataset by quantising it with bitsandbytes on my local machine with 12 GB of VRAM. When I went to deploy the model on RunPod with vLLM for faster inference, I found that only three types of quantised models can be deployed there, namely GPTQ, AWQ and SqueezeLLM. Am I interpreting something wrong, or does RunPod not have the feature to deploy a model quantised this way? Is there any workaround I can use to deploy my model for now?
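For context, "quantised with bitsandbytes (model config)" presumably refers to a QLoRA-style load where the quantization is passed through the model's loading config. A minimal sketch of that setup, assuming the Hugging Face transformers API and a hypothetical base model name:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical base model; substitute the one actually finetuned.
base_model = "meta-llama/Llama-2-7b-hf"

# 4-bit bitsandbytes config, typical for finetuning a 7B model on ~12 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
```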
2 Replies
Here is the log file for a POST request.
RunPod can deploy them, you just have to use the right tools. You'll have to create a custom worker with your own code and a new Docker image, but not vLLM, since I think vLLM doesn't support some of the quantization methods you mentioned above.
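A custom worker is essentially a handler script baked into your own Docker image. Here is a minimal sketch using the runpod Python SDK together with transformers and bitsandbytes; the model path and generation settings are hypothetical, not RunPod defaults:

```python
# handler.py inside a custom Docker image that has torch, transformers,
# bitsandbytes and runpod installed.
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "/models/my-finetuned-7b"  # hypothetical path baked into the image

# Load the bitsandbytes-quantized model once at container start.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

def handler(job):
    """Generate a completion for the prompt sent in the job input."""
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Start the RunPod serverless worker loop.
runpod.serverless.start({"handler": handler})
```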
If vLLM does, use the right argument to specify the quantization.
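For example, if the weights are converted to a format vLLM supports (such as AWQ), the quantization can be specified explicitly. A minimal sketch with the vLLM Python API, using a hypothetical AWQ checkpoint; the equivalent flag on the OpenAI-compatible server is `--quantization awq`:

```python
from vllm import LLM, SamplingParams

# Hypothetical AWQ-quantized checkpoint: the weights must already be in AWQ
# (or GPTQ/SqueezeLLM) format; a bitsandbytes checkpoint won't load here.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```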