Mohamed Nagy Comments - Answer Overflow

Mohamed Nagy

Posts Comments

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

nice, I will test my instance with openai client requests

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

nice, testing is very diffcult using runpod, for you furtur tests you could doas I do, I run the worker from a remote repo and change the remote repo code

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

Nice, I will test it in the next version

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

a CPU instance is it on RunPod cloud

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

is it works wit you?

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

I think its needs to route the /v1/chat/completion , waht do you think? I made a bunch of failed trials

18 replies

RRunPod

•Created by Mohamed Nagy on 1/26/2025 in #⚡｜serverless

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

I am doing a dummy test because I am building my worker and want to use unsloth instead of vllm and transformers

18 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

yeah, [https://github.com/mohamednaji7/worker-vllm/tree/main] I add few line to complete the option of using bitsandbytes and this is a my merge request [https://github.com/runpod-workers/worker-vllm/pull/146]

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

https://github.com/runpod-workers/worker-vllm/issues/145

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

the param Load_Fromat support accept "BitsAndBytes" and if it set to "BitsAndBytes" then QUANTIZATION must be "bitsandbytes" ("None" will not work) the QUANTIZATION options are "None", "AWQ", "SqueezeLLM", "GPTQ" the error Here is the error: itsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization

engine.py           :115  2025-01-21 11:18:49,916 Error initializing vLLM engine: BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None

https://github.com/runpod-workers/worker-vllm/issues/99

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

I got the expected error

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

I will inform you

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

yes

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

this may work, I am going to test runpod-vllm-worker with LOAD_FORMAT it supports bitsandbytes hope the src/engine will load it, I think it will not because in the github repo they does not handle it fully like in this https://docs.vllm.ai/en/stable/quantization/bnb.html

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

in this worker-config.json

  "QUANTIZATION": {
    "env_var_name": "QUANTIZATION",
    "value": "",
    "title": "Quantization",
    "description": "Method used to quantize the weights.",
    "required": false,
    "type": "select",
    "options": [
      { "value": "None", "label": "None" },
      { "value": "awq", "label": "AWQ" },
      { "value": "squeezellm", "label": "SqueezeLLM" },
      { "value": "gptq", "label": "GPTQ" }
    ]
  },

  "QUANTIZATION": {
    "env_var_name": "QUANTIZATION",
    "value": "",
    "title": "Quantization",
    "description": "Method used to quantize the weights.",
    "required": false,
    "type": "select",
    "options": [
      { "value": "None", "label": "None" },
      { "value": "awq", "label": "AWQ" },
      { "value": "squeezellm", "label": "SqueezeLLM" },
      { "value": "gptq", "label": "GPTQ" }
    ]
  },

does not has bitsandbytes

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

yes

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

I fork the vllm-wrorker and change it to accept bitsandbytes

26 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

I tried this but, the vllm-worker checks the variable if its not one of the defined chocies

26 replies

RRunPod

•Created by annasuhstuff on 6/26/2024 in #⚡｜serverless

LoRA adapter on Runpod.io (using vLLM Worker)

How do we deploy this on a serverless endpoint?

22 replies

RRunPod

•Created by Arad on 10/29/2024 in #⚡｜serverless

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

Any updates!, I want to do the same thing with 3.3 version.

26 replies

Gaming

Programming