LoRA adapter on Runpod.io (using vLLM Worker)
Hi, I hope everyone is doing well. I'm reaching out to seek some insights or advice regarding an issue I'm encountering while attempting to deploy a serverless API endpoint on RunPod.io. The model in question has been adapted using a LoRA adapter, and it seems I am stuck because of a missing configuration file.
However, because the model was adapted with a LoRA adapter, I don't have a traditional configuration file available (see screenshot, please).
Given the technical nature of this issue, I was hoping someone here might have encountered a similar situation or could offer guidance on how to proceed. Specifically, I'm looking for advice on how to bypass the requirement for a config file in this context, or on an alternative way to supply the necessary configuration information so the deployment can proceed.
13 Replies
Hey how do you run the model?
using this:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the adapter config, then the 4-bit base model, then apply the adapter on top
config = PeftConfig.from_pretrained("alsokit/eLM-mini-4B-4K-4bit-v01")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/phi-3-mini-4k-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "alsokit/eLM-mini-4B-4K-4bit-v01")
Do you get any warnings for using that?
or maybe errors?
well, I also have an error
ValueError: Can't find 'adapter_config.json' at 'alsokit/eLM-mini-4B-4K-4bit-v01'
try this
from peft import LoraConfig

# Example: Initialize LoRA config
lora_config = LoraConfig(init_lora_weights="gaussian", target_modules=["to_k", "to_q", "to_v", "to_out.0"])
I'm not sure if that works, or if those are the right configs
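If the repo really is missing adapter_config.json, another option (just a sketch; the r / alpha / target_modules below are placeholders for whatever you actually trained with) might be to recreate the config yourself and upload it next to the adapter weights, so PeftModel.from_pretrained can find it:

from peft import LoraConfig

# Placeholder hyperparameters: replace with the values actually used during training
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    base_model_name_or_path="unsloth/phi-3-mini-4k-instruct-bnb-4bit",
)

# Writes adapter_config.json locally; it would then need to be uploaded to the adapter repo
lora_config.save_pretrained("eLM-mini-4B-4K-4bit-v01")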
Oh, it seems like it works after doing these steps:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit", # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit", # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct", # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name = "unsloth/mistral-7b-v0.3", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = "unsloth/Phi-3-mini-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
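and as a quick sanity check (just a sketch, the prompt is arbitrary) I generate with it like this:

# Switch unsloth to its inference mode, then generate once
FastLanguageModel.for_inference(model)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))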
Does RunPod serverless support LoRA adapters?
the problem is, I only have the adapter_config
2024-06-26T12:49:19.602715611Z OSError: alsokit/eLM-mini-4B-4K-4bit-v01 does not appear to have a file named config.json. Checkout 'https://huggingface.co/alsokit/eLM-mini-4B-4K-4bit-v01/tree/main' for available files.
Is there any way to set up an endpoint using my model without a config (since I only have the adapter_config), or do I have to somehow change the model?
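(From what I can tell, vLLM itself can serve a base model plus a LoRA adapter without merging, roughly like the sketch below, though I don't know whether the RunPod vLLM worker exposes this; the local adapter path here is hypothetical.)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model is loaded as usual; the LoRA adapter is applied per request
llm = LLM(model="unsloth/Phi-3-mini-4k-instruct", enable_lora=True)

sampling = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(
    ["Hello, how are you?"],
    sampling,
    # Hypothetical local path to the downloaded adapter files
    lora_request=LoRARequest("elm_adapter", 1, "/path/to/eLM-mini-4B-4K-4bit-v01"),
)
print(outputs[0].outputs[0].text)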
They sure do, I guess, the same way other GPU providers support it
With peft I think you need the config
Are there any ways to avoid this error? (no config found)
Yes, with peft I need it, but with the other method I don't
so, to enable an API endpoint, as I see it, a config is also a must-have?
I guess so, what's the other method?
Why not use that
above I have written the code which does not use peft
after merging the adapter and the base model, I have a config, but the model quality gets a lot worse
so I thought I could set up an API endpoint without a config
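The merge I did was roughly this (a sketch; it assumes merging into the full-precision base, and merging into an already-quantized base might be one reason for the quality drop, but that's just a guess):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, apply the adapter, then fold the LoRA weights into the base
base = AutoModelForCausalLM.from_pretrained("unsloth/Phi-3-mini-4k-instruct", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "alsokit/eLM-mini-4B-4K-4bit-v01").merge_and_unload()

# Saving the merged model produces a normal config.json next to the weights
merged.save_pretrained("eLM-mini-merged")
AutoTokenizer.from_pretrained("unsloth/Phi-3-mini-4k-instruct").save_pretrained("eLM-mini-merged")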
Ahh yeah, that's why maybe you should have the LoRA config hahah
Is the LoRA trained on the same base model?
yes