LoRA adapter on Runpod.io (using vLLM Worker)
Hi, I hope everyone is doing well. I'm reaching out to seek insights or advice on an issue I'm encountering while attempting to deploy a serverless API endpoint on RunPod.io. The model in question has been adapted with a LoRA adapter, and it seems like I'm stuck because of a missing configuration file.
Because the model was adapted with a LoRA adapter, I don't have a traditional configuration file available (see screenshot, please).
Given the technical nature of this issue, I was hoping someone here might have encountered a similar situation or could offer guidance on how to proceed. Specifically, I'm looking for advice on how to bypass the requirement for a config file in this context, or an alternative way of supplying the necessary configuration information so the deployment can proceed.
![No description](https://answer-overflow-discord-attachments.s3.us-east-1.amazonaws.com/1255494613821558814/Screenshot_from_2024-06-26_14-15-12.png)
Hey, how do you run the model?
Using this:
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the adapter config, the 4-bit base model, then apply the adapter on top
config = PeftConfig.from_pretrained("alsokit/eLM-mini-4B-4K-4bit-v01")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/phi-3-mini-4k-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "alsokit/eLM-mini-4B-4K-4bit-v01")
```
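(In case it helps anyone following along: once that loads, a quick generation smoke test confirms the adapter model actually works. The tokenizer repo below is an assumption; it's the base model's tokenizer, since the adapter repo itself may not ship one.)

```python
from transformers import AutoTokenizer

# Assumed tokenizer source: the base model repo
tokenizer = AutoTokenizer.from_pretrained("unsloth/phi-3-mini-4k-instruct-bnb-4bit")
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```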
Do you get any warnings for using that?
or maybe errors?
Well, I also have an error:
```
ValueError: Can't find 'adapter_config.json' at 'alsokit/eLM-mini-4B-4K-4bit-v01'
```
Try this:
```python
from peft import LoraConfig

# Example: initialize a LoRA config
lora_config = LoraConfig(
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
```
I'm not sure if that works, though, or what the right config values are.
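For what it's worth, `adapter_config.json` is exactly the file that PEFT's `save_pretrained` writes out, so if you can reconstruct the settings used at training time, you can regenerate it and upload it to the adapter repo. A minimal sketch; the rank, alpha, target modules, and base model below are assumptions and must match how the adapter was actually trained:

```python
from peft import LoraConfig

# All hyperparameters here are assumed; they must match the original training run
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    base_model_name_or_path="unsloth/phi-3-mini-4k-instruct-bnb-4bit",
)
config.save_pretrained("adapter_dir")  # writes adapter_dir/adapter_config.json
```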
Oh, it seems like it works after doing these steps:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/mistral-7b-v0.3-bnb-4bit", # New Mistral v3 2x faster!
"unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"unsloth/llama-3-8b-bnb-4bit", # Llama-3 15 trillion tokens model 2x faster!
"unsloth/llama-3-8b-Instruct-bnb-4bit",
"unsloth/llama-3-70b-bnb-4bit",
"unsloth/Phi-3-mini-4k-instruct", # Phi-3 2x faster!
"unsloth/Phi-3-medium-4k-instruct",
"unsloth/mistral-7b-bnb-4bit",
"unsloth/gemma-7b-bnb-4bit", # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
# model_name = "unsloth/mistral-7b-v0.3", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
model_name = "unsloth/Phi-3-mini-4k-instruct",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in4bit,
# token = "hf...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
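Note that this snippet only loads the base model, not the fine-tuned adapter. If the goal is a checkpoint the vLLM worker can load, Unsloth can also export a merged model that ships a regular `config.json`. A sketch, assuming the LoRA adapter is already attached to `model` (the output directory name is made up); `save_method="merged_16bit"` merges into a 16-bit copy of the base, which tends to hurt quality less than merging into the 4-bit weights:

```python
# Assumes `model` already has the LoRA adapter applied
model.save_pretrained_merged(
    "merged-model",              # output directory (assumed name)
    tokenizer,
    save_method="merged_16bit",  # dequantize, merge, and write config.json + weights
)
```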
Does RunPod serverless support LoRA adapters?
The problem is, I only have an adapter_config.
```
2024-06-26T12:49:19.602715611Z OSError: alsokit/eLM-mini-4B-4K-4bit-v01 does not appear to have a file named config.json. Checkout 'https://huggingface.co/alsokit/eLM-mini-4B-4K-4bit-v01/tree/main' for available files.
```
Is there any way to set up an endpoint using my model without a config (since I only have adapter_config), or do I have to somehow change the model?
They sure do, like the other GPUs support it, I guess.
With PEFT, I think you need the config.
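For context: vLLM itself can serve a LoRA adapter on top of a base model without merging, though even that path reads `adapter_config.json` from the adapter directory, so the missing file still has to be restored first. A rough sketch of the offline API (the base model and adapter path are assumptions):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model is assumed; the adapter directory must contain adapter_config.json
llm = LLM(model="microsoft/Phi-3-mini-4k-instruct", enable_lora=True)
outputs = llm.generate(
    "Hello, how are you?",
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("elm_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```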
Are there any ways to avoid this error? (no config found)
Yes, with PEFT I need it, but with the other method I don't.
So, to enable an API endpoint, as I see it, a config is also a must-have?
I guess so, what's the other method?
Why not use that?
Above I have written the code which does not use PEFT.
After merging the adapter and the base model I do have a config, but the model quality gets a lot worse,
so I thought I could set up an API endpoint without a config.
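The quality drop may come from merging the adapter into the 4-bit quantized base. One alternative to try (a sketch, not verified for this model, and it still assumes `adapter_config.json` has been restored as above): merge into the full-precision base instead, which also produces the `config.json` the endpoint wants. The base repo below is an assumption:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full-precision base (assumed repo), apply the adapter, then merge it in
base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "alsokit/eLM-mini-4B-4K-4bit-v01").merge_and_unload()

merged.save_pretrained("merged-model")  # writes config.json + full weights
AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct").save_pretrained("merged-model")
```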
Ahh yeah, that's why, maybe you should have the LoRA config hahah
Is the LoRA trained on the same base model?
yes