LoRA adapter on RunPod.io (using vLLM Worker)

Hi, I hope everyone is doing well. I'm reaching out to seek some insights or advice regarding an issue I'm encountering while attempting to deploy a serverless API endpoint on RunPod.io. The model in question has been adapted using a LoRA adapter, and it seems I am stuck because of a missing configuration file. However, the nature of the model's adaptation with the LoRA adapter means that I don't have a traditional configuration file available (see screenshot, please). Given the technical nature of this issue, I was hoping someone here might have encountered a similar situation or could offer guidance on how to proceed. Specifically, I'm looking for advice on how to bypass the requirement for a config file in this context, or on whether there's an alternative method of supplying the necessary configuration information to satisfy the deployment process.
13 Replies
nerdylive (7d ago)
Hey how do you run the model?
annasuhstuff (6d ago)
Using this:

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

config = PeftConfig.from_pretrained("alsokit/eLM-mini-4B-4K-4bit-v01")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/phi-3-mini-4k-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "alsokit/eLM-mini-4B-4K-4bit-v01")
nerdylive (6d ago)
Do you get any warnings when using that? Or maybe errors?
annasuhstuff (6d ago)
Well, I also get an error:

ValueError: Can't find 'adapter_config.json' at 'alsokit/eLM-mini-4B-4K-4bit-v01'
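A quick way to double-check which files the repo actually exposes (assuming it is public) is to list them with huggingface_hub; an adapter-only repo normally contains adapter_config.json plus the adapter weights, but no config.json:

```python
from huggingface_hub import list_repo_files

# Show the files that transformers / vLLM would see for this repo.
print(list_repo_files("alsokit/eLM-mini-4B-4K-4bit-v01"))
```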
nerdylive (6d ago)
Try this:

from peft import LoraConfig

# Example: Initialize LoRA config
lora_config = LoraConfig(
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)

I'm not sure if it works, or about the configs.
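Those target_modules names look like the ones from a diffusion attention block, though; for a LLaMA-style causal LM the projection modules are usually named differently, so it's probably worth checking base_model.named_modules() first. A rough sketch (the module names below are a guess, not taken from your checkpoint):

```python
from peft import LoraConfig

# Typical attention projection names for LLaMA-style decoders; verify them against
# [name for name, _ in base_model.named_modules()] before relying on this.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```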
annasuhstuff (6d ago)
Oh, it seems like it works after doing these steps:

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre-quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",           # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",                # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",             # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",                  # Gemma 2.2x faster!
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name = "unsloth/mistral-7b-v0.3",  # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = "unsloth/Phi-3-mini-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf...",  # use one if using gated models like meta-llama/Llama-2-7b-hf
)

Does RunPod serverless support LoRA adapters?
annasuhstuff (6d ago)
The problem is, I only have adapter_config:

2024-06-26T12:49:19.602715611Z OSError: alsokit/eLM-mini-4B-4K-4bit-v01 does not appear to have a file named config.json. Checkout 'https://huggingface.co/alsokit/eLM-mini-4B-4K-4bit-v01/tree/main' for available files.

Is there any way to set up an endpoint using my model without a config (because I only have adapter_config), or do I have to somehow change the model?
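For what it's worth, the adapter repo does record which base checkpoint it was trained against, so the full config.json exists, just in the base model's repo rather than in mine. Assuming the adapter files are actually present, that pointer can be read like this:

```python
from peft import PeftConfig

peft_cfg = PeftConfig.from_pretrained("alsokit/eLM-mini-4B-4K-4bit-v01")
# The base checkpoint (and therefore the config.json) the adapter expects to sit on.
print(peft_cfg.base_model_name_or_path)
```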
nerdylive (6d ago)
They sure do, like other GPUs do support it, I guess. With PEFT I think you need the config.
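For reference, vLLM itself (which the RunPod worker wraps) can attach a LoRA adapter on top of a base model at inference time; whether the serverless worker image you're running exposes that option depends on its version, so treat this as a local sketch with placeholder names rather than the worker's actual config:

```python
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Download the adapter files locally; LoRARequest takes a local path.
adapter_path = snapshot_download("alsokit/eLM-mini-4B-4K-4bit-v01")

# Point vLLM at the full-precision base repo (it has config.json), not at the adapter repo.
llm = LLM(model="unsloth/Phi-3-mini-4k-instruct", enable_lora=True)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("elm_adapter", 1, adapter_path),
)
print(outputs[0].outputs[0].text)
```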
annasuhstuff (6d ago)
Are there any ways to avoid this error (no config found)? Yes, with PEFT I need it, but with the other method I don't. So, to enable an API endpoint, as I see it, a config is also a must-have?
nerdylive (6d ago)
I guess so. What's the other method? Why not use that?
annasuhstuff (6d ago)
Above I have written the code which does not use PEFT. After merging the adapter and the base model, I have a config, but the model quality gets a lot worse, so I thought I could set up an API endpoint without a config.
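One thing I haven't ruled out yet, in case the quality drop comes from merging into the bnb-4bit weights: merge the adapter into the full-precision base instead and save that as a normal checkpoint, which then has its own config.json. Untested sketch, assuming the adapter repo loads with PEFT:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Merge into the fp16 base rather than the bnb-4bit checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Phi-3-mini-4k-instruct", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "alsokit/eLM-mini-4B-4K-4bit-v01").merge_and_unload()

# save_pretrained writes a regular model folder, config.json included.
merged.save_pretrained("elm-mini-merged")
AutoTokenizer.from_pretrained("unsloth/Phi-3-mini-4k-instruct").save_pretrained("elm-mini-merged")
```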
nerdylive (6d ago)
Ahh yeah, that's why. Maybe you should have the LoRA config hahah. Is the LoRA trained on the same base model?
annasuhstuff (6d ago)
yes