annasuhstuff
RunPod
Created by annasuhstuff on 6/27/2024 in #⚡|serverless
Quantization method
now I have this error
9 replies
RunPod
Created by annasuhstuff on 6/27/2024 in #⚡|serverless
Quantization method
2024-06-27T10:50:05.563358317Z ValueError: Quantization method specified in the model config (bitsandbytes) does not match the quantization method specified in the quantization argument (gptq).
9 replies
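The error above means the endpoint's quantization setting (gptq) disagrees with the quantization method baked into the model's config.json (bitsandbytes); the two must match. A minimal sketch of the matching vLLM call, assuming a vLLM build with bitsandbytes support and using the bnb-4bit base model referenced later in this thread (on the RunPod vLLM worker this corresponds to its QUANTIZATION environment variable):

from vllm import LLM

llm = LLM(
    model="unsloth/phi-3-mini-4k-instruct-bnb-4bit",  # bnb-quantized repo from this thread
    quantization="bitsandbytes",   # must match the method named in the model's config.json
    load_format="bitsandbytes",    # bitsandbytes checkpoints also need this load format
)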
RunPod
Created by annasuhstuff on 6/27/2024 in #⚡|serverless
Quantization method
thank you so much, now I get it
9 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
yes
21 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
above I wrote the code that does not use PEFT after merging the adapter and the base model. I have a config, but the model quality gets a lot worse, so I thought I could set up the API endpoint without the config
21 replies
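A common cause of that quality drop is merging a LoRA trained on a 4-bit base back into dequantized weights. A sketch of unsloth's 16-bit merged export instead, assuming unsloth can resolve the adapter repo back to its base model as its docs describe; the output directory name is arbitrary:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "alsokit/eLM-mini-4B-4K-4bit-v01",  # the adapter repo from this thread
    max_seq_length = 2048,
    load_in_4bit = True,
)
# Writes merged 16-bit weights plus a full config.json, which the endpoint needs
model.save_pretrained_merged("merged-16bit", tokenizer, save_method = "merged_16bit")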
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
Are there any ways to avoid this error ("no config found")? Yes, with PEFT I need it, but with the other method I don't. So, to enable the API endpoint, as I see it, a config is also a must-have?
21 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
is there any way to set up an endpoint using my model without a config (since I only have adapter_config), or do I have to somehow change the model?
21 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
2024-06-26T12:49:19.602715611Z OSError: alsokit/eLM-mini-4B-4K-4bit-v01 does not appear to have a file named config.json. Checkout 'https://huggingface.co/alsokit/eLM-mini-4B-4K-4bit-v01/tree/main' for available files.
21 replies
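That OSError just means the repo contains only adapter files, not a standalone model. A quick way to confirm what is actually in the repo, using huggingface_hub:

from huggingface_hub import list_repo_files

# An adapter-only repo typically holds adapter_config.json and adapter weights,
# but no config.json, so it cannot be loaded as a standalone model.
print(list_repo_files("alsokit/eLM-mini-4B-4K-4bit-v01"))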
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
the problem is, I only have adapter_config
21 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
oh, it seems like it works after doing these steps:

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre-quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",           # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",                # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",             # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",                  # Gemma 2.2x faster!
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name = "unsloth/mistral-7b-v0.3",  # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = "unsloth/Phi-3-mini-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf...",  # use one if using gated models like meta-llama/Llama-2-7b-hf
)

Does RunPod serverless support LoRA adapters?
21 replies
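On the LoRA question: vLLM itself can serve adapters at runtime when LoRA support is switched on; whether the RunPod vLLM worker exposes that switch depends on the worker image, so the following is only a sketch of the underlying vLLM call, with a hypothetical local adapter path:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="unsloth/Phi-3-mini-4k-instruct", enable_lora=True)
outputs = llm.generate(
    "Hello, world",
    SamplingParams(max_tokens=32),
    # name, unique integer id, local path to the downloaded adapter (hypothetical)
    lora_request=LoRARequest("my_adapter", 1, "/runpod-volume/eLM-adapter"),
)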
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
well, I also have an error: ValueError: Can't find 'adapter_config.json' at 'alsokit/eLM-mini-4B-4K-4bit-v01'
21 replies
RunPod
Created by annasuhstuff on 6/26/2024 in #⚡|serverless
LoRA adapter on Runpod.io (using vLLM Worker)
using this:

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the adapter config, the 4-bit base model, then attach the adapter
config = PeftConfig.from_pretrained("alsokit/eLM-mini-4B-4K-4bit-v01")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/phi-3-mini-4k-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "alsokit/eLM-mini-4B-4K-4bit-v01")
21 replies
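The standard PEFT route to a deployable repo is to merge the adapter and save the result, which writes the config.json the endpoint is complaining about. A sketch, assuming a full-precision base so merge_and_unload behaves well; the pushed repo id is hypothetical:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/Phi-3-mini-4k-instruct")
model = PeftModel.from_pretrained(base, "alsokit/eLM-mini-4B-4K-4bit-v01")
merged = model.merge_and_unload()        # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")   # writes the weights *and* config.json
AutoTokenizer.from_pretrained("unsloth/Phi-3-mini-4k-instruct").save_pretrained("merged-model")
# merged.push_to_hub("your-username/your-merged-model")  # hypothetical repo id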
RunPod
Created by annasuhstuff on 6/25/2024 in #⚡|serverless
No config error
When I create an endpoint, I just pass my HF token in the environment variables, HF_TOKEN : hf_blablabla (I copy the token from my HF account)
5 replies
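A quick sanity check that the token is valid and actually picked up from the environment, using huggingface_hub in the same environment where HF_TOKEN is set:

import os
from huggingface_hub import whoami

# Prints your account details if the token stored in HF_TOKEN is valid
print(whoami(token=os.environ["HF_TOKEN"]))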