vaventt
Is GPU Cloud suitable for deploying LLMs, or only for training?
One more question. I believe you have experience with HF transformers and LLMs: do you know what command to put into the Dockerfile to get the weights pre-downloaded while building the Docker image, so they can later be loaded in the endpoint with .from_pretrained? Or am I looking in the wrong direction?
I thought of just downloading the model repo and then using the .from_pretrained method to load the weights from a local folder, but it looks like the files have different extensions or something, because it doesn't work and I still haven't found a reliable solution :(
RUN git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

model_path = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(
    f"./{model_path.split('/')[1]}/",
    local_files_only=True)
And I'm getting the error SafetensorError: Error while deserializing header: HeaderTooLarge
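For context: a plain git clone without Git LFS fetches only small pointer files in place of the .safetensors shards, which commonly produces exactly this HeaderTooLarge error. A minimal sketch of one way to pre-download the real weights at image build time instead, assuming huggingface_hub is installed in the image (the download.py name and /models path are placeholders, not from the original thread):

# download.py - run at build time, e.g. via: RUN python download.py
from huggingface_hub import snapshot_download

# Pull the full repo, including the LFS weight shards, into a fixed folder.
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="/models/Mistral-7B-Instruct-v0.2",
)

The endpoint can then load from that folder with AutoModelForCausalLM.from_pretrained("/models/Mistral-7B-Instruct-v0.2", local_files_only=True).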
26 replies
Is GPU Cloud suitable for deploying LLMs, or only for training?
Awesome, thanks, then this is the way to go. One more question about FlashBoot: should I always use it to reduce cold starts to 2s, even for a big 70B LLM, or does it have restrictions and possible issues?
26 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
Great, thanks a lot. By the way, I'm located in Eastern Europe; how do I choose the best region for my network storage? By distance, EU-RO-1 and EU-CZ-1 should be the closest, but maybe some regions have more GPUs in general to choose from and work with?
69 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
So network storage for my workers will definitely help and would be a good deployment practice in terms of using RunPod?
69 replies
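One common pattern with network storage is to point the Hugging Face cache at the attached volume, so the weights are downloaded once and reused across workers and cold starts. A minimal sketch, assuming a network volume is attached and mounted at /runpod-volume (the hf-cache subfolder is a placeholder):

from transformers import AutoModelForCausalLM

# With cache_dir on the network volume, the first worker downloads the
# weights; later workers and cold starts reuse the cached copy.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    cache_dir="/runpod-volume/hf-cache",
)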
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
I have only one rp_handler.py file where all the code is located. It starts after the last command in my Docker container; once the handler function is triggered, on line 35 my Hugging Face weights start to download:
AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quantization_config
)
Sometimes it works on the first try, but specifically with the big LLaMA, whose weights take up to 1 hour to download in my case, the container stops without throwing an error while there is still 1 job in the queue. The small Mistral-7B usually works great, but when I take a bigger model it just doesn't work.
69 replies
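Long downloads at cold start are fragile on serverless, so one common fix is to load from a local folder instead of downloading inside the handler path. A minimal sketch of an rp_handler.py that loads once at container start from pre-downloaded weights (the /models path and the generation parameters are placeholder assumptions, not the original code):

import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the weights were baked into the image or placed on a
# network volume beforehand, so no Hub download happens at cold start.
MODEL_DIR = "/models/Mistral-7B-Instruct-v0.2"

# Load once, at module level, before the serverless loop starts.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    local_files_only=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

def handler(job):
    # RunPod passes the request payload under job["input"].
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return {"output": tokenizer.decode(output[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})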
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
Hi Justin, can you help me with a few questions? I need to develop and deploy a RAG system based on an open-source LLM. I have tried several times on RunPod Serverless with A6000/A100: it starts the worker and the container, then it downloads 50 GB or 150 GB of weights, but never all 270 GB; it just stops and restarts downloading the weights again and again, burning money with no real outcome. I just can't deploy LLaMa-70B; RunPod doesn't give me a chance. What should I do? Is the Cloud GPU option more suitable and stable for production than Serverless?
69 replies