Suba
RunPod
Created by octopus on 7/24/2024 in #⚡|serverless
Guide to deploy Llama 405B on Serverless?
51 replies

@NERDDISCO Please let me know if the ollama worker worked with the 405B model.
For the 405B model.
@nerdylive I'd like to know if you got any news on the vLLM update.
🙂
Great, thank you very much for your time.
OK, is it done automatically, or should we raise a ticket, etc.?
OK, got it. 405B is not in there.
2024-07-24T04:42:22.063990694Z engine.py :110 2024-07-24 04:42:22,063 Error initializing vLLM engine: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
But the current vLLM accepts only two params.
Llama 3.1's config.json has lots of params under rope_scaling.
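For reference, a minimal sketch of the mismatch being described: the rope_scaling block that Llama 3.1 ships in config.json (values copied from the error log above) versus the two-field shape the older vLLM in the worker validates against. The "type" value in the second dict is illustrative only; the error message just says the field must exist.

```python
# rope_scaling as shipped in Llama 3.1's config.json (values taken from the error log above)
llama31_rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

# The shape an older vLLM build expects: exactly two fields, "type" and "factor".
legacy_rope_scaling = {
    "type": "dynamic",  # illustrative value only
    "factor": 8.0,
}
```

Newer vLLM releases understand the extended llama3 rope_scaling format, which is why the thread ends up waiting on the vLLM update in the worker image.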
No, I get an error related to rope_scaling.
Since I am using Serverless, I am unable to run any command.
I am using runpod/worker-vllm:stable-cuda12.1.0
LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.
Looks like it supports it.
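If you want to check programmatically whether the installed vLLM build registers the Llama architecture, something like the sketch below should work. It assumes a recent vLLM where ModelRegistry.get_supported_archs() is available; note that architecture support alone does not guarantee a specific checkpoint loads (the rope_scaling parsing above is a separate issue).

```python
# Check whether the Llama architecture is registered in the installed vLLM build.
# Assumes a recent vLLM; ModelRegistry.get_supported_archs() lists registered architectures.
from vllm import ModelRegistry

archs = ModelRegistry.get_supported_archs()
print("LlamaForCausalLM" in archs)  # True if this build registers the Llama family
```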
@nerdylive Not sure about this. Do we have a document or page that lists vLLM's support for a model?
I tried several 405B models on HF but get an error related to rope_scaling. It looks like we need to modify it to null and try. To do this, I need to download all the files and upload them again.
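A minimal sketch of that config patch, assuming you have already mirrored the weights to a Hub repo you control (the repo name below is hypothetical). Only config.json is edited here; nulling out rope_scaling sidesteps the strict two-field validation at the cost of disabling the extended-context scaling.

```python
# Patch rope_scaling to null in a copy of the model repo you control.
import json
from huggingface_hub import hf_hub_download, HfApi

MY_REPO = "your-username/Meta-Llama-3.1-405B-Instruct-patched"  # hypothetical mirror repo

cfg_path = hf_hub_download(repo_id=MY_REPO, filename="config.json")
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_scaling"] = None  # the workaround discussed above: null it out and retry

with open("config.patched.json", "w") as f:
    json.dump(cfg, f, indent=2)

HfApi().upload_file(
    path_or_fileobj="config.patched.json",
    path_in_repo="config.json",
    repo_id=MY_REPO,
)
```

Upgrading to a vLLM build that understands the llama3 rope_scaling format is the cleaner fix, since the patched config loses the long-context scaling behavior.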
@octopus - you need to attach a network volume to the endpoint. The volume should have at least 1 TB of space to hold the 405B model (unless you are using quantized models). Then increase the number of workers to match the model's GPU requirement (like 10 x 48 GB GPUs).
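As a rough back-of-the-envelope for that sizing advice (plain arithmetic, not RunPod-specific guidance):

```python
# Weight-storage / VRAM sizing estimate for a 405B-parameter model.
params = 405e9

for name, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1), ("INT4", 0.5)]:
    weight_gb = params * bytes_per_param / 1e9
    gpus_48gb = -(-weight_gb // 48)  # ceiling division: GPUs needed just to hold the weights
    print(f"{name}: ~{weight_gb:.0f} GB of weights, >= {gpus_48gb:.0f} x 48 GB GPUs")
```

The roughly 810 GB of FP16 weights is what drives the 1 TB network-volume recommendation, the "10 x 48 GB GPUs" figure lines up with an 8-bit quantized deployment, and any real deployment needs extra headroom for the KV cache and activations.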