RunPod • Created by gdimanov on 6/16/2024 in #⛅|pods
Ram issue
Okay, managed to get it working, thanks guys. I had to restrict --max-model-len.
14 replies
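For context, a minimal sketch of what the fix described above might look like as a launch command, assuming the vLLM OpenAI-compatible server and reusing the model name and flags quoted later in this thread; the concrete length value here is illustrative, not the one actually used:

```shell
# Hypothetical launch command: cap the context length so the KV cache
# fits in GPU memory. 16384 is an illustrative value only.
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --model cognitivecomputations/dolphin-2.9.2-qwen2-7b \
    --max-model-len 16384
```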
Could you reference the documentation where you got "--tensor-parallel-size" from?
Something new: "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (74192). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine"
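The constraint behind that error can be sketched as simple arithmetic: the engine refuses to start unless the full configured context length fits in the allocated KV cache. The numbers below are taken directly from the error text in this thread; the helper function is illustrative, not vLLM's actual code:

```python
# Sketch of the check the error message describes: max_model_len must not
# exceed the number of tokens the allocated KV cache can hold.
max_seq_len = 131072      # model's default max context, from the error text
kv_cache_tokens = 74192   # KV cache capacity in tokens, from the error text

def fits(max_model_len: int, cache_tokens: int) -> bool:
    """Engine starts only if the full context fits in the KV cache."""
    return max_model_len <= cache_tokens

assert not fits(max_seq_len, kv_cache_tokens)  # the reported failure
assert fits(74192, kv_cache_tokens)            # capping --max-model-len resolves it
```

Either raising gpu_memory_utilization (a larger cache) or lowering max_model_len (a smaller requirement) makes the inequality hold, which matches the two remedies the error suggests.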
"--host 0.0.0.0 --model cognitivecomputations/dolphin-2.9.2-qwen2-7b --tensor-parallel-size 3"
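One thing worth noting about the command above: tensor parallelism splits each layer's attention heads across GPUs, so vLLM requires the model's attention-head count to be divisible by --tensor-parallel-size, and the value is normally set to the number of GPUs on the pod. A small divisibility sketch, assuming 28 attention heads for a 7B Qwen2-style model (an assumption, not verified here):

```python
# Hedged sketch: vLLM shards attention heads across tensor-parallel ranks,
# so num_attention_heads % tensor_parallel_size must be 0.
# 28 heads is an assumed value for a 7B Qwen2-style model.
num_attention_heads = 28

def valid_tp_size(tp_size: int, heads: int = num_attention_heads) -> bool:
    """True if each tensor-parallel rank gets a whole number of heads."""
    return heads % tp_size == 0

assert valid_tp_size(4)       # 28 / 4 = 7 heads per GPU
assert not valid_tp_size(3)   # 3 does not divide 28
```

If the assumption holds, a tensor-parallel size of 3 would be rejected, while 1, 2, 4, or 7 would be accepted.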
The app also seems to be restarting non-stop, maybe due to the error. I get a "start container" message every 20 seconds.