RAM issue

Hello guys, I am running the setup shown in the attached picture. The model I am trying to pull from Hugging Face is cognitivecomputations/dolphin-2.9.2-qwen2-7b. Even though I have a lot of RAM, I am getting this error: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.25 GiB. GPU"

How can I overcome this? I want to jump from 7b to 70b, and according to "can you run it" it should run fine on this setup (image attached).
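A rough way to sanity-check the jump from 7b to 70b is to estimate the memory needed for the weights alone. This is a back-of-the-envelope sketch, assuming nominal parameter counts (7e9 and 70e9) and 16-bit weights (2 bytes per parameter). It ignores the KV cache, activations and CUDA overhead, and the even split across GPUs is an idealization of tensor parallelism. The function names are hypothetical.

```python
# Back-of-the-envelope estimate of GPU memory for model weights only.
# Assumptions: fp16/bf16 weights (2 bytes per parameter); KV cache,
# activations and CUDA overhead are NOT included, so real usage is higher.


def weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Total weight memory in GiB."""
    return num_params * bytes_per_param / 2**30


def weight_gib_per_gpu(num_params: float, tensor_parallel_size: int,
                       bytes_per_param: int = 2) -> float:
    """Approximate per-GPU weight memory, assuming the weights are split
    evenly across GPUs with tensor parallelism (an idealization)."""
    return weight_gib(num_params, bytes_per_param) / tensor_parallel_size


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        print(f"{name}: ~{weight_gib(params):.1f} GiB total, "
              f"~{weight_gib_per_gpu(params, 4):.1f} GiB per GPU with 4 GPUs")
```

Even this lower bound shows that a 70b model in 16-bit precision needs well over 100 GiB of VRAM for its weights, which is why it has to be spread over several GPUs (system RAM does not help with a CUDA out-of-memory error).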
9 Replies
Madiator2011 (3w ago)
Looks like your app is not using all of the GPUs.
gdimanov (3w ago)
The app also seems to be restarting nonstop, maybe because of the error. I see a "start container" message every 20 seconds.
Madiator2011 (3w ago)
What do you see in the container logs, and what is your start command?
gdimanov (3w ago)
start command:
Madiator2011 (3w ago)
Oh, never mind: it's using only one GPU. Based on the docs, you need to add --tensor-parallel-size (set to the number of GPUs).
gdimanov (3w ago)
"--host 0.0.0.0 --model cognitivecomputations/dolphin-2.9.2-qwen2-7b --tensor-parallel-size 3"
nerdylive (3w ago)
Try another value for --tensor-parallel-size.
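One likely reason a tensor-parallel size of 3 fails here: vLLM splits attention heads across GPUs, so the size generally has to divide the model's attention-head count, and the KV-head count has to split evenly one way or the other. The helper below is hypothetical and only encodes those two divisibility checks as an assumption (other constraints may apply). The head counts (28 attention heads, 4 KV heads) are assumed from the Qwen2-7B config.json fields "num_attention_heads" and "num_key_value_heads" and are worth verifying against the actual model.

```python
# Hypothetical helper: list tensor-parallel sizes that split the heads evenly.
# Assumption: the engine requires num_heads % tp == 0, and either the KV heads
# divide across GPUs (num_kv_heads % tp == 0) or are replicated evenly
# (tp % num_kv_heads == 0). vLLM performs checks along these lines.


def valid_tp_sizes(num_heads: int, num_kv_heads: int, num_gpus: int) -> list[int]:
    """Return tensor-parallel sizes <= num_gpus that pass both checks."""
    sizes = []
    for tp in range(1, num_gpus + 1):
        heads_ok = num_heads % tp == 0
        kv_ok = num_kv_heads % tp == 0 or tp % num_kv_heads == 0
        if heads_ok and kv_ok:
            sizes.append(tp)
    return sizes


if __name__ == "__main__":
    # Assumed Qwen2-7B values: 28 attention heads, 4 KV heads.
    print(valid_tp_sizes(num_heads=28, num_kv_heads=4, num_gpus=3))
```

With 3 GPUs this leaves only 1 or 2; on an 8-GPU machine it would allow 1, 2 or 4.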
gdimanov (3w ago)
Something new: "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (74192). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine"

Could you point me to the documentation where you found "--tensor-parallel-size"?

Okay, I managed to get it working, thanks guys. I had to restrict --max-model-len.
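The last error means the KV cache vLLM could allocate holds 74192 tokens, fewer than the model's default context length of 131072, so --max-model-len has to be at most 74192 (or --gpu-memory-utilization raised so more KV cache fits). The sketch below shows where the per-token cost comes from: K and V tensors are cached for every layer and KV head. It is a simplified model, not vLLM's exact accounting, and the Qwen2-7B values (28 layers, 4 KV heads, head_dim 128, 2-byte dtype) are assumptions taken from its config. The function names are hypothetical.

```python
# Simplified KV-cache sizing sketch (not vLLM's exact memory accounting).
# Assumed Qwen2-7B config values: 28 layers, 4 KV heads, head_dim 128,
# 16-bit cache (2 bytes per element).


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of cached K and V per token, summed over all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes  # 2 = K and V


def pick_max_model_len(requested: int, kv_cache_tokens: int) -> int:
    """Clamp the context length to what the KV cache can hold."""
    return min(requested, kv_cache_tokens)


if __name__ == "__main__":
    per_token = kv_bytes_per_token(num_layers=28, num_kv_heads=4, head_dim=128)
    print(f"~{per_token / 1024:.0f} KiB of KV cache per token")
    # Numbers taken from the error message in this thread.
    print("--max-model-len", pick_max_model_len(131072, 74192))
```

With the numbers from the error, any --max-model-len at or below 74192 should fit; a round value such as 32768 leaves some headroom. Raising --gpu-memory-utilization (vLLM's default is 0.9) is the other option the message mentions.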