avif (RunPod) · 4d ago

Trying to work with llama3-70b-8192 and getting out-of-memory errors

Hi, I'm trying to work with the model llama3-70b-8192, but I can't deploy my serverless endpoint because it runs out of memory. I've attached a screenshot of my config. Please recommend other settings to make it work.

[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU

Thanks
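For context on why this fails, a rough back-of-envelope calculation (my own numbers, not from the thread) shows the weights of a 70B-parameter model alone far exceed a single consumer GPU's VRAM at fp16, while a 4-bit quantization can fit on one large GPU:

```python
# Rough VRAM estimate for a ~70B-parameter model, weights only.
# KV cache, activations, and CUDA overhead add more on top of this.
# These figures are approximations for illustration.
PARAMS = 70e9  # ~70 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """GiB needed just to hold the model weights."""
    return PARAMS * bytes_per_param / 1024**3

fp16 = weight_gb(2.0)   # ~130 GiB -> needs multiple 80 GB GPUs
int4 = weight_gb(0.5)   # ~33 GiB  -> can fit on a single 48 GB GPU

print(f"fp16 weights: {fp16:.0f} GiB")
print(f"4-bit weights: {int4:.0f} GiB")
```

This is why the usual fixes are a larger multi-GPU configuration or a quantized build of the model.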
2 Replies
nerdylive · 4d ago
Hi, is it a quantized model?
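If the answer is no, one common route (a sketch of my own, not something prescribed in the thread) is to serve a pre-quantized community build with vLLM's `--quantization` flag instead of the full-precision weights; the AWQ model ID below is an illustrative community quantization, not an official release:

```shell
# Sketch: serve a 4-bit AWQ build of Llama-3-70B with vLLM.
# The model ID is a community quantization used here as an example.
python -m vllm.entrypoints.openai.api_server \
  --model casperhansen/llama-3-70b-instruct-awq \
  --quantization awq \
  --max-model-len 8192
```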
nerdylive · 4d ago
Recommended Storage Space: 392.19 GB