Trying to work with llama3-70b-8192 and getting out of memory
Hi
I am trying to work with the model:
llama3-70b-8192
but I can't deploy my serverless endpoint because it runs out of memory.
I have attached a screenshot of my config. Please recommend other settings to make it work.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU
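For context, here is a rough back-of-the-envelope sketch of how much GPU memory the weights alone might need. It assumes the "70b" in the model name means about 70 billion parameters, and it ignores the KV cache, activations, and runtime overhead, so real usage would be higher. The function name and the precision table are just illustrative, not taken from any library.

```python
# Rough weight-only memory estimate (ignores KV cache, activations, overhead).
# Bytes per parameter for some common precisions (illustrative table).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumption: "70b" in the model name means ~70 billion parameters.
    params = 70e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gib(params, dtype):.0f} GiB")
```

If this estimate is in the right ballpark, the fp16 weights would not fit on a single GPU, so the settings would probably need multiple GPUs or a quantized variant of the model.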
Thanks