How to deploy ModelsLab/Uncensored-llama3.1-nemotron?
I have tried to deploy this model
https://huggingface.co/ModelsLab/Uncensored-llama3.1-nemotron
Btw I am facing a CUDA out-of-memory issue (I have tried 24 GB and 48 GB GPUs); it does not work. How do I fix this?
@haris, can you please check and advise?
"message":"engine.py :115 2024-12-16 19:24:32,150 Error initializing vLLM engine: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 47.50 GiB of which 130.31 MiB is free. Process 4026461 has 47.37 GiB memory in use. Of the allocated memory 46.87 GiB is allocated by PyTorch, and 19.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)\n"
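The allocator hint from the traceback can be set before launching; note it only mitigates fragmentation, it won't help if the model simply doesn't fit:

```shell
# Allocator setting suggested by the error message; reduces fragmentation
# but will not fix a genuine capacity shortfall.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```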
70B llama models typically need a little over 48 GB; try 80 GB VRAM GPUs
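A rough back-of-the-envelope check, counting weights only (ignores KV cache and activations; assumes a 70B-parameter checkpoint at ~4/2/0.5 bytes per parameter):

```shell
# Weight memory only, ignoring KV cache/activations (assumed 70B params).
PARAMS_B=70
echo "fp32: $((PARAMS_B * 4)) GB"   # 280 GB
echo "fp16: $((PARAMS_B * 2)) GB"   # 140 GB - still too big for one 80 GB GPU
echo "int4: $((PARAMS_B / 2)) GB"   # 35 GB of weights, plus runtime overhead
```

So a single 80 GB GPU is not enough for fp16 weights alone; you need quantization, multiple GPUs, or both.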
Tried 80 GB; btw, same memory issue.
Are you quantizing or halving (fp16)? Or running full fp32? You will probably need to do both.
Needs more VRAM, I think.
Use at least:
You can't use 3 or 6 GPUs; it has to be 2, 4, or 8 GPUs.
Oh ya true I forgot about that
And don't forget to set the tensor parallelism to the number of your GPUs; I think it's necessary.
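Putting the advice in this thread together, a minimal launch sketch, assuming vLLM's OpenAI-compatible `vllm serve` CLI and a multi-GPU node (flag values are illustrative, adjust to your hardware):

```shell
# fp16 weights, sharded across 2 GPUs (tensor parallelism), with a capped
# context length to keep the KV cache small. For fp16 70B weights (~140 GB)
# you need the combined VRAM across shards to cover weights + KV cache.
vllm serve ModelsLab/Uncensored-llama3.1-nemotron \
  --dtype half \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```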