openmind
RunPod
Created by openmind on 12/16/2024 in #⚡|serverless
How to deploy ModelsLab/Uncensored-llama3.1-nemotron?
Tried an 80 GB GPU as well; same issue regarding memory.
11 replies
"message":"engine.py :115 2024-12-16 19:24:32,150 Error initializing vLLM engine: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 47.50 GiB of which 130.31 MiB is free. Process 4026461 has 47.37 GiB memory in use. Of the allocated memory 46.87 GiB is allocated by PyTorch, and 19.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)\n"
@haris, can you please check and advise?