openmind
RunPod
Created by openmind on 12/16/2024 in #⚡|serverless
How to deploy ModelsLab/Uncensored-llama3.1-nemotron?
Tried an 80 GB GPU as well; same issue regarding memory.
11 replies
"message":"engine.py :115 2024-12-16 19:24:32,150 Error initializing vLLM engine: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 47.50 GiB of which 130.31 MiB is free. Process 4026461 has 47.37 GiB memory in use. Of the allocated memory 46.87 GiB is allocated by PyTorch, and 19.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)\n"
@haris, can you please check and advise?