deepseek-r1 has been loading into VRAM for over an hour.

It seems to be related to mmap on a network drive. How do you solve it?
nerdylive (2w ago)
What error or log output did you get?
yhlong00000 (2w ago)
Are you using our vLLM with a network volume? It might be downloading the model, which could take a while.
Igor Gulamov (OP, 2w ago)
I use my own Docker container with SGLang inside. For ROCm you only have PyTorch, no vLLM or SGLang. I use a model already loaded to disk; there is no space for a second copy, since it is 670 GB total. And it would be downloading for 4 hours, not 1.
nerdylive (2w ago)
Okay, if it's loading, is there an OOM error?
Igor Gulamov (OP, 2w ago)
It has been loading safetensors checkpoint shards for an hour, with no OOM. This is 8x MI300X. I read on GitHub that this is often related to nmap over network drives, but I'm not sure.
nerdylive (2w ago)
Nmap? What's that? What's taking so long? Can you debug to see the loading progress?
Igor Gulamov (OP, 2w ago)
The function that reads the .safetensors file and loads it to the GPU takes an extremely long time. mmap (I meant mmap, not nmap) maps the file into memory to load data directly from SSD to VRAM with no RAM consumption.
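A rough Python sketch of the behavior described above (a toy file, not the actual safetensors loader): with mmap, pages are only faulted in from disk when they are touched, which is fast on a local SSD but becomes a network round trip per page on a network volume.

```python
import mmap
import os
import tempfile

# Create a small stand-in file; a real checkpoint shard would be many GB.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # No data is copied up front; touching mm[0] triggers a page fault
        # that pulls the page from the underlying storage. On a network
        # filesystem, each such fault is a remote read.
        first_byte = mm[0]

os.remove(path)
```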
nerdylive (2w ago)
Oh, I'm not experienced in this model-loading area, so I don't know much. That makes sense, though; people do report that network storage is slow for loading big models.
flash-singh (2w ago)
Don't use network storage to load the models. Instead, move them to the container disk or the pod volume disk and see if that loads them any faster.
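A sketch of that suggestion as shell commands. The paths here are simulated with `mktemp` so the snippet runs anywhere; on a real pod, the source would be the network volume (often mounted at `/workspace`) and the destination a directory on the pod's local volume or container disk.

```shell
# Stand-in for the network volume and the local disk (assumed layout).
SRC="$(mktemp -d)/deepseek-r1"
DST="$(mktemp -d)/deepseek-r1"
mkdir -p "$SRC"
touch "$SRC/model-00001.safetensors"   # placeholder for a real shard

# Copy the shards to local disk before launching the server.
mkdir -p "$DST"
cp -r "$SRC/." "$DST/"   # on a real pod, rsync -a --info=progress2 shows progress

ls "$DST"
```

Then point the inference server's model path at the local copy instead of the network volume.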
Igor Gulamov (OP, 2w ago)
It is on the pod volume disk.
Henky!! (2w ago)
A Q4_K_S GGUF on 6x A100 is possible on https://koboldai.org/runpodcpp (if you adjust the container storage to 500GB). It won't be SGLang, but it does have OpenAI API support, so it should be easy to integrate with.
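For the OpenAI API support mentioned above, integration is a plain HTTP request. A minimal Python sketch, assuming a local endpoint and a placeholder model name (KoboldCpp's default port is commonly 5001, but verify against your deployment):

```python
import json
import urllib.request

# Assumed endpoint and model name; adjust to your pod's address.
payload = {
    "model": "deepseek-r1",  # placeholder, not a confirmed model id
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```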