RunPod • 4mo ago
ditti

Terminating local vLLM process while loading safetensor checkpoints

I started using Llama 3.1 70B as a serverless function recently. I got it to work, and the setup is rather simple:
- 2 x A100 GPUs
- 200GB network volume
- 200GB container storage
- Model: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
The serverless function started successfully a few times, but there are two recurring issues, and the first probably leads to the second.
Issue #1: Loading the safetensors checkpoint shards is almost always very slow (> 15 s/it).
Issue #2: The container is terminated before it loads all the checkpoints and is then restarted in a loop, with no reason given.
2024-09-11 14:08:47.002 | info | 52g6dgmu0eudyb | (VllmWorkerProcess pid=148) INFO 09-11 13:08:37 utils.py:841] Found nccl from library libnccl.so.2
2024-09-11 14:08:47.002 | info | 52g6dgmu0eudyb | (VllmWorkerProcess pid=148) INFO 09-11 13:08:37 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
2024-09-11 14:08:01.350 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 80% Completed | 24/30 [06:17<01:38, 16.35s/it]
2024-09-11 14:07:59.935 | info | 52g6dgmu0eudyb | INFO 09-11 13:07:59 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
2024-09-11 14:07:42.177 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 77% Completed | 23/30 [05:57<01:46, 15.15s/it]
2024-09-11 14:07:28.855 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 73% Completed | 22/30 [05:44<02:07, 15.93s/it]
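For reference, the configuration above corresponds roughly to the following engine setup. This is only a minimal sketch of the equivalent offline vLLM call, not the serverless worker's actual code, and the download_dir pointing at the network volume mount is an assumption:

```python
# Minimal sketch of the equivalent offline vLLM load (not the worker's actual code).
# Assumes the weights are cached on the 200GB network volume, mounted at /runpod-volume.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=2,                     # 2 x A100, matching the endpoint config
    download_dir="/runpod-volume/huggingface",  # assumed cache location on the network volume
)
```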
Any ideas why that could happen?
9 Replies
octopus • 3w ago
@ditti were you able to find a solution to this?
yhlong00000 • 3w ago
It might be caused by the network volume.
octopus • 3w ago
you mean cuz we have a network volume attached?
yhlong00000 • 3w ago
If your model is stored on a network volume, loading can be slow during a cold start, and occasionally it can cause the worker to fail to start.
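For illustration, the cold start being described here is essentially vLLM streaming ~140GB of safetensors shards off the mounted volume. A quick way to check whether the volume read speed is the bottleneck is to time a raw read of a single shard; this is just a sketch, and the shard path is an assumption (point it at wherever the worker actually caches the model):

```python
# Sketch: time a sequential read of one checkpoint shard from the network volume
# to see whether the ~15 s/it shard loading is I/O-bound.
# The path below is an assumption -- adjust it to the actual cached shard.
import time

shard = "/runpod-volume/huggingface/model-00001-of-00030.safetensors"
start = time.time()
read = 0
with open(shard, "rb") as f:
    while chunk := f.read(64 * 1024 * 1024):  # read in 64 MiB chunks
        read += len(chunk)
elapsed = time.time() - start
print(f"Read {read / 1e9:.1f} GB in {elapsed:.1f} s ({read / 1e9 / elapsed:.2f} GB/s)")
```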
Titanium-Monkey
Was able to get this to work locally without a network volume - did not test with it. But I had to use 4x48GB GPUs, as it kept crashing on 2x80GB GPUs
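A rough back-of-the-envelope check of why 2x80GB is tight: in bf16 the 70B weights alone are about 140GB, which leaves very little of the 160GB total for KV cache and activations, while 4x48GB gives 192GB. A small sketch of that arithmetic, assuming 2 bytes per parameter (bf16/fp16 weights):

```python
# Rough memory estimate for Llama 3.1 70B under tensor parallelism.
# Assumes bf16/fp16 weights (2 bytes/parameter); KV cache and activation
# overhead are ignored here, which is exactly what eats the remaining headroom.
params = 70e9
weights_gb = params * 2 / 1e9  # ~140 GB of weights

for gpus, gb_each in [(2, 80), (4, 48)]:
    total = gpus * gb_each
    print(f"{gpus} x {gb_each}GB = {total}GB total, ~{total - weights_gb:.0f}GB left after weights")
```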
nerdylive • 3w ago
Soo is there a way around this if the model is large? We couldn't just build an image with the LLM in it and upload it to the registry
yhlong00000 • 3w ago
We will release a Hugging Face model cache soon, and that should help in this case.
nerdylive • 3w ago
Ooh ya nice