Created by ditti on 9/11/2024 in #⚡|serverless
Terminating local vLLM process while loading safetensor checkpoints
I started using Llama 3.1 70B as a serverless function recently. I got it to work, and the setup is rather simple:

- 2 x A100 GPUs
- 200 GB network volume
- 200 GB container storage
- Model: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

The serverless function started successfully a few times, but there are two recurring issues, and the first probably leads to the second.

Issue #1: Loading the safetensors checkpoint shards is almost always very slow (> 15 s/it).
Issue #2: The container is terminated before all checkpoint shards are loaded and is then restarted in a loop, with no reason given.
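For reference, this is roughly how the endpoint is wired up. It's a minimal sketch rather than my exact handler: the volume path, handler name, and sampling parameters below are placeholders, and I'm assuming the standard runpod SDK plus vLLM's offline API.

```python
import runpod
from vllm import LLM, SamplingParams

# Load the model once per worker; tensor_parallel_size=2 spreads the 70B
# weights across both A100s. download_dir points at the mounted network
# volume (path is an assumption, not my actual mount point).
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=2,
    download_dir="/runpod-volume/models",
)

def handler(job):
    # Placeholder handler: read a prompt from the job input and generate.
    prompt = job["input"]["prompt"]
    params = SamplingParams(max_tokens=256)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text

runpod.serverless.start({"handler": handler})
```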
Log excerpt (newest entries first):

2024-09-11 14:08:47.002 | info | 52g6dgmu0eudyb | (VllmWorkerProcess pid=148) INFO 09-11 13:08:37 utils.py:841] Found nccl from library libnccl.so.2
2024-09-11 14:08:47.002 | info | 52g6dgmu0eudyb | (VllmWorkerProcess pid=148) INFO 09-11 13:08:37 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
2024-09-11 14:08:01.350 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 80% Completed | 24/30 [06:17<01:38, 16.35s/it]
2024-09-11 14:07:59.935 | info | 52g6dgmu0eudyb | INFO 09-11 13:07:59 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
2024-09-11 14:07:42.177 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 77% Completed | 23/30 [05:57<01:46, 15.15s/it]
2024-09-11 14:07:28.855 | info | 52g6dgmu0eudyb | Loading safetensors checkpoint shards: 73% Completed | 22/30 [05:44<02:07, 15.93s/it]
Any ideas why that could happen?
1 reply