Local vLLM process terminated while loading safetensors checkpoints
I recently started running Llama 3.1 70B as a serverless function.
I got it to work, and the setup is rather simple (a rough handler sketch follows the list):
2 x A100 GPUs
200GB Network volume
200GB Container storage
Model https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
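For reference, the handler is roughly this shape. This is a minimal sketch, not the exact code: it assumes vLLM's offline `LLM` API, `tensor_parallel_size=2` to match the 2 x A100s, and `/runpod-volume` as the network volume mount point (the path, the `handler` name, and the event shape are all assumptions):

```python
# Minimal sketch of the serverless handler (illustrative, not the exact code).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=2,          # shard the 70B weights across both A100s
    download_dir="/runpod-volume",   # assumed mount point of the 200GB network volume
)

def handler(event):
    # Hypothetical event shape: {"input": {"prompt": "..."}}
    prompt = event["input"]["prompt"]
    params = SamplingParams(max_tokens=256)
    outputs = llm.generate([prompt], params)
    return {"output": outputs[0].outputs[0].text}
```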
The serverless function has started successfully a few times, but there are two recurring issues, and the first probably causes the second:
Issue #1: Loading the safetensors checkpoint shards is almost always very slow (> 15s/it); a throughput check sketch follows below
Issue #2: The container is terminated before all the shards finish loading, and then restarts in a loop with no reason given
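To check whether the network volume itself is the bottleneck behind Issue #1, a timing snippet like this can measure raw read throughput on one of the shard files. A sketch only: the shard filename is a placeholder, and any large file on the volume would do:

```python
# Rough read-throughput check on the network volume (sketch).
import time

PATH = "/runpod-volume/model-00001-of-00030.safetensors"  # placeholder shard name
CHUNK = 64 * 1024 * 1024  # read in 64 MiB chunks

start = time.monotonic()
total = 0
with open(PATH, "rb") as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.monotonic() - start
print(f"read {total / 1e9:.1f} GB in {elapsed:.1f}s "
      f"({total / 1e9 / elapsed:.2f} GB/s)")
```

If this reports well under ~1 GB/s, the slow checkpoint loading would be explained by volume I/O rather than vLLM itself.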
Any ideas why that could happen?