Terminating local vLLM process while loading safetensor checkpoints
I started using Llama 3.1 70B as a serverless function recently.
I got it to work, and the setup is rather simple:
2 x A100 GPUs
200GB Network volume
200GB Container storage
Model https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
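For context, the launch looks roughly like this (a sketch, not my exact command; the `/runpod-volume` path and flag values are assumptions you'd adjust for your vLLM version and setup):

```shell
# Sketch: serve Llama 3.1 70B split across the 2 A100s with vLLM.
# --tensor-parallel-size 2 shards the weights over both GPUs;
# --download-dir points at the network volume mount (assumed path).
vllm serve meta-llama/Meta-Llama-3.1-70B-Instruct \
  --tensor-parallel-size 2 \
  --download-dir /runpod-volume/models \
  --max-model-len 8192
```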
The serverless function started successfully a few times, but(!) there are two recurring issues, and the first probably leads to the second.
Issue #1: Loading the safetensors checkpoint is almost always very slow (> 15s/it)
Issue #2: The container is terminated before it finishes loading all the checkpoints, then restarts in a loop with no reason given
Any ideas why that could happen?
@ditti were you able to find a solution to this?
It might be caused by the network volume.
you mean because we have a network volume attached?
If your model is stored on a network volume, loading can be slow on cold start, and occasionally it can cause the worker to fail to start.
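One workaround worth trying is to copy the checkpoint from the network volume onto fast local container storage once at startup, then load from the local copy. A minimal sketch, assuming `/runpod-volume` is the network volume mount and `/models` is container storage (both paths are assumptions):

```shell
# Sketch: stage the checkpoint on local disk, then serve from there.
# The slow network volume is only read once, sequentially.
mkdir -p /models
cp -r /runpod-volume/Meta-Llama-3.1-70B-Instruct /models/
vllm serve /models/Meta-Llama-3.1-70B-Instruct --tensor-parallel-size 2
```

This trades a longer one-time copy for fast safetensors loading, and needs enough container storage to hold the full ~140 GB of weights.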
Was able to get this to work locally without network volume - did not test with it.
But I had to use 4x 48 GB GPUs, as it kept crashing on 2x 80 GB GPUs
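The crashes on 2x 80 GB are plausibly just memory headroom: back-of-the-envelope arithmetic (parameter count approximate) shows the bf16 weights alone nearly fill two 80 GB cards, leaving little room for KV cache, activations, and CUDA overhead:

```python
# Rough VRAM budget for Llama 3.1 70B in bf16 (2 bytes per parameter).
params = 70.6e9            # approximate parameter count
weights_gb = params * 2 / 1e9  # ~141 GB of weights alone

for ngpus, gb_each in [(2, 80), (4, 48)]:
    total = ngpus * gb_each
    headroom = total - weights_gb  # left for KV cache, activations, overhead
    print(f"{ngpus} x {gb_each} GB: {total} GB total, ~{headroom:.0f} GB headroom")
```

So 2x 80 GB gives under 20 GB of headroom across both GPUs, while 4x 48 GB gives about 50 GB, which matches it only running reliably on the latter.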
So is there a way around this if the model is large? Couldn't we just build an image with the LLM baked in and upload it to a registry?
We will release a Hugging Face model cache soon, and that should help in this case.
Ooh ya nice