Serverless vLLM workers crash
Whenever I create a serverless vLLM endpoint (regardless of which model I use), the workers all end up crashing with the status "unhealthy". I checked the vLLM supported-models page and only use models that are listed there. The last time I ran a serverless vLLM endpoint, I used meta-llama/Llama-3.1-70B with a valid Hugging Face token that has access to the model.

The result of running the default "Hello World" prompt on this endpoint is shown in the attached images. A worker shows the status "running", but when I open its stats, everything sits at 0% and there are no logs. The worker then turns "unhealthy" and is moved to the Extra section. In this specific run, the last worker stayed "idle" and never picked up the request. I didn't let it sit for too long, only about 10 minutes, but it never picked up the request and started working.
Can you check the logs?