RunPod · 4w ago
zethos

Need help fixing long-running deployments in serverless vLLM

Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8x 48GB GPUs. The total size of the model weights is around 245 GB. I have tried two ways. 1st way: without any network volume, the first request takes a really long time to serve because the worker has to download the weights, and once the worker goes idle, the next request triggers the whole download again. 2nd way: I tried a 300 GB network volume, but the download usually gets stuck about halfway through the weights and then the worker gets killed. I am losing money fast because of this. Please help. I have attached all the screenshots.
zethos (OP) · 4w ago
@nerdylive Could you please help? I tried loading the model onto a network volume using a pod and then attaching the network volume to the serverless instance. Still, it's taking a long time to load.
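(For reference, pre-loading the weights from a temporary pod can look like the sketch below. It assumes the network volume is mounted at /workspace and that huggingface_hub is installed; the paths are illustrative, not from the thread.)

```bash
# Run inside a temporary pod with the network volume attached.
# /workspace is the usual mount point -- adjust if yours differs.
pip install -U "huggingface_hub[cli]"

# Pull all ~51 weight files onto the volume once, so serverless workers
# attached to it can load from disk instead of re-downloading 245 GB.
huggingface-cli download migtissera/Tess-3-Mistral-Large-2-123B \
  --local-dir /workspace/models/Tess-3-Mistral-Large-2-123B
```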
nerdylive · 4w ago
The 1st way seems normal. For the 2nd way, is it because of the execution timeout on your endpoint? If it's too slow, maybe try downloading the weights when you build the Docker image for your serverless template.
zethos (OP) · 4w ago
Yes. I increased the execution timeout and it works for some time. But then when the worker goes idle, it needs to load the model again, and that's a 15-20 min wait.
nerdylive · 4w ago
Yes, that's normal. You'll want to see if this loads your model faster: download the weights via a command in your Dockerfile, so they're built into your image.
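(A minimal sketch of that suggestion. The base image, paths, and env var name below are illustrative assumptions, not values from the thread; check the worker-vllm docs for what your template actually reads.)

```dockerfile
# Illustrative base image only -- use whatever base your serverless
# template already builds on.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y python3-pip && \
    pip install -U "huggingface_hub[cli]"

# Download the ~245 GB of weights at build time so workers never
# re-download them on cold start.
RUN huggingface-cli download migtissera/Tess-3-Mistral-Large-2-123B \
      --local-dir /models/Tess-3-Mistral-Large-2-123B

# Point the vLLM worker at the local copy (MODEL_NAME is an assumption
# here -- verify the exact variable the worker reads).
ENV MODEL_NAME=/models/Tess-3-Mistral-Large-2-123B
```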
zethos (OP) · 4w ago
Yes, this is what I am going to do. The model has around 51 files with a total size of around 245 GB. I am thinking of building the Docker image with the whole 245 GB of files inside, using Option 2 mentioned here: https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#option-2-build-docker-image-with-model-inside Do you think it will be too much? 245 GB plus around 20 GB for Ubuntu and the CUDA drivers.
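(For context, Option 2 in the linked README builds the official worker image with the model baked in via build args. A sketch of that flow is below; the build-arg name follows the README at the time of writing, so verify the current args and tags against the repo before building.)

```bash
# Clone the worker and build with the model downloaded into the image.
# MODEL_NAME as a build arg is taken from the README linked above;
# double-check the current arg names before relying on this.
git clone https://github.com/runpod-workers/worker-vllm.git
cd worker-vllm
docker build -t <your-registry>/tess-3-vllm:latest \
  --build-arg MODEL_NAME="migtissera/Tess-3-Mistral-Large-2-123B" .
```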
nerdylive · 4w ago
Hmm, yeah, I'm not sure what will be too big, but as long as it works on your registry and on RunPod you can try it. I haven't tried building a Docker image with a model that size.
zethos (OP) · 4w ago
Yeah, previously I tried putting many Whisper and BERT-based models in a single Docker image and it worked. Maybe because the image size was small, hardly 20 GB max. Yeah, thanks. Can you tell me the exact template? Or were you referring to the vLLM template?
nerdylive · 4w ago
You make your own serverless template and image; follow the guide in the vLLM worker repo there.
zethos (OP) · 4w ago
Yeah. Also, Docker Hub has a size limit of 100 GB, so I cannot put the model files inside the Docker image and upload it to Docker Hub.
nerdylive · 4w ago
ic
