RunPod · 4w ago
zethos

Need help fixing long-running deployments in serverless vLLM

Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8x 48GB GPUs. The total size of the model weights is around 245 GB. I have tried two ways. 1st way: without any network volume, the first request takes a really long time to serve because the worker has to download the weights, and once the worker goes idle, the next request triggers the whole download again. 2nd way: I tried a 300 GB network volume, but the download usually gets stuck about halfway through the weights and then the worker gets killed. I am losing money fast because of this. Please help. I have attached all the screenshots.
zethos (OP) · 4w ago
@nerdylive Could you please help? I tried loading the model onto a network volume using a pod and then attaching the network volume to the serverless instance. Still, it's taking a long time to load.
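(For reference, pre-loading the weights from a temporary pod can look like the sketch below. It assumes the network volume is mounted at /workspace and that huggingface_hub is installed; the paths are illustrative, not from the thread.)

```bash
# Run inside a temporary pod with the network volume attached.
# /workspace is the usual mount point -- adjust if yours differs.
pip install -U "huggingface_hub[cli]"

# Pull all ~51 weight files onto the volume once, so serverless workers
# attached to it can load from disk instead of re-downloading 245 GB.
huggingface-cli download migtissera/Tess-3-Mistral-Large-2-123B \
  --local-dir /workspace/models/Tess-3-Mistral-Large-2-123B
```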
nerdylive · 4w ago
The 1st way seems normal. For the 2nd way, is it because of the execution timeout on your endpoint? If it's too slow, maybe try downloading the weights when you build the Docker image for your serverless template.
zethos (OP) · 4w ago
Yes. I increased the execution timeout and it works for some time. But then when the worker goes idle, it needs to load the model again, and that's a 15-20 min wait.
nerdylive · 4w ago
Yes, that's normal. You'll want to see if this loads your model faster: download the weights via a command in your Dockerfile, so they're built into your image.
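(A minimal sketch of that suggestion. The base image, paths, and env var name below are illustrative assumptions, not values from the thread; check the worker-vllm docs for what your template actually reads.)

```dockerfile
# Illustrative base image only -- use whatever base your serverless
# template already builds on.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y python3-pip && \
    pip install -U "huggingface_hub[cli]"

# Download the ~245 GB of weights at build time so workers never
# re-download them on cold start.
RUN huggingface-cli download migtissera/Tess-3-Mistral-Large-2-123B \
      --local-dir /models/Tess-3-Mistral-Large-2-123B

# Point the vLLM worker at the local copy (MODEL_NAME is an assumption
# here -- verify the exact variable the worker reads).
ENV MODEL_NAME=/models/Tess-3-Mistral-Large-2-123B
```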
zethos (OP) · 4w ago
Yes, this is what I am going to do. The model has around 51 files with a total size of around 245 GB. I am thinking of building the Docker image with the whole 245 GB of files inside, using Option 2 mentioned here: https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#option-2-build-docker-image-with-model-inside Do you think it will be too much? 245 GB plus around 20 GB for Ubuntu and the CUDA drivers.
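(For context, Option 2 in the linked README builds the official worker image with the model baked in via build args. A sketch of that flow is below; the build-arg name follows the README at the time of writing, so verify the current args and tags against the repo before building.)

```bash
# Clone the worker and build with the model downloaded into the image.
# MODEL_NAME as a build arg is taken from the README linked above;
# double-check the current arg names before relying on this.
git clone https://github.com/runpod-workers/worker-vllm.git
cd worker-vllm
docker build -t <your-registry>/tess-3-vllm:latest \
  --build-arg MODEL_NAME="migtissera/Tess-3-Mistral-Large-2-123B" .
```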
nerdylive · 4w ago
Hmm, yeah, I'm not sure what will be too big, but as long as it works on your registry and on RunPod you can try it. I haven't tried building a Docker image with a model that size.
zethos (OP) · 4w ago
Yeah, previously I tried putting many Whisper and BERT-based models in a single Docker image and it worked. Maybe because the image size was small, hardly 20 GB max. Yeah, thanks. Can you tell me the exact template? Or were you referring to the vLLM template?
nerdylive · 4w ago
You make your own serverless template and image; follow the guide in the vLLM worker repo there.
zethos (OP) · 4w ago
Yeah. Also, Docker Hub has a size limit of 100 GB, so I cannot put the model files inside the Docker image and upload it to Docker Hub.
nerdylive · 4w ago
ic
