Llama-3.1-Nemotron-70B-Instruct in Serverless
Hello there,
I've been trying to deploy Nvidia's Llama-3.1-Nemotron-70B-Instruct in serverless using the vLLM template, but I could not get it to work no matter what.
I'm trying to deploy it to an endpoint with 2 x H100 GPUs, but in most of my attempts I don't even see the weights being downloaded. A request starts, and after a few minutes the worker terminates.
In this scenario I get error:
Unrecognized model in nvidia/Llama-3.1-Nemotron-70B-Instruct. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, (and the list goes on)
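For what it's worth, that error is what transformers' AutoConfig raises when the config.json it reads has no model_type key, and the published config for this model does declare one, so the error usually points at the file never reaching the worker (a failed or partial download) rather than a genuinely bad config. A minimal sketch to check a locally cached config (the function name is mine, just for illustration):

```python
import json

def check_model_type(config_path: str):
    """Return the model_type declared in a local config.json, or None if absent.

    If this returns None for a downloaded config, the file is incomplete
    or corrupted, which would reproduce the AutoConfig error above.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    return cfg.get("model_type")
```

Running this against the config inside the worker's HF cache would tell you whether the download actually succeeded.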
Even weirder: when I deploy the exact same configuration again, it sometimes does download the weights but then fails with different errors each time. It's not consistent.
In fact, I tried a few other popular 70B models but couldn't get any of them to work.
Has anybody tried and managed to run 70B models in serverless so far?
For 70B+ models I would recommend using a pod to cache the model to network storage first, then try running it on serverless.
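The pod-then-cache approach above can be sketched roughly like this, assuming the huggingface_hub package and the usual RunPod mount points (volumes typically appear at /workspace on pods and /runpod-volume inside serverless workers; verify both paths against your own setup):

```python
# Hedged sketch, not an official RunPod workflow: run once from a pod
# that has the network volume attached, then point the serverless vLLM
# worker's model path at the cached directory.
try:
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
except ImportError:  # keep the sketch importable without the package
    snapshot_download = None

MODEL_ID = "nvidia/Llama-3.1-Nemotron-70B-Instruct"
CACHE_ROOT = "/workspace/models"  # assumed network-volume mount on the pod


def local_dir_for(model_id: str, cache_root: str = CACHE_ROOT) -> str:
    """Directory on the volume where the model snapshot will live."""
    return f"{cache_root}/{model_id.split('/')[-1]}"


def cache_model(model_id: str = MODEL_ID) -> str:
    """Download the full model snapshot onto the network volume and
    return the local path (the serverless worker sees the same files
    under /runpod-volume instead of /workspace)."""
    return snapshot_download(repo_id=model_id, local_dir=local_dir_for(model_id))
```

This way the serverless worker never has to pull ~140 GB of weights on a cold start, which is one plausible cause of the terminations described above.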
I've tried deploying with a network volume as well, but that didn't change anything. Maybe I didn't configure the volume properly.