Ergin Bilgin
RunPod
•Created by Ergin Bilgin on 11/1/2024 in #⚡|serverless
Llama-3.1-Nemotron-70B-Instruct in Serverless
Hello there,
I've been trying to deploy Nvidia's Llama-3.1-Nemotron-70B-Instruct in serverless using the vLLM template, but I could not get it to work no matter what I tried.
I'm deploying it on an endpoint with 2 x H100 GPUs, but in most of my attempts I don't even see the weights being downloaded. Requests start, and after a few minutes the worker terminates.
In this scenario I get the following error:
Unrecognized model in nvidia/Llama-3.1-Nemotron-70B-Instruct. Should have a
model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, (and the list goes on)
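One way to narrow this error down is to check whether the config.json the worker actually ended up with parses as JSON and contains a model_type key. The following is only a minimal sketch: the function name check_model_config and the idea of pointing it at a locally cached copy of the file are assumptions for illustration, not part of the vLLM template.

```python
import json
from pathlib import Path


def check_model_config(config_path):
    """Return (ok, message) saying whether a config.json declares model_type.

    Hypothetical helper for illustration; config_path is wherever a copy of
    the model's config.json lives on disk.
    """
    path = Path(config_path)
    if not path.is_file():
        return False, f"{path} does not exist"
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # An empty or truncated download ends up here.
        return False, f"{path} is not valid JSON: {exc}"
    if not isinstance(config, dict):
        return False, f"{path} does not contain a JSON object"
    model_type = config.get("model_type")
    if not model_type:
        return False, f"{path} has no model_type key"
    return True, f"model_type = {model_type}"
```

If the check fails on a file that is fine in the model repository, that would point at an incomplete or failed download rather than at the model itself.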
Even weirder, when I deploy the exact same configuration again, it sometimes does download the weights but then fails with a different error each time. It's not consistent.
In fact, I tried a few other popular 70B models but couldn't get any of them to work.
Has anybody managed to run 70B models in serverless so far?
3 replies
How can I clean up storage in my network volume?
Hello, I'm using the Stable Diffusion template with a network volume. I noticed that even though I delete files in Jupyter, the space is not freed up on my volume. I suspect the files go to a trash folder instead of being removed completely. I searched a lot but could not find the trash folder. Does anybody know where I can find it, or any other way of properly cleaning up my storage space?
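For reference, on Linux the usual trash convention places deleted files either in ~/.local/share/Trash or in a hidden .Trash-<uid> directory at the root of the mounted volume. The sketch below walks a directory tree and reports directories with such names and their sizes. It rests on assumptions: the helper names find_trash_dirs and dir_size are made up for illustration, and the mount path in the usage note is hypothetical.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def find_trash_dirs(root):
    """Return a list of (path, size_bytes) for trash-like directories under root.

    A directory counts as trash-like if it is named "Trash" or its name
    starts with ".Trash" (for example ".Trash-0").
    """
    results = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in list(dirnames):
            if name == "Trash" or name.startswith(".Trash"):
                full = os.path.join(dirpath, name)
                results.append((full, dir_size(full)))
                # Do not descend into a trash directory that was already counted.
                dirnames.remove(name)
    return results
```

Assuming the volume is mounted at a path such as /workspace (hypothetical here), calling find_trash_dirs("/workspace") lists candidate folders. Hidden directories like these do not show up in a plain file listing, which may be why the folder is hard to find.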
4 replies