H100 NVL
If I've understood the docs correctly, the H100 NVL is not available on serverless. Are there any plans to bring it to serverless? The extra 14GB of VRAM over the other GPUs is pretty useful for 70(ish)B-parameter LLMs.
4 Replies
You can try 4×48GB GPUs.
I'm specifically interested in an 8-bit quant of Qwen2.5 72B, which uses about 77GB of VRAM, leaving very little headroom on a single 80GB GPU.
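For rough sizing, weight memory scales as parameters × bits ÷ 8, plus some overhead for the KV cache and activations. A minimal back-of-the-envelope sketch (the flat 5GB overhead allowance is my assumption for illustration, not a measured figure):

```python
# Back-of-the-envelope VRAM estimate for a quantized LLM.
# Weights alone for a 72B model at 8 bits/param are ~72GB;
# the 5GB overhead allowance (KV cache, activations) is an
# illustrative assumption, not a measured value.

def estimate_vram_gb(n_params_b: float, bits_per_param: int,
                     overhead_gb: float = 5.0) -> float:
    """Estimate VRAM in GB: weight bytes plus a flat overhead allowance."""
    weight_gb = n_params_b * bits_per_param / 8  # 1B params at 8 bits ≈ 1GB
    return weight_gb + overhead_gb

print(estimate_vram_gb(72, 8))  # ~77GB for an 8-bit 72B model
print(estimate_vram_gb(72, 4))  # a 4-bit quant roughly halves the weight footprint
```

This lines up with the 77GB figure above and shows why an 80GB card is tight: real headroom also depends on context length, since the KV cache grows with it.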
I estimated 2× RTX 6000 Ada to be the cheapest option, but I can see that the 48GB PRO tier lists three GPUs: L40, L40S, and RTX 6000 Ada. Is there any way to pick which one to use, or is the allocation just random?
Yes, you can.
Ah OK, I didn't really expect that option to be there.
Thanks a lot!