How to deploy Multi-Modal Model on Serverless
I am trying to deploy meta-llama/Llama-3.2-11B-Vision (an 11B model) on vLLM serverless. Using the formula

M = (P * 4B) / (32/Q) * 1.2

I estimated that roughly 26 GB of VRAM should be enough to deploy it. But I tried hosting it with 48 GB, then 80 GB, and also tried allocating two 48 GB GPUs per node, and it never works. I keep getting torch.OutOfMemoryError.
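For reference, the estimate from that formula can be reproduced with a short sketch (the function name and structure here are my own, not from any library; the formula takes P as parameter count, Q as the precision in bits, 4B as bytes per FP32 parameter, and 1.2 as a ~20% overhead factor):

```python
def estimate_vram_gb(params_billion: float, quant_bits: int = 16) -> float:
    """Rough weights-only VRAM estimate: M = (P * 4B) / (32/Q) * 1.2."""
    bytes_per_param = 4 / (32 / quant_bits)  # 4 bytes at FP32, scaled by precision
    return params_billion * bytes_per_param * 1.2

# 11B parameters loaded at 16-bit precision
print(estimate_vram_gb(11, 16))  # → 26.4 (GB)
```

Note this only counts model weights plus a flat overhead factor; it does not account for the KV cache or activation memory that the serving engine also allocates.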