How to deploy Multi-Modal Model on Serverless

I am trying to deploy meta-llama/Llama-3.2-11B-Vision (an 11B model) on vLLM serverless. Using the formula M = (P * 4B) / (32/Q) * 1.2, I estimated that ~26 GB of VRAM should be enough to deploy it. But I tried hosting it with 48 GB, with 80 GB, and even with two 48 GB GPUs per node, and it never works; I keep getting torch.OutOfMemoryError.
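For reference, here is a small sketch of the arithmetic behind that estimate (P = parameter count, Q = bits per weight, 1.2 = ~20% overhead factor); note this formula only accounts for the model weights, not activations or KV cache:

```python
def estimate_vram_gb(params: float, bits_per_weight: int) -> float:
    """VRAM estimate in GB via M = (P * 4B) / (32 / Q) * 1.2.

    Covers weight memory plus a flat 20% overhead only.
    """
    bytes_for_weights = params * 4 / (32 / bits_per_weight)
    return bytes_for_weights * 1.2 / 1e9

# 11B parameters at 16-bit precision:
print(round(estimate_vram_gb(11e9, 16), 1))  # → 26.4
```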
