Théo Champion
RunPod
Created by Théo Champion on 5/30/2024 in #⚡|serverless
Issue loading a heavy-ish (HuggingFaceM4/idefics2-8b) model on serverless (slow network?)
Hey there, I'm trying to load the https://huggingface.co/HuggingFaceM4/idefics2-8b model into a serverless worker, but I'm running into an issue. I'm loading the model outside the handler function like so:
import time

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

to_time = time.time()  # start timestamp for the load timer
self.device = torch.device(
    "cuda" if torch.cuda.is_available() else "cpu")
self.processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b")
self.model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
    # _attn_implementation="flash_attention_2",
).to(self.device)
print("Time taken to load model: ", time.time() - to_time)
When an instance starts, the model begins downloading from HF, but this takes an awfully long time. So long, in fact, that the serverless handler never seems to start, and the process begins again on another instance, in a loop. My guess is that if a worker doesn't expose the handler function after a certain amount of time, RunPod kills it. The thing is, the model is "only" 35 GB in size, and loading it on my laptop over my home bandwidth takes only a few minutes. It seems, then, that the bandwidth allocation for serverless workers is too limited? I feel like this has changed in the past couple of weeks; I never had issues with this before. Am I missing something here?
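For reference, here's a minimal sketch of the full worker setup I'm describing, with the model loaded at module scope before the handler is registered (the handler body is just a placeholder, not the real inference code):

import time

import runpod
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load the model once at module import, before the handler is registered,
# so every request served by this worker reuses the same weights.
load_start = time.time()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
).to(device)
print("Time taken to load model: ", time.time() - load_start)


def handler(job):
    # Placeholder: real code would build inputs with `processor`,
    # run `model.generate`, and return the decoded output.
    return {"device": str(device), "input": job["input"]}


# The worker only starts serving requests once this call runs,
# which is why a slow model download above delays readiness.
runpod.serverless.start({"handler": handler})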
9 replies
RunPod
Created by Théo Champion on 5/30/2024 in #⚡|serverless
Network bandwidth changes?
I have been running multiple models for a while now, but in the past few weeks I noticed a big change in latency. After investigating, I found that the network speed of my serverless workers was very slow (a few MB/s at most), making my uploads/downloads longer and thus causing the latency. Have there been any changes to the network bandwidth allocations for serverless workers in the past few weeks? Is there information anywhere about the current bandwidth available to serverless workers?
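For what it's worth, this is roughly how I check download speed from inside a worker (a rough sketch; the URL is a placeholder, substitute any large, publicly hosted file):

import time
import urllib.request

# Placeholder URL: replace with any large, publicly hosted file.
TEST_URL = "https://example.com/large-test-file.bin"

start = time.time()
with urllib.request.urlopen(TEST_URL) as resp:
    data = resp.read(100 * 1024 * 1024)  # read at most ~100 MB
elapsed = time.time() - start
mb = len(data) / 1e6
print(f"Downloaded {mb:.1f} MB in {elapsed:.1f}s ({mb / elapsed:.1f} MB/s)")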
1 reply