Issue loading a heavy-ish (HuggingFaceM4/idefics2-8b) model on serverless (slow network?)

Hey there, I'm trying to load the https://huggingface.co/HuggingFaceM4/idefics2-8b model into a serverless worker, but I'm running into an issue. I'm loading the model outside the handler function, like so:
# (imports added for context; this excerpt runs before the handler is registered)
import time
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

to_time = time.time()  # start timestamp for the load-time measurement below
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
self.model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
    # _attn_implementation="flash_attention_2",
).to(self.device)
print("Time taken to load model: ", time.time() - to_time)
When starting the instances, the model begins downloading from HF, but this takes an awfully long time. So long, in fact, that the serverless handler never seems to start, and the process begins again on another instance, in a loop. My guess is that after X amount of time, if the worker doesn't expose the handler function, RunPod kills it. The thing is, the model is "only" 35 GB. Loading it on my laptop over my home bandwidth takes only a few minutes. It seems, then, that the bandwidth allocation for serverless workers is too limited? I feel like this has changed in the past couple of weeks; I never had issues with this before. Am I missing something here?
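A common mitigation, sketched here as an assumption rather than something suggested in this thread: pre-fetch the weights when the Docker image is built, so the cold start reads them from local disk instead of downloading ~35 GB over the data-center network.

# download_weights.py, run once at image build time (e.g. from a Dockerfile
# RUN step). from_pretrained() checks the same local HF cache first, so the
# worker then starts without touching the network.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="HuggingFaceM4/idefics2-8b")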
5 Replies
nerdylive · 2mo ago
Like, the download is very slow? I'm not sure about a limited bandwidth change. Maybe try creating a support case for this; it's probably a network outage, so the network is slower.
Théo Champion · 2mo ago
I created a support case about the limited bandwidth around a week ago for another endpoint and still haven't received a reply. That's why I assumed there has been a network-wide change in bandwidth allocation.
digigoblin · 2mo ago
I've also noticed this bug in serverless: if the handler isn't invoked within about 2-3 minutes, another worker spawns to handle the same request.
Théo Champion · 2mo ago
Yeah, if that's the case, I wish it were documented somewhere.
digigoblin · 2mo ago
I don't think it's documented, because it's probably not supposed to do that. I think it's more of a bug.