RunPod
•Created by blabbercrab on 7/7/2024 in #⚡|serverless
Trying to load a huge model into serverless
https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b
Anyone have any idea how to do this in vLLM?
I've deployed it on two 80 GB GPUs and have had no luck
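For reference, a minimal sketch of what loading this model across two GPUs with vLLM's Python API can look like. The setup here is an assumption, not what the poster ran: in bf16 the 72B weights alone are roughly 144 GB, so two 80 GB cards only fit with tensor parallelism and a reduced context length.

```python
# Hedged sketch: shard dolphin-2.9.2-qwen2-72b across two 80 GB GPUs.
# bf16 weights are ~144 GB, so KV-cache headroom is tight; capping
# max_model_len and raising gpu_memory_utilization helps it fit.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cognitivecomputations/dolphin-2.9.2-qwen2-72b",
    tensor_parallel_size=2,       # split weights across both GPUs
    dtype="bfloat16",
    gpu_memory_utilization=0.95,  # leave a little slack per GPU
    max_model_len=4096,           # shrink the KV cache so the model fits
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

If this still OOMs, an AWQ- or GPTQ-quantized variant of the model cuts the weight footprint roughly in half, which is the usual fallback at this size.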
16 replies
RunPod
•Created by blabbercrab on 7/5/2024 in #⚡|serverless
Serverless is timing out before full load
I have a serverless endpoint that loads a bunch of LoRAs on top of SDXL, and the first load takes a long time (more than 500 seconds).
This used to work fine until I added even more LoRAs; now it times out, shows "removing container", and restarts over and over.
Any tips to fix this?
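One common pattern for this failure mode is to do all heavy loading once at module import, so it runs at cold start rather than per request, and to bake the LoRA files into the container image or a network volume instead of downloading them at startup; raising the endpoint's execution timeout in the console is the other lever. A hedged sketch, assuming a diffusers + runpod worker; the LoRA names and paths are placeholders:

```python
# Hedged sketch of a RunPod serverless worker for SDXL + LoRAs.
# All heavy loading happens at import time, once per worker,
# not inside the per-request handler.
import torch
import runpod
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder LoRA files; bake these into the image or a network
# volume so they are not re-downloaded on every cold start.
# set_adapters requires the PEFT backend (pip install peft).
pipe.load_lora_weights("/loras", weight_name="style_a.safetensors", adapter_name="style_a")
pipe.load_lora_weights("/loras", weight_name="style_b.safetensors", adapter_name="style_b")
pipe.set_adapters(["style_a", "style_b"], adapter_weights=[0.8, 0.5])

def handler(job):
    # Only cheap per-request work lives here.
    prompt = job["input"]["prompt"]
    image = pipe(prompt, num_inference_steps=25).images[0]
    image.save("/tmp/out.png")
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```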
39 replies