foxhound
RunPod
•Created by foxhound on 1/9/2024 in #⚡|serverless
[RUNPOD] Minimize Worker Load Time (Serverless)
You're right, then I guess the issue lies in the initial loading of models into VRAM before preprocessing. Disabling model offloading helps while the worker is on; otherwise, everything gets reinitialised.
40 replies
RunPod
•Created by blistick on 12/26/2023 in #⚡|serverless
Slow model loading
The main performance bottleneck doesn't stem from where the models are loaded (outside the handler or not) or from loading them off a network volume. Instead, the issue lies in the initial loading of models into VRAM (neither system memory nor disk) before input preprocessing. I have attempted to mitigate the problem by disabling VRAM offloading, which keeps the models resident while the worker is up. However, if the worker shuts down, it triggers a complete reinitialization. 😦
22 replies