foxhound Comments - Answer Overflow

foxhound

Created by foxhound on 1/9/2024 in #⚡｜serverless

[RUNPOD] Minimize Worker Load Time (Serverless)

You're right. Then I guess the issue lies in the initial loading of the models into VRAM before preprocessing. Disabling model offloading helps while the worker stays on; otherwise, everything gets reinitialised.

40 replies

RunPod

Created by blistick on 12/26/2023 in #⚡｜serverless

Slow model loading

The main performance bottleneck doesn't come from moving the models outside the handler or from loading them from a network volume. Instead, the issue lies in the initial loading of the models into VRAM (not system memory or disk) before input preprocessing. I have tried to mitigate the problem by disabling VRAM offloading. However, if the worker shuts down, it triggers a complete reinitialization. 😦
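A minimal sketch of the load-once pattern being discussed, assuming a Python worker: the model is loaded a single time at module scope when the worker process starts, so warm requests reuse the resident model and only a cold start (a fresh worker) pays the VRAM load cost. `load_model_to_vram` and the `demo-model` dict are hypothetical placeholders for whatever framework calls actually load the weights; the `time.sleep` only simulates a slow load.

```python
import time

# Counts how many times the (hypothetical) expensive load actually runs.
LOAD_CALLS = 0


def load_model_to_vram():
    """Hypothetical stand-in for reading weights and moving them into VRAM."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    time.sleep(0.1)  # simulate the slow initial load
    return {"name": "demo-model"}


# Runs once at import time, i.e. once per worker process (the cold start).
# Every later request handled by this process reuses the same object.
MODEL = load_model_to_vram()


def handler(job):
    """Handle one request with the already-resident model (no reload)."""
    prompt = job["input"]["prompt"]
    # Placeholder "inference": a real worker would call the model here.
    return {"model": MODEL["name"], "output": prompt.upper()}


if __name__ == "__main__":
    # With the RunPod Python SDK installed, the worker would instead be started
    # with runpod.serverless.start({"handler": handler}); this loop just
    # simulates two warm requests hitting the same process.
    for p in ["a", "b"]:
        print(handler({"input": {"prompt": p}}))
    print("model loads:", LOAD_CALLS)
```

The sketch only helps while the process stays alive: as noted above, once the worker shuts down, the next request lands on a new process and the module-level load runs again.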

22 replies
