Madiator2011 (Work)
RunPod
•Created by nielsrolf on 11/12/2024 in #⚡|serverless
Incredibly long startup time when running 70b models via vllm
Also, serverless does not use /workspace.
Usually you don't want to download models while handling a request.
11 replies
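Both replies point at the same pattern: cache the weights ahead of time and load them once per worker, outside the request handler. A minimal sketch, assuming vLLM, the runpod Python SDK, and a network volume (which serverless mounts at /runpod-volume rather than /workspace); the model path is hypothetical:
```python
import runpod
from vllm import LLM, SamplingParams

# Runs once per worker cold start, not per request.
# "/runpod-volume/models/..." is a hypothetical network-volume path
# where the weights were cached earlier from a Pod.
llm = LLM(model="/runpod-volume/models/llama-3.1-70b-instruct")

def handler(job):
    # Per-request work is inference only -- no downloads here.
    prompt = job["input"]["prompt"]
    params = SamplingParams(max_tokens=256)
    outputs = llm.generate([prompt], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```
The serverless endpoint then attaches the same network volume, so workers read the cached weights instead of re-downloading them on every cold start.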
RunPod
•Created by deepblhe on 11/7/2024 in #⚡|serverless
(Flux) Serverless inference crashes without logs.
What version of the SDK are you using?
10 replies
RunPod
•Created by Ergin Bilgin on 11/1/2024 in #⚡|serverless
Llama-3.1-Nemotron-70B-Instruct in Serverless
For 70B+ models I would recommend using a Pod to cache the model to network storage, then running it on serverless.
3 replies
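A minimal sketch of that caching step, assuming huggingface_hub and a Pod with the network volume mounted at /workspace; the repo ID and target directory are illustrative, not prescribed in the thread:
```python
from huggingface_hub import snapshot_download

# Run this once on a Pod attached to the network volume.
# The serverless worker later loads the model from this same path
# (exposed there as /runpod-volume/models/...).
snapshot_download(
    repo_id="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    local_dir="/workspace/models/llama-3.1-nemotron-70b-instruct",
)
```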