Mandragora.ai
RunPod
Created by Mandragora.ai on 5/10/2024 in #⚡|serverless
Serverless broke for me overnight; I can't get inference to run at all.
Hi, I was using runpod/worker-vllm:stable-cuda12.1.0 in my production app with the model TheBloke/dolphin-2.7-mixtral-8x7b-AWQ. There appears to have been an update in the last 24 hours or so that broke my app completely. I have spent the last six hours trying to get ANYTHING out of ANY endpoint, and I just can't get anything running. Prior to today, this was running uninterrupted for over a month. I have tried:
- Rolling back to runpod/worker-vllm:0.3.1-cuda12.1.0
- Swapping out models; I've tried easily 8 or 9 different ones, mostly Mixtral variants, including AWQ, GPTQ, and unquantized models (the minimal test request I'm using is sketched below).
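For reference, this is the kind of minimal sanity-check request I'm sending to each endpoint. It's just a sketch: the endpoint ID and API key are placeholders, and it assumes the usual worker-vllm input shape of a prompt plus sampling_params.

```python
import os
import requests

# Placeholders -- substitute your own serverless endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ.get("RUNPOD_API_KEY", "your-api-key")

url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Assumed worker-vllm input: a prompt and basic sampling parameters.
payload = {
    "input": {
        "prompt": "Say hello in one short sentence.",
        "sampling_params": {"max_tokens": 32, "temperature": 0.0},
    }
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
print(resp.status_code)
print(resp.json())
```

Before today, a request like this returned generated text within a few seconds; now nothing comes back from any endpoint.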
Logs and observations in thread (post was too long).