Zack
RunPod
•Created by AdamOH on 3/7/2024 in #⚡|serverless
Possible memory leak on Serverless
Switching all my inference code over to vLLM is what I meant by "wholesale". I did find out that the empty responses were caused by inputs that were too large. The memory leak seems to have been something in my inference code, and it was resolved by switching to vLLM.
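For context, a minimal sketch of the kind of vLLM-based inference such a switch implies; the model name and sampling settings below are placeholder assumptions, not details from the thread:

```python
# Minimal offline vLLM inference sketch. The model id and sampling values
# are hypothetical placeholders, not taken from this thread.
from vllm import LLM, SamplingParams

# vLLM preallocates GPU memory for its paged KV cache up front, which avoids
# the gradual allocation growth a hand-rolled generation loop can exhibit.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", gpu_memory_utilization=0.90)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Explain what a CUDA out-of-memory error means."]
outputs = llm.generate(prompts, sampling)
print(outputs[0].outputs[0].text)
```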
I'm pretty sure there was a leak in my inference code. Switching wholesale over to vLLM did resolve it, even if I didn't end up finding a root cause.
Following up here, I think I might be seeing this same issue. I see a slow creep in memory usage and eventually empty outputs along with CUDA OOM errors. Was there any resolution or progress on understanding this issue?
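A minimal sketch of one way to confirm this kind of creep by logging GPU memory between requests; the `handler` function here is a hypothetical stand-in, not the actual serverless handler from the thread:

```python
# Log GPU memory before and after each request to spot a slow leak.
# handler() is a hypothetical placeholder for the real inference code.
import torch

def handler(event):
    # stand-in for the actual inference call
    return {"output": "..."}

def handle_with_memory_log(event):
    before = torch.cuda.memory_allocated()
    result = handler(event)
    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()
    # If `after` keeps climbing across requests (tensors retained between
    # calls, growing caches, etc.), that is the slow creep that eventually
    # ends in a CUDA OOM.
    print(f"allocated: {before/1e6:.1f} MB -> {after/1e6:.1f} MB "
          f"(reserved: {torch.cuda.memory_reserved()/1e6:.1f} MB)")
    return result
```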
9 replies