Zack
RunPod
•Created by AdamOH on 3/7/2024 in #⚡|serverless
Possible memory leak on Serverless
Switching all my inference code over to vLLM is what I meant by "wholesale". I did find out that the empty responses were caused by inputs that were too large. The memory leak seems to have been something in my inference code, and it was resolved by switching to vLLM.
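For context, a minimal sketch of the kind of vLLM-based inference such a switch implies; the model name and sampling settings below are placeholder assumptions, not details from the thread:

```python
# Minimal offline vLLM inference sketch. The model id and sampling values
# are hypothetical placeholders, not taken from this thread.
from vllm import LLM, SamplingParams

# vLLM preallocates GPU memory for its paged KV cache up front, which avoids
# the gradual allocation growth a hand-rolled generation loop can exhibit.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", gpu_memory_utilization=0.90)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Explain what a CUDA out-of-memory error means."]
outputs = llm.generate(prompts, sampling)
print(outputs[0].outputs[0].text)
```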
I'm pretty sure there was a leak in my inference code. Switching wholesale over to vLLM did resolve it, even if I didn't end up finding a root cause.
Following up here, I think I might be seeing this same issue. I see a slow creep in memory usage and eventually empty outputs along with CUDA OOM errors. Was there any resolution or progress on understanding this issue?
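A minimal sketch of one way to confirm this kind of creep by logging GPU memory between requests; the `handler` function here is a hypothetical stand-in, not the actual serverless handler from the thread:

```python
# Log GPU memory before and after each request to spot a slow leak.
# handler() is a hypothetical placeholder for the real inference code.
import torch

def handler(event):
    # stand-in for the actual inference call
    return {"output": "..."}

def handle_with_memory_log(event):
    before = torch.cuda.memory_allocated()
    result = handler(event)
    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()
    # If `after` keeps climbing across requests (tensors retained between
    # calls, growing caches, etc.), that is the slow creep that eventually
    # ends in a CUDA OOM.
    print(f"allocated: {before/1e6:.1f} MB -> {after/1e6:.1f} MB "
          f"(reserved: {torch.cuda.memory_reserved()/1e6:.1f} MB)")
    return result
```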
9 replies