RunPod
Created by j on 1/22/2025 in #⛅|pods
Model Maximum Context Length Error
Got it. So do you know how other AI chat sites handle this? Does everyone just write custom code if they're using RunPod vLLM?
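For context, the usual answer is yes: chat frontends typically carry a small piece of custom code that trims the running history before every request. Below is a minimal sketch of that pattern, assuming the deployed model's Hugging Face tokenizer is available locally; the model name and the count_tokens/fit_messages helpers are illustrative, and the 4096/256 figures mirror the error quoted later in this thread.

```python
# Minimal sketch of client-side context trimming. The model name is an
# assumption; count_tokens/fit_messages are illustrative helpers, not part
# of vLLM or RunPod.
from transformers import AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder for the deployed model
MAX_MODEL_LEN = 4096                        # the server's context window
MAX_COMPLETION = 256                        # tokens reserved for the reply

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def count_tokens(messages):
    """Token count of the fully rendered chat prompt."""
    return len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))

def fit_messages(messages, budget=MAX_MODEL_LEN - MAX_COMPLETION):
    """Drop the oldest non-system turns until the rendered prompt fits the budget."""
    messages = list(messages)
    while count_tokens(messages) > budget and len(messages) > 2:
        del messages[1]  # keep the system prompt (index 0) and the newest turns
    return messages
```

Calling fit_messages(history) right before each chat-completions request keeps the prompt tokens plus the reserved completion under the window, which is exactly the condition the 400 error below checks.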
Got it - so vLLM doesn't help with truncating things? I just asked because, coming from Ollama, it will automatically update your prompt so that it continues to work even past the max context length.
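On the truncation question: depending on the vLLM version behind the endpoint, the OpenAI-compatible server may also accept a vLLM-specific truncate_prompt_tokens parameter, which keeps only the last N prompt tokens - closer to Ollama's behaviour, though it can cut a message mid-way. A hedged sketch via the openai client's extra_body; the URL, key and model name are placeholders.

```python
# Hedged sketch: truncate_prompt_tokens is a vLLM-specific extension to the
# OpenAI API and may not be supported by every vLLM version or RunPod template.
from openai import OpenAI

client = OpenAI(base_url="https://<pod-id>-8000.proxy.runpod.net/v1", api_key="EMPTY")

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise our conversation so far."},
]

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model name
    messages=history,
    max_tokens=256,
    # vLLM extension: keep only the last N prompt tokens
    extra_body={"truncate_prompt_tokens": 4096 - 256},
)
print(resp.choices[0].message.content)
```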
Unfortunately I didn't save it, and RunPod logs don't go back that far - but I guess it doesn't really matter anyway, as long as we have to set a max limit? Because in a chat application we'll eventually go past it.
Yes, but when I do that, specifically setting it to 8192, I get a separate error saying that I have exceeded the maximum context length. But in general, even if I manage to set it a little higher, won't I run into the same problem then?
Yep, but won't it just default to something else even if I don't set those? And then we'll run into the same issue at whatever number of tokens that is?
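On the defaults: when --max-model-len isn't set, vLLM generally falls back to the context length declared in the model's own config, so there is always some ceiling. One way to avoid hard-coding it on the client is to read the served limit at startup; a hedged sketch, since whether /v1/models reports max_model_len depends on the vLLM version, and the pod URL is a placeholder.

```python
# Hedged sketch: newer vLLM builds report max_model_len on /v1/models;
# fall back to a configured constant if the field is missing.
import requests

BASE_URL = "https://<pod-id>-8000.proxy.runpod.net/v1"  # placeholder pod URL

def effective_context_len(default=4096):
    data = requests.get(f"{BASE_URL}/models", timeout=10).json()
    model_card = data["data"][0]
    return model_card.get("max_model_len", default)

# Reserve room for the completion; the rest is the prompt budget.
PROMPT_BUDGET = effective_context_len() - 256
```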
But I am now running into a new error:
responseBody: `{"object":"error","message":"This model's maximum context length is 4096 tokens. However, you requested 4133 tokens (3877 in the messages, 256 in the completion). Please reduce the length of the messages or completion.","type":"BadRequestError","param":null,"code":400}`,
I didn't see this when using the serverless endpoints. So my question: is there something I can set on vLLM to automatically manage the context length for me, i.e. to delete tokens from the prompt or messages automatically? Or do I need to manage this myself? Thanks!
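As the 400 above shows, vLLM rejects over-long requests rather than trimming them, so apart from the truncate_prompt_tokens extension mentioned earlier the messages generally have to be managed client-side. A hedged sketch of the defensive variant - catch the error and retry once after trimming - reusing client, MODEL, MAX_MODEL_LEN and the hypothetical fit_messages helper from the earlier sketches.

```python
# Hedged sketch: retry once after trimming when the server rejects the request
# for exceeding the context window. client, MODEL, MAX_MODEL_LEN and
# fit_messages come from the earlier illustrative snippets.
from openai import BadRequestError

def chat(history, max_tokens=256):
    try:
        return client.chat.completions.create(
            model=MODEL, messages=history, max_tokens=max_tokens)
    except BadRequestError:
        # "maximum context length is ... tokens": drop the oldest turns and retry
        trimmed = fit_messages(history, budget=MAX_MODEL_LEN - max_tokens)
        return client.chat.completions.create(
            model=MODEL, messages=trimmed, max_tokens=max_tokens)
```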