Sticky sessions (?) for cache reuse

In my case (building an AI chat application, duh) it'd be useful to direct a succeeding request to the same node of an ever-scaling endpoint for efficient KV cache reuse. Is that currently possible with RunPod? As far as I can see, there is no way to force a specific node when making a request to an endpoint. The question applies to both the vLLM endpoint template and custom handlers.
8 Replies
nerdylive (5mo ago)
Yeah, I think that would be a #🧐|feedback item. I would also hope for the same feature, for a different application. Right, @flash-singh? So, in short: a request with a sticky session would be routed to the same worker that holds that session.
Encyrption (5mo ago)
When you create your Serverless Endpoint you can select FlashBoot, and RunPod will attempt to re-use your image cache to avoid re-loading the entire image for each request. This happens if the subsequent request is QUEUED and ready to go when the last request is fulfilled. For this to work optimally, your model should be baked into your image (don't use a network volume, don't load any models at run-time). The easiest way to get workers that do not reload after each request is to enable some active workers. Also, since you only pay for processing actual requests, you should always set your max workers to 30.
vladfaust (OP, 5mo ago)
Nope, this is not what we're talking about here. It's about the case where we already have a bunch of active workers: subsequent requests could use a sticky session to be routed to the specific worker node that already holds their cache.
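To make the idea concrete, here is a minimal sketch of what session-affinity routing could look like in a router that sits in front of a pool of workers. It is purely illustrative and is not a RunPod API: the function name `pick_worker`, the session ID, and the worker IDs are all hypothetical. It uses rendezvous (highest-random-weight) hashing, so a given session keeps mapping to the same worker as long as that worker stays in the pool, even when other workers scale up or down.

```python
import hashlib


def pick_worker(session_id: str, worker_ids: list[str]) -> str:
    """Pick a worker for a session using rendezvous hashing.

    Hypothetical helper, not part of any RunPod SDK. Every (session, worker)
    pair gets a deterministic score; the worker with the highest score wins.
    """
    if not worker_ids:
        raise ValueError("no workers available")

    def score(worker_id: str) -> int:
        # Hash the pair so the score depends on both the session and the worker.
        digest = hashlib.sha256(f"{session_id}:{worker_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(worker_ids, key=score)


# Hypothetical worker IDs and session ID, for illustration only.
workers = ["worker-a", "worker-b", "worker-c"]
chosen = pick_worker("chat-session-123", workers)

# The same session is routed to the same worker on every request.
assert pick_worker("chat-session-123", workers) == chosen
```

With this scheme, removing a worker only remaps the sessions that were pinned to that worker; every other session keeps its assignment, and so keeps its warm KV cache.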
nerdylive (5mo ago)
Sorry was using confusing language, I fixed it
yhlong00000 (5mo ago)
Yes, when requests come in, they are distributed to any available idle workers.
nerdylive (5mo ago)
Noooo, I think the author asked for a feature that can "direct a succeeding request to the same worker", in other words.
yhlong00000 (5mo ago)
Yes, I understand. I just wanted to confirm the current behavior. Sticky sessions are not currently available.
nerdylive (5mo ago)
Ooh, it's not available yet.