vladfaust
RunPod
•Created by vladfaust on 8/9/2024 in #⚡|serverless
Sticky sessions (?) for cache reuse
In my case (building an AI chat application, duh), it'd be useful to be able to direct a subsequent request to the same node of an ever-scaling endpoint for efficient KV cache reuse. Is that currently possible with RunPod? As far as I can see, there is no way to target a specific node when making a request to an endpoint. The question applies to both the vLLM endpoint template and custom handlers.
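To make the idea concrete, here is a rough sketch of the kind of sticky routing I mean. It is purely illustrative: `pick_worker`, the session ID, and the worker IDs are hypothetical names, and nothing here is a RunPod API. It assumes some router knows the list of live workers and uses rendezvous hashing so that a given chat session keeps landing on the same worker.

```python
import hashlib


def pick_worker(session_id: str, worker_ids: list[str]) -> str:
    """Map a session to a worker with rendezvous (highest-random-weight) hashing.

    Hypothetical helper, not part of any RunPod SDK. The same session always
    maps to the same worker while that worker stays in the list, so its KV
    cache could be reused across requests.
    """
    if not worker_ids:
        raise ValueError("no workers available")

    def score(worker_id: str) -> int:
        # Hash the (session, worker) pair; the worker with the highest score wins.
        digest = hashlib.sha256(f"{session_id}:{worker_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(worker_ids, key=score)


if __name__ == "__main__":
    workers = ["worker-a", "worker-b", "worker-c"]  # assumed worker IDs
    print(pick_worker("chat-42", workers))
```

With this scheme, scaling up only moves the sessions that hash to the new worker, and scaling down only moves the sessions of the removed worker, so most caches would stay warm.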
10 replies