vladfaust
RunPod
Created by vladfaust on 9/5/2024 in #⚡|serverless
I shouldn't be paying for this
No description
7 replies
RunPod
Created by vladfaust on 8/9/2024 in #⚡|serverless
Sticky sessions (?) for cache reuse
In my case (building an AI chat application, duh), it'd be useful to be able to direct a subsequent request to the same node of an ever-scaling endpoint for efficient KV cache reuse. Is that currently possible with RunPod? As far as I can see, there is no way to force a specific node when making a request to an endpoint. The question applies to both the vLLM endpoint template and custom handlers.
10 replies