AI Gateway + Vertex AI Context Caching

Thanks for sharing. So the ask here is for AI Gateway to support tracking the costs of context caching through providers, including Google. What does the response look like from Google when using context caching? Curious to see how it splits out input, output, and context caching tokens, because that is how we would track tokens to then calculate cost.
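(For context, that split shows up in the response's usageMetadata object. A minimal sketch of the relevant fields, assuming the names from Google's Vertex AI docs; the values are illustrative, not taken from a real response:)

```ts
// Sketch of the usageMetadata block Vertex AI returns from generateContent
// when cached content is in play. Field names follow Google's docs; the
// numbers are made up for illustration.
interface UsageMetadata {
  promptTokenCount: number;        // all input tokens, cached ones included
  candidatesTokenCount: number;    // output (completion) tokens
  cachedContentTokenCount: number; // the portion of the prompt served from cache
  totalTokenCount: number;         // prompt + candidates
}

const example: UsageMetadata = {
  promptTokenCount: 12000,
  candidatesTokenCount: 250,
  cachedContentTokenCount: 11000,
  totalTokenCount: 12250,
};
```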
ItsWendell · 18h ago
Hi @Kathy, thanks for getting back to this. I'd wrongly assumed that I couldn't proxy cachedContents requests through AI Gateway, but it does actually work, which is fantastic. Google does tell you how many tokens were served from the cache, and how many tokens the cache itself holds. Here are a couple of references. Vertex AI response: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#response Cached content usage metadata: https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/projects.locations.cachedContents#UsageMetadata
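(Working from those references, here is a minimal sketch of how the split could feed a cost calculation. It assumes cachedContentTokenCount counts toward promptTokenCount, as the UsageMetadata docs indicate, and the per-token rates are placeholders, not Google's actual pricing:)

```ts
// Hypothetical cost calculation from Vertex AI usage metadata. Cached prompt
// tokens are billed at a discounted rate, so they are split out from the
// fresh input tokens before applying rates.
type Usage = {
  promptTokenCount: number;
  candidatesTokenCount: number;
  cachedContentTokenCount?: number; // absent when no cache was used
};

// Placeholder USD rates per million tokens; substitute the model's real pricing.
const RATES = { input: 1.25, cachedInput: 0.3125, output: 5.0 };

function estimateCostUSD(u: Usage): number {
  const cached = u.cachedContentTokenCount ?? 0;
  const freshInput = u.promptTokenCount - cached; // billed at the full input rate
  const output = u.candidatesTokenCount;
  return (
    (freshInput * RATES.input +
      cached * RATES.cachedInput +
      output * RATES.output) /
    1_000_000
  );
}
```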
ItsWendell · 18h ago
If it helps, here's the log ID of a successful AI Gateway attempt: 01JSPQTPQVA6JEARQSZSMWTR57
