AI Gateway + Vertex AI Context Caching
Thanks for sharing. So the ask here is for AI Gateway to support tracking the costs of context caching across providers, including Google.
What does the response look like from Google when using context caching? Curious to see how it splits out input, output, and cached tokens, because that's how we would track usage to then calculate costs.
Hi @Kathy, thanks for getting back to this. I'd wrongly assumed that I couldn't proxy cachedContents requests through AI Gateway, but it does actually work, which is fantastic.
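For anyone who lands here later, here's a minimal sketch of what that proxied create call can look like. Hedging heavily: the gateway base URL shape, the google-vertex-ai provider slug, and the account/gateway/project/model placeholders are my assumptions, not confirmed values, so check the AI Gateway docs for your own setup.
```python
# Sketch only: create a Vertex AI cachedContents object via AI Gateway.
# GATEWAY is an assumed Cloudflare AI Gateway base URL; swap in real IDs.
import requests
import google.auth
import google.auth.transport.requests

PROJECT = "my-gcp-project"   # assumption: your GCP project id
LOCATION = "us-central1"     # assumption: your Vertex region
GATEWAY = ("https://gateway.ai.cloudflare.com/v1/"
           "ACCOUNT_ID/GATEWAY_ID/google-vertex-ai")  # assumed base URL

# Authenticate against Vertex AI with application-default credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

# Create the cache; Vertex expects the model's full resource name.
resp = requests.post(
    f"{GATEWAY}/v1/projects/{PROJECT}/locations/{LOCATION}/cachedContents",
    headers={"Authorization": f"Bearer {creds.token}"},
    json={
        "model": (f"projects/{PROJECT}/locations/{LOCATION}"
                  "/publishers/google/models/gemini-1.5-pro-002"),
        "contents": [{"role": "user",
                      "parts": [{"text": "<large shared context here>"}]}],
        "ttl": "3600s",  # keep the cache alive for an hour
    },
)
resp.raise_for_status()
cache = resp.json()
# The create response reports how many tokens the cache holds.
print(cache["name"], cache["usageMetadata"]["totalTokenCount"])
```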
Google does tell you how many tokens were read from the cache on each request, and how many tokens are stored in the cache itself (see the sketch after the references).
Here are a couple of references:
Vertex AI response:
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#response
Cached content usage metadata:
https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/projects.locations.cachedContents#UsageMetadata
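To answer the earlier question about how the split looks: a hedged illustration of the usageMetadata block that comes back from a generateContent call that used a cache. Field names follow the Vertex docs linked above; the numbers are made up, and the accounting (cached tokens counted inside promptTokenCount) is my reading of those docs, not a verified fact.
```python
# Example usageMetadata from a cached generateContent response
# (field names per the Vertex docs; values are invented for illustration).
usage = {
    "promptTokenCount": 33007,         # total input, including cached tokens
    "cachedContentTokenCount": 32768,  # tokens served from the cache
    "candidatesTokenCount": 412,       # generated output tokens
    "totalTokenCount": 33419,
}

# Assumption: promptTokenCount already includes the cached tokens,
# so the uncached (full-price) input is the difference.
cached = usage.get("cachedContentTokenCount", 0)
uncached_input = usage["promptTokenCount"] - cached
print(f"uncached input: {uncached_input}, cached: {cached}, "
      f"output: {usage['candidatesTokenCount']}")
```
If that accounting holds, AI Gateway could price the three buckets separately: uncached input at the normal input rate, cached tokens at the cached rate, and output at the output rate.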
If it helps, here's the log ID of a successful AI Gateway attempt: 01JSPQTPQVA6JEARQSZSMWTR57