AI Gateway + Vertex AI Context Caching
Thanks for sharing. So the ask here is for AI Gateway to support tracking the costs of context caching across providers, including Google.
What does the response look like from Google when using context caching? Curious to see how it splits out input, output, and cached tokens, because that's how we would track usage to then calculate costs.
Hi @Kathy, thanks for getting back to this. I'd wrongly assumed that I couldn't proxy cachedContents requests through AI Gateway, but it does actually work, which is fantastic.
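For anyone who lands here later, here's a minimal sketch of what that proxied create call can look like. Hedging heavily: the gateway base URL shape, the google-vertex-ai provider slug, and the account/gateway/project/model placeholders are my assumptions, not confirmed values, so check the AI Gateway docs for your own setup.
```python
# Sketch only: create a Vertex AI cachedContents object via AI Gateway.
# GATEWAY is an assumed Cloudflare AI Gateway base URL; swap in real IDs.
import requests
import google.auth
import google.auth.transport.requests

PROJECT = "my-gcp-project"   # assumption: your GCP project id
LOCATION = "us-central1"     # assumption: your Vertex region
GATEWAY = ("https://gateway.ai.cloudflare.com/v1/"
           "ACCOUNT_ID/GATEWAY_ID/google-vertex-ai")  # assumed base URL

# Authenticate against Vertex AI with application-default credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

# Create the cache; Vertex expects the model's full resource name.
resp = requests.post(
    f"{GATEWAY}/v1/projects/{PROJECT}/locations/{LOCATION}/cachedContents",
    headers={"Authorization": f"Bearer {creds.token}"},
    json={
        "model": (f"projects/{PROJECT}/locations/{LOCATION}"
                  "/publishers/google/models/gemini-1.5-pro-002"),
        "contents": [{"role": "user",
                      "parts": [{"text": "<large shared context here>"}]}],
        "ttl": "3600s",  # keep the cache alive for an hour
    },
)
resp.raise_for_status()
cache = resp.json()
# The create response reports how many tokens the cache holds.
print(cache["name"], cache["usageMetadata"]["totalTokenCount"])
```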
Google does tell you how many tokens were read from the cache on each request, and how many tokens are stored in the cache itself (see the sketch after the references).
Here are a couple of references:
Vertex AI response:
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#response
Cached content usage metadata:
https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/projects.locations.cachedContents#UsageMetadata
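To answer the earlier question about how the split looks: a hedged illustration of the usageMetadata block that comes back from a generateContent call that used a cache. Field names follow the Vertex docs linked above; the numbers are made up, and the accounting (cached tokens counted inside promptTokenCount) is my reading of those docs, not a verified fact.
```python
# Example usageMetadata from a cached generateContent response
# (field names per the Vertex docs; values are invented for illustration).
usage = {
    "promptTokenCount": 33007,         # total input, including cached tokens
    "cachedContentTokenCount": 32768,  # tokens served from the cache
    "candidatesTokenCount": 412,       # generated output tokens
    "totalTokenCount": 33419,
}

# Assumption: promptTokenCount already includes the cached tokens,
# so the uncached (full-price) input is the difference.
cached = usage.get("cachedContentTokenCount", 0)
uncached_input = usage["promptTokenCount"] - cached
print(f"uncached input: {uncached_input}, cached: {cached}, "
      f"output: {usage['candidatesTokenCount']}")
```
If that accounting holds, AI Gateway could price the three buckets separately: uncached input at the normal input rate, cached tokens at the cached rate, and output at the output rate.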
If it helps, here's the log ID of a successful AI Gateway attempt: 01JSPQTPQVA6JEARQSZSMWTR57