
Gemini 2.5 Models Now Support Implicit Caching
Google has introduced implicit caching for Gemini 2.5 models, a new Gemini API feature that builds on the explicit context caching launched in May 2024. The update passes a 75% token discount on repeated context to developers automatically, with no manual cache setup required.
How Implicit Caching Works
With implicit caching, the Gemini API automatically detects when a request shares a common prefix with a previous one and applies a 75% token discount to the cached portion. To increase the chance of a cache hit, keep stable content at the start of your prompt and place variable elements, such as the user's query, at the end. The minimum request size for caching is now 1024 tokens for Gemini 2.5 Flash and 2048 tokens for Gemini 2.5 Pro, making more requests eligible.
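The prefix rule above can be sketched with a toy model. This is not the Gemini API: the tokenizer is a crude word count, and the assumption that the shared prefix itself must clear the minimum size is ours, but it illustrates why stable content should come first and the user query last.

```python
# Toy sketch of the implicit-caching eligibility rules described above.
# Minimum token counts come from the article; the tokenizer is a rough
# whitespace approximation, NOT the real Gemini tokenizer.

MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

def approx_tokens(text: str) -> int:
    """Very rough token estimate (~1 token per word)."""
    return len(text.split())

def may_hit_cache(model: str, prompt: str, previous_prompt: str) -> bool:
    """Heuristic: a request can hit the implicit cache when it meets the
    model's minimum size and shares a long common prefix with an earlier
    request (assumption: the shared prefix must also clear the minimum)."""
    if approx_tokens(prompt) < MIN_TOKENS[model]:
        return False
    prefix_len = 0
    for a, b in zip(prompt, previous_prompt):
        if a != b:
            break
        prefix_len += 1
    return approx_tokens(prompt[:prefix_len]) >= MIN_TOKENS[model]

# Stable context first, variable user query last:
system_context = "You are a helpful support agent. " * 500  # large reused prefix
req1 = system_context + "User: How do I reset my password?"
req2 = system_context + "User: How do I close my account?"
```

Here `req2` shares its large stable prefix with `req1`, so it qualifies under the toy rule, while a short prompt below the 1024-token floor never does.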
For guaranteed savings, explicit caching remains available for the Gemini 2.5 and 2.0 models. Usage metadata now includes a cached_content_token_count field showing how many tokens of a request were served from the cache and billed at the discounted rate.
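As a toy illustration of what the 75% discount means for a bill, the helper below applies the discounted rate to the cached token count reported in usage metadata. The rate used here is a placeholder, not a real Gemini price.

```python
def effective_prompt_cost(prompt_tokens: int,
                          cached_tokens: int,
                          rate_per_token: float) -> float:
    """Cached tokens are billed at 25% of the normal rate (a 75% discount);
    the remaining prompt tokens are billed at the full rate.
    `cached_tokens` would come from cached_content_token_count in the
    response's usage metadata."""
    uncached = prompt_tokens - cached_tokens
    return uncached * rate_per_token + cached_tokens * rate_per_token * 0.25
```

For example, a 2000-token prompt with 1000 cached tokens at a placeholder rate of 1.0 per token costs 1250.0 instead of 2000.0.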
Why It Matters
Implicit caching simplifies workflows and reduces costs for developers building AI applications, from chatbots to content tools. It makes Gemini 2.5 more efficient for projects requiring frequent context reuse.
Get Started
Explore implicit caching in the Gemini API documentation and start using Gemini 2.5 today.
About the Author

Chinedu Chimamora