Google launches ‘implicit caching’ to make accessing its latest AI models cheaper

May 8, 2025, 18:35

Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.

Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.

That’s likely to be welcome news to developers as the cost of using frontier models continues to grow.

“We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢 We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!”
Logan Kilpatrick (@OfficialLoganK), May 8, 2025

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to recreate answers to the same request.

Google previously offered model prompt caching, but only explicit prompt caching, meaning devs had to define their highest-frequency prompts. While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.

Some developers weren’t pleased with how Google’s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, prompting the Gemini team to apologize and pledge to make changes.

In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.
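To make the 75% figure concrete, here is a minimal sketch of how a discount on cached input tokens might show up in a bill. The function name, the `price_per_million` parameter, and the assumption that the discount applies only to the cached portion of the prompt are illustrative and are not taken from Google's pricing documentation.

```python
def implicit_cache_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_million: float,
    discount: float = 0.75,
) -> float:
    """Estimate the input cost of one request when part of the prompt hits a cache.

    Assumption: cached tokens are billed at (1 - discount) of the normal
    rate and all remaining prompt tokens are billed at the full rate.
    price_per_million is a hypothetical price per one million input tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")

    uncached_tokens = prompt_tokens - cached_tokens
    # Token count the request is effectively billed for.
    billed_tokens = uncached_tokens + cached_tokens * (1.0 - discount)
    return billed_tokens * price_per_million / 1_000_000
```

Under these assumptions, a 10,000-token prompt in which 8,000 tokens hit the cache is billed like a 4,000-token prompt: 2,000 uncached tokens at full price plus 8,000 cached tokens at a quarter of the price.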
“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” explained Google in a blog post. “We will dynamically pass cost savings back to you.”

The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google’s developer documentation. That is not a terribly big amount, meaning it shouldn’t take much to trigger these automatic savings. Tokens are the raw bits of data models work with, with a thousand tokens equivalent to about 750 words.

Given that Google’s last claims of cost savings from caching ran into trouble, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.

For another, Google didn’t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we’ll have to see what early adopters say.

Topics: AI, Gemini, Google

Kyle Wiggers is TechCrunch’s AI Editor. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.
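Google's advice to put stable context first and changing content last can be sketched as follows. This is a rough illustration, not Google's implementation: the helper names are hypothetical, the dictionary keys are plain labels rather than confirmed API model IDs, the token estimate uses the article's rule of thumb of about 750 words per 1,000 tokens, and a shared character prefix is only a proxy for whatever the service actually compares.

```python
# Minimum prompt sizes for implicit caching, as reported in the article.
# The keys are illustrative labels, not confirmed API model identifiers.
MIN_PROMPT_TOKENS = {"2.5-flash": 1024, "2.5-pro": 2048}


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1,000 tokens per 750 words.

    Real tokenizers will give different counts; this is only a heuristic.
    """
    words = len(text.split())
    return round(words * 1000 / 750)


def build_prompt(stable_context: str, variable_part: str) -> str:
    """Place the repeated context first and the changing part last."""
    return stable_context + "\n\n" + variable_part


def shared_prefix_length(a: str, b: str) -> int:
    """Count the leading characters that two prompts have in common."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return length


def meets_minimum(model: str, prompt: str) -> bool:
    """Check the estimated prompt size against the reported minimum."""
    return estimate_tokens(prompt) >= MIN_PROMPT_TOKENS[model]
```

Two prompts built with `build_prompt` from the same `stable_context` share at least that whole context as a prefix, which is the situation Google describes as eligible for a cache hit. If the changing part were placed first instead, the shared prefix would shrink to whatever the two questions happen to have in common.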



