Google Launches Gemini 2.5 Flash with Novel 'Thinking Budget' Feature

Google’s Gemini 2.5 Flash introduces a “thinking budget” feature that gives developers precise control over how much reasoning the model performs. Output pricing runs from $0.60 per million tokens with reasoning off to $3.50 with reasoning on, a nearly sixfold spread that developers can manage through the budget. The model outperforms several competitors on benchmarks such as Humanity’s Last Exam while offering customizable performance, making it a highly cost-effective AI deployment option for businesses. Enhanced with dynamic thinking, API-level control, Canvas integration, and leaderboard-level performance, it reflects Google’s rapid evolution toward efficient, fine-tuned AI systems positioned to compete head-to-head with OpenAI.

Google Launches Gemini 2.5 Flash – Key Points

Gemini 2.5 Flash Launch & Availability
Released on April 17, 2025, Gemini 2.5 Flash is now available in preview via Google AI Studio, Vertex AI, and the Gemini app. In the app, it replaces “2.0 Thinking (Experimental)” and is labeled “2.5 Flash (Experimental).” The rollout follows a rapid development cadence, with Flash introduced as the leaner sibling to the 2.5 Pro model in Google’s latest AI family. The model is also listed on Google’s site alongside experimental tools like Veo 2 and Personalization.
‘Thinking Budget’ Customization Mechanism
Developers can set a thinking budget, a cap on reasoning tokens, anywhere from 0 to 24,576, giving them granular control over AI “thinking.” When left to decide on its own, the model scales its reasoning to the complexity of the query, thinking less on simple prompts and more on hard ones. Flash thus supports both manual control and automatic dynamic reasoning, a hybrid design that lets developers tune cost and latency trade-offs for real-world use cases.
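As a minimal sketch of the manual side of this control, the helper below clamps a requested budget to the 0 to 24,576 range quoted above and wraps it in a request-style dictionary. The function names and the dictionary keys ("mode", "thinking_budget") are hypothetical, chosen for illustration only. They are not taken from Google's SDK, so the official API reference remains the place to find the real parameter names. Treating a budget of 0 as "thinking off" is also an assumption drawn from the article's description.

```python
from typing import Optional

# Range quoted in the article for Gemini 2.5 Flash's thinking budget.
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576


def clamp_thinking_budget(requested: int) -> int:
    """Clamp a requested reasoning-token limit into the supported range."""
    return max(MIN_THINKING_BUDGET, min(MAX_THINKING_BUDGET, requested))


def build_thinking_settings(budget: Optional[int] = None) -> dict:
    """Build an illustrative settings dict (keys are hypothetical, not real API fields).

    budget=None  -> leave the decision to the model (dynamic thinking)
    budget=0     -> thinking off (assumed meaning of a zero budget)
    budget=N > 0 -> manual cap of N reasoning tokens, clamped to the range
    """
    if budget is None:
        return {"mode": "dynamic"}
    capped = clamp_thinking_budget(budget)
    if capped == 0:
        return {"mode": "off", "thinking_budget": 0}
    return {"mode": "manual", "thinking_budget": capped}


if __name__ == "__main__":
    print(build_thinking_settings())        # dynamic thinking
    print(build_thinking_settings(0))       # thinking off
    print(build_thinking_settings(50_000))  # clamped to the maximum
```

Clamping rather than rejecting out-of-range values is a design choice for this sketch; a stricter caller might prefer to raise an error instead.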
Drastic Cost Differences Based on Reasoning Use
- Input: $0.15 per million tokens
- Output:
  - $0.60 per million tokens with reasoning off
  - $3.50 per million tokens with reasoning on
Enabling reasoning raises the output price nearly sixfold, but it provides measurable improvements in factual accuracy and reasoning performance, especially in benchmark scenarios where token budget directly correlates with quality.
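To make the price gap concrete, here is a rough cost estimator built from the three prices listed above. It assumes the single output rate applies to every billed output token; how reasoning tokens count toward that total is not specified in this article, so treat the result as an approximation. The function names and the example workload are hypothetical.

```python
# Preview prices quoted in the article, in USD per million tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE_REASONING_OFF = 0.60
OUTPUT_PRICE_REASONING_ON = 3.50
TOKENS_PER_MILLION = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, reasoning: bool) -> float:
    """Estimate the USD cost of one request.

    Assumption: one output rate applies to all billed output tokens.
    """
    output_price = OUTPUT_PRICE_REASONING_ON if reasoning else OUTPUT_PRICE_REASONING_OFF
    return (
        input_tokens / TOKENS_PER_MILLION * INPUT_PRICE
        + output_tokens / TOKENS_PER_MILLION * output_price
    )


def output_price_ratio() -> float:
    """How many times more expensive output is with reasoning on."""
    return OUTPUT_PRICE_REASONING_ON / OUTPUT_PRICE_REASONING_OFF


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
    off = estimate_cost(2_000, 500, reasoning=False)
    on = estimate_cost(2_000, 500, reasoning=True)
    print(f"reasoning off: ${off:.6f}")
    print(f"reasoning on:  ${on:.6f}")
    print(f"output price ratio: {output_price_ratio():.2f}x")
```

For this hypothetical request the estimate is about $0.0006 with reasoning off and about $0.00205 with reasoning on; the output rate alone differs by a factor of roughly 5.83.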
Benchmark Performance
Gemini 2.5 Flash performs competitively across multiple benchmarks:
- Humanity’s Last Exam: 12.1%
  - Surpassing Claude 3.7 Sonnet (8.9%) and DeepSeek R1 (8.6%)
  - Trailing OpenAI’s o4-mini (14.3%)
- GPQA Diamond: 78.3%
- AIME Math: 78.0% (2025), 88.0% (2024)
The Flash model benefits from increased token allocation for reasoning, with internal tests showing that output quality scales with token use. Gemini 2.5 Flash is also climbing performance leaderboards, and while Gemini 2.5 Pro currently leads the LMArena AI leaderboard, Flash is not far behind.
Performance vs. Size Comparison
Gemini 2.5 Flash is smaller than Gemini 2.5 Pro and performs significantly better than the previous 2.0 Flash model. It is designed for cost-efficient everyday use while retaining strong reasoning capabilities—something the 2.0 Flash lacked entirely. Despite its reduced footprint, it delivers advanced features like simulated reasoning and dynamic token control.
Enterprise Focus & Cost Efficiency
Tulsee Doshi, Product Director of Gemini Models, emphasizes that Flash serves enterprise needs for predictable pricing and operational efficiency. Tasks that don’t require reasoning—like simple lookups—can run with “thinking” off. For strategic or complex workflows, reasoning can be turned on, with developers defining acceptable compute levels.
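One way a team might encode this split is a small policy table that maps task types to thinking budgets and enforces a ceiling on acceptable compute. The sketch below is illustrative only: the task categories, the budget values, and the function name are hypothetical and do not come from Google.

```python
MAX_THINKING_BUDGET = 24_576  # upper limit quoted in the article

# Hypothetical policy table: categories and budgets are illustrative only.
BUDGET_BY_TASK = {
    "lookup": 0,         # simple retrieval: thinking off
    "summarize": 1_024,  # light reasoning
    "planning": 8_192,   # multi-step reasoning
}


def budget_for(task: str, ceiling: int = MAX_THINKING_BUDGET) -> int:
    """Return the thinking budget for a task, capped by a team-defined ceiling.

    Unknown task types fall back to 0 so that costs stay predictable.
    """
    if not 0 <= ceiling <= MAX_THINKING_BUDGET:
        raise ValueError("ceiling must be between 0 and 24,576")
    return min(BUDGET_BY_TASK.get(task, 0), ceiling)


if __name__ == "__main__":
    for task in ("lookup", "summarize", "planning", "unlisted"):
        print(task, budget_for(task, ceiling=4_096))
```

Defaulting unknown tasks to a zero budget favors predictable pricing over answer quality; a team with the opposite priority could fall back to the ceiling instead.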
API-Level Developer Control & Canvas Support
Developers can integrate 2.5 Flash via Google AI Studio and Vertex AI, with variable pricing options. The model supports Google’s Canvas feature for collaborative editing of text or code. Full research-level features are still in development but are planned for future updates. These tools are aimed at enhancing productivity in document workflows.
Transparency in Reasoning
In AI Studio, developers can view the intermediate reasoning steps (called “thoughts”) before an answer is finalized. Although these thoughts are not yet visible through the API, token usage can still be monitored there. Google expects this transparency to assist in debugging and budget optimization.
Strategic Release During AI Expansion Week
Gemini 2.5 Flash was launched as part of a broader product push:
- Veo 2: Google’s generative video model producing 8-second clips
- Free Student Access: All U.S. college students gain free access to Gemini Advanced until Spring 2026
The release closely followed OpenAI’s debut of its o3 and o4-mini models, reinforcing the tightly timed rivalry between the two AI leaders. Both companies are now regularly releasing similar capabilities within days of each other.
Dynamic Thinking Feedback Loop & Future Plans
Google is actively collecting developer feedback to iterate on dynamic thinking behavior. Doshi noted the importance of understanding where the model may over-think or under-think, guiding ongoing improvements. While a final release timeline hasn’t been confirmed, internal signals suggest that general availability is approaching.
Consumer vs Developer Usage Expectations
Manual control of reasoning is reserved for developers. Google plans to simplify the Gemini app experience for consumers, relying on dynamic reasoning behind the scenes rather than exposing tuning options. This is part of a broader effort to pair powerful capabilities with ease of use at the interface level.

Why This Matters:

Gemini 2.5 Flash is more than a technical upgrade—it marks the commercialization of AI reasoning as a configurable, metered resource. Google’s introduction of “thinking budgets” gives developers precision in balancing cost, latency, and performance, which is especially valuable for enterprises with varied workloads. Its benchmark progress and position near the top of performance leaderboards suggest Flash is not just an affordable model—it’s one that can compete directly with leading offerings from OpenAI. Combined with its strategic timing and integrated toolset, Flash represents Google’s clearest effort yet to shift from reactive development to proactive dominance in the generative AI space.