Google today announced the general availability of Gemini 3.5 Flash during the Google I/O 2026 keynote, marking the company's most significant Flash-series release to date. The model is now live across the Gemini API, Google AI Studio, Vertex AI, the Gemini app, and AI Mode in Google Search, where it has been set as the default model.

What's New in Gemini 3.5 Flash?

CEO Sundar Pichai described Gemini 3.5 Flash as "remarkably fast" — roughly 4x faster in output token generation than other frontier models — while delivering the highest intelligence in the Flash family to date. The model is built for sustained frontier-level performance on agentic workflows, coding, and complex reasoning tasks.

"We are in the agentic Gemini era," Pichai said during his keynote address at I/O 2026. "This model is designed for sustained frontier performance in agents and coding."

Key specs include a 1 million-token input context window, 64,000-token output capacity, knowledge cutoff of January 2025, and multimodal support for text, images, video, audio, and PDF inputs with text output. The model supports tool use, function calling, structured output, code execution, and search grounding.

Benchmark Performance

According to official data from Google DeepMind, Gemini 3.5 Flash delivers substantial improvements over Gemini 3 Flash across the board:

  • Coding & Agents: 76.2% on Terminal-bench 2.1 (vs. 58.0% for Gemini 3 Flash), 55.1% on SWE-Bench Pro, and 83.6% on MCP Atlas for multi-step workflows.
  • Computer Use: 78.4% on OSWorld-Verified, a benchmark for agentic computer use.
  • Finance: 57.9% on Finance Agent v2, compared to 42.6% for the prior generation.
  • Multimodal: 84.2% on CharXiv for complex chart analysis, 83.6% on MMMU-Pro.
  • Reasoning: 72.1% on ARC-AGI-2, more than doubling the 33.6% score of Gemini 3 Flash.

In many benchmarks, Gemini 3.5 Flash actually outperforms Gemini 3.1 Pro — Google's previous premium-tier model — while maintaining Flash-series latency.

Pricing and Availability

The model is priced at $1.50 per million input tokens and $9.00 per million output tokens (including thinking tokens). This represents roughly 3–5x the cost of Gemini 3 Flash, but Google argues the higher intelligence reduces the number of retries needed for complex tasks, potentially lowering overall cost.

A free tier with rate limits is available for prototyping, and context caching costs $0.15 per million tokens per hour. Enterprise pricing via Vertex AI may vary slightly.

Why This Matters

Gemini 3.5 Flash represents more than a routine upgrade. Google's decision to make it the default model across the Gemini app and AI Mode in Search signals strong internal confidence. The model arrives at a time when competition among frontier LLMs is at its peak, with Anthropic's Claude, OpenAI's GPT, and Meta's Llama all pushing forward.

The strong focus on agentic capabilities — autonomous execution of complex, multi-step workflows — reflects where the industry is heading. Alongside the model, Google announced Gemini Spark, a new agent platform, and Managed Agents in the Gemini API.

Industry Reception

Early coverage has been largely positive. Tech outlets including CNET, Mashable, and Tom's Guide highlighted the model's impressive speed-to-intelligence ratio. Enterprise partners like JetBrains and Box reported 10–20% improvements in code quality and accuracy compared to previous models.

The Bottom Line

Gemini 3.5 Flash is a meaningful leap forward in Google's Flash model line, prioritizing agentic performance, output speed, and benchmark results that approach — and sometimes surpass — premium-tier models. The higher per-token cost may deter some developers, but for complex tasks involving coding, analysis, and automation, the value proposition is compelling. A Gemini 3.5 Pro variant is expected later this year.