Gemini 3.5 Flash launches at Google I/O 2026: the Flash model that beats last-gen Pro

On May 19, 2026 at Google I/O 2026, Google announced Gemini 3.5 Flash — the flagship of the 3.5 generation, positioned as the "default" model for coding and AI agents. The most striking part: this is only the Flash tier (traditionally the lightweight, fast, cheap line), yet according to the DeepMind site it already beats last generation's Gemini 3.1 Pro across a host of coding and agentic benchmarks — while being noticeably faster and cheaper.

Quick summary

When: May 19, 2026, at Google I/O 2026.
What: Gemini 3.5 Flash — the "default" model for coding & AI agents; status "available now" (per DeepMind). The 3.5 Pro version is "coming soon".
Highlight: The Flash model beats last generation's Gemini 3.1 Pro on coding/agentic benchmarks.
Speed: ~4x other frontier models; a 1 million (1M) token context window.
API pricing: ~$1.50 / $9 per 1M tokens (input/output) — per press reports, to be verified against official pricing when available.

What's new at Google I/O 2026?

At its annual event, Google put Gemini 3.5 Flash front and center, positioning it as the default model for coding and building AI agents. Per the DeepMind site, the model is "available now", while the more powerful version — Gemini 3.5 Pro — was introduced as "coming soon".

Beyond the model, Google also updated the Gemini app with a slate of new features: Daily Brief (a daily summary), Gemini Spark (an agent running 24/7) and Gemini Omni (AI video). The company also announced that Gemini has reached more than 900 million monthly users, serving 230+ countries and 70+ languages.

Laptop displaying lines of programming code — Gemini 3.5 Flash is positioned as the default model for coding and AI agents. Photo: Pexels

Benchmarks: the Flash model beats last-gen Pro

According to the figures on the DeepMind site, Gemini 3.5 Flash posts impressive results across coding and agentic benchmark suites:

Terminal-Bench 2.1: 76.2%
MCP Atlas: 83.6%
GDPval-AA: 1656 Elo
Finance Agent v2: 57.9% — versus 43.0% for last generation's Gemini 3.1 Pro
CharXiv: 84.2%

Table — Gemini 3.5 Flash benchmarks (DeepMind figures)
Benchmark	Gemini 3.5 Flash	Gemini 3.1 Pro
Terminal-Bench 2.1	76.2%	—
MCP Atlas	83.6%	—
GDPval-AA	1656 Elo	—
Finance Agent v2	57.9%	43.0%
CharXiv	84.2%	—

Worth noting: the Finance Agent v2 figure shows the new Flash model clearly edging out the previous generation's Pro on a complex agentic task — evidence for the "faster, cheaper, but no weaker" message.

Speed, context and pricing

Per the announcement, Gemini 3.5 Flash runs roughly 4x faster than other frontier models, together with a 1 million (1M) token context window — enough to handle a large codebase or a long document in a single pass.

On cost, per press reports (to be verified once Google publishes official pricing), the API price is around $1.50 per 1M input tokens and $9 per 1M output tokens. That is a very competitive level for a model with coding/agentic capability of this caliber — and part of why Google positions it as the "default" choice.

Table — Gemini 3.5 Flash API pricing (per press reports, to be verified)
Token type	Price / 1M tokens
Input	~$1.50
Output	~$9

What it means for product builders

The fact that a Flash tier — traditionally the cost-optimized line — can beat the previous generation's Pro shows that the cost per unit of AI capability keeps falling fast. For teams building AI agents, coding tools, and internal assistants, this opens up the ability to run complex tasks at a price point that was previously reserved for premium models.

The ~4x speed and 1M-token context also have practical implications: agents respond faster, read more context in a single call, and reduce the number of loops and the overall cost.

The Namtech perspective

A model that is both powerful and cheap is good news for any development team. However, keep one important point in mind: whether you use Gemini or any other cloud AI, your data is still sent to a foreign provider's infrastructure. For workloads involving sensitive data — customer information, internal records, core source code — this is a compliance risk (PDPL, cross-border data transfers) and a question of control.

A reasonable balance: use powerful cloud models like Gemini 3.5 Flash for general tasks, but for core data consider in-house AI running on your own company's infrastructure so the data never leaves the organization.

AI robot interacting with a digital interface — A faster and cheaper model for AI agents and coding. Photo: Tara Winstead / Pexels

FAQ

Is Gemini 3.5 Flash usable yet?

Per the DeepMind site, Gemini 3.5 Flash is "available now" as of launch (May 19, 2026). The more powerful version — Gemini 3.5 Pro — was introduced as "coming soon".

How much does the Gemini 3.5 Flash API cost?

Per press reports, the price is around $1.50 / 1M input tokens and $9 / 1M output tokens. These are figures from press reports, to be verified against Google's official pricing when published.

Why does the Flash model beat last generation's Pro?

This is a newer-generation model (3.5 versus 3.1). Per DeepMind's figures, the clearest example is Finance Agent v2: Gemini 3.5 Flash scores 57.9% versus 43.0% for Gemini 3.1 Pro — reflecting advances in architecture and training between the two generations.

Powerful and cheap is good — but the data still goes to the cloud

For sensitive data, Namtech deploys a private in-house AI platform — the model runs on your infrastructure, the data stays on-premises and never leaves the organization.

Book a free consultation

Note: This article is compiled from public sources as of 22/06/2026; the API pricing figures are per press reports and should be verified against official pricing when available. For reference only.

Sources

Gemini 3.5 Flash launches at Google I/O 2026: the Flash model that beats last-gen Pro at coding and AI agents