Google Launches Gemma 4 Open Models and Switches to Apache 2.0

Key Takeaway

Google has launched Gemma 4, a new family of open-weight AI models for local deployment, and replaced its custom Gemma license with Apache 2.0. The release expands Google’s open model strategy across high-end GPUs, mobile devices, and edge hardware while giving developers more permissive terms.


Gemma 4 Open Models – Key Points

The Story

Google has released Gemma 4 as the first major update to its open-weight model line in about a year, introducing four model variants designed for local use. The company says the new models improve reasoning, coding, vision, audio, and agentic workflow support, while also extending context windows, supporting more than 140 languages, and reducing latency. Google also made a major licensing change, dropping its previous custom Gemma terms in favor of Apache 2.0 and aligning Gemma more closely with other permissively licensed open-weight model families.

The Facts

  • Gemma 4 launches in four variants across two deployment tiers.

    The new lineup includes 31B Dense and 26B A4B Mixture of Experts (MoE) models for workstation-class use, plus Effective 2B (E2B) and Effective 4B (E4B) models for phones, laptops, and embedded devices. Google says the larger models support text and image input with 256,000-token context windows, while the edge models support text, image, and audio with 128,000-token context windows.

  • The two largest models are designed for local use on powerful hardware, but with different tradeoffs.

    Google says the 31B Dense model targets higher-quality output and fine-tuning, while the 26B A4B MoE model is designed to reduce inference costs and improve latency. The company says the larger models can run unquantized in bfloat16 on a single 80GB Nvidia H100 GPU, and also points to quantized deployment on consumer-grade GPUs for local IDEs, coding assistants, and agentic workflows.
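The single-H100 claim is easy to sanity-check with back-of-the-envelope arithmetic, assuming bfloat16's 2 bytes per parameter. The sketch below counts raw weights only; KV cache and activations add overhead on top, so treat it as a lower bound rather than a full memory budget:

```python
# Back-of-the-envelope memory check for running Gemma 4 31B unquantized.
# Assumes bfloat16 weights (2 bytes/parameter); KV cache and activations
# are ignored, so this is a lower bound, not a complete budget.

BYTES_PER_PARAM_BF16 = 2
H100_MEMORY_GB = 80

def weight_memory_gb(params_billion: float,
                     bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Raw weight footprint in GB (1 GB = 1e9 bytes, matching vendor specs)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

dense = weight_memory_gb(31)  # 62.0 GB of weights
print(f"31B dense in bf16: {dense:.0f} GB of {H100_MEMORY_GB} GB")
```

At 62 GB of weights, the model fits in 80 GB with roughly 18 GB of headroom for KV cache and activations, which is consistent with Google's single-GPU claim.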

  • The 26B A4B model is built around sparse activation and many small experts.

    Google says only 3.8 billion of the model’s roughly 25.2 billion total parameters activate during inference. It also says the architecture uses 128 small experts, activating eight per token plus one shared always-on expert, with the aim of delivering roughly 26B-class capability at compute costs closer to a 4B model.
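The reported numbers can be reproduced directly. The sketch below only restates the arithmetic Google describes; whether the shared expert is counted inside or outside the 128 is not stated in the release, so the per-token expert count is an assumption:

```python
# Illustrative arithmetic for the 26B A4B sparse-activation claim:
# 3.8B of ~25.2B parameters active per token, via top-8 routing over
# 128 experts plus one always-on shared expert (assumed to be an
# additional block here; the release does not specify).

TOTAL_PARAMS_B = 25.2   # reported total parameter count
ACTIVE_PARAMS_B = 3.8   # reported parameters active per token

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active per token: {active_fraction:.1%}")  # roughly 15%

EXPERTS_ROUTED = 8   # top-8 experts chosen by the router per token
EXPERTS_SHARED = 1   # shared expert that always runs
experts_active = EXPERTS_ROUTED + EXPERTS_SHARED
print(f"{experts_active} expert blocks execute per token")
```

Roughly 15% of the parameters doing the work per token is what puts the compute cost "closer to a 4B model" while keeping 26B-class capacity.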

  • Google is also pushing lower-precision deployment and serverless options.

    The company says it is shipping Quantization-Aware Training checkpoints to preserve quality at lower precision. It also says the 31B Dense and 26B A4B models can run serverlessly in Google Cloud through Cloud Run with Nvidia RTX Pro 6000 GPUs, scaling down to zero when idle.
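Google has not published its QAT recipe, but quantization-aware training in general inserts a fake-quantization step into the forward pass so the model learns weights that survive rounding to a lower-precision grid. A minimal, generic int8 sketch of that step (not Google's pipeline):

```python
import numpy as np

# Generic fake-quantization step as used in quantization-aware training:
# weights are snapped to a symmetric int8 grid during the forward pass,
# so training adapts to the precision loss. Not Google's actual QAT recipe.

def fake_quant_int8(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantize-dequantize."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale  # back to float, now on the int8 grid

w = np.array([0.813, -0.309, 0.024, -1.27])
wq = fake_quant_int8(w)
print(np.max(np.abs(w - wq)))  # rounding error, bounded by scale / 2
```

Shipping checkpoints trained this way is what lets the quantized consumer-GPU deployments Google mentions retain quality, rather than quantizing a full-precision model after the fact.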

  • The smaller E2B and E4B models target mobile and edge devices with more efficient architectures.

    Google says these versions were optimized with Qualcomm and MediaTek for hardware such as smartphones, Raspberry Pi, Jetson Nano, and Jetson Orin Nano. It also says the “E” stands for effective parameters: E2B runs with 2.3 billion effective parameters but 5.1 billion total parameters, using Per-Layer Embeddings to keep compute requirements low.
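The effective-versus-total distinction is an accounting of which parameters must sit in fast accelerator memory at once. The byte math below assumes bf16 weights purely for illustration; edge deployments typically quantize further:

```python
# Sketch of the "effective parameters" accounting described for E2B:
# Per-Layer Embeddings let part of the 5.1B total parameters live outside
# fast accelerator memory, so only ~2.3B need to be resident per step.
# bf16 (2 bytes/parameter) is assumed here only for illustration.

TOTAL_B, EFFECTIVE_B = 5.1, 2.3
BYTES_PER_PARAM = 2

resident_gb = EFFECTIVE_B * 1e9 * BYTES_PER_PARAM / 1e9
offloaded_b = TOTAL_B - EFFECTIVE_B
print(f"Resident: ~{resident_gb:.1f} GB; ~{offloaded_b:.1f}B params offloaded")
```

That roughly 2.8B-parameter difference is what lets a 5.1B-parameter model present the memory profile of a 2B-class model on phones and single-board computers.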

  • Google says the mobile-focused models improve efficiency, responsiveness, and on-device audio handling.

    E2B and E4B reportedly use less memory and battery than Gemma 3 and are designed for near-zero latency. Google also says the edge models support native audio processing, including automatic speech recognition and speech-to-text translation, with a smaller audio encoder and shorter frame duration than Gemma 3n for more responsive transcription.


  • Gemma 4 adds capabilities aimed at agentic workflows and multimodal applications.

    Google says all four models support native function calling, structured JSON output, and native system instructions for common tools and APIs. It also says Gemma 4 handles variable aspect-ratio image input, supports configurable visual token budgets, and can process multiple images or video frames natively.
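The article does not specify Gemma 4's wire format for tool calls, but the host-side pattern is generally the same: the application advertises a tool schema, the model replies with structured JSON, and the host parses and dispatches it. In the sketch below, the tool name, schema shape, and model reply are all hypothetical examples, not the actual Gemma 4 format:

```python
import json

# Generic host-side function-calling loop: advertise a tool schema,
# parse the model's JSON tool call, dispatch it. All names and the
# reply format below are hypothetical, not Gemma 4's actual format.

WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}

def dispatch(tool_call_json: str) -> str:
    """Parse a JSON tool call and route it to the matching handler."""
    call = json.loads(tool_call_json)
    if call["name"] == WEATHER_TOOL["name"]:
        return f"weather({call['arguments']['city']})"
    raise ValueError(f"unknown tool: {call['name']}")

# A model emitting structured JSON output might produce:
reply = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(reply))  # weather(Berlin)
```

Native support means the model is trained to emit this kind of structured output reliably, rather than relying on prompt engineering and output scraping.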

  • Google says Gemma 4 improves coding, OCR, document understanding, chart analysis, and long-context work.

    The company says the new vision system is better than the previous generation for OCR and document tasks, and says the models are optimized for code generation as well as broader visual understanding. Google also says the longer context windows allow developers to pass repositories or long documents in a single prompt.
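The repository-in-one-prompt workflow can be sketched as a simple packing step. The 4-characters-per-token estimate below is a rough heuristic, not Gemma's tokenizer; real code should count tokens with the model's actual tokenizer before sending:

```python
import os

# Sketch of the long-context workflow: concatenate a small repository
# into one prompt and check it against the context budget. The chars/4
# token estimate is a crude heuristic, not Gemma's tokenizer.

CONTEXT_TOKENS = 256_000  # reported window for the 31B / 26B A4B models

def pack_repo(root: str, exts=(".py", ".md")) -> str:
    """Join matching files under root into one prompt, newest path headers first."""
    parts = []
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    parts.append(f"### {path}\n{f.read()}")
    prompt = "\n\n".join(parts)
    est_tokens = len(prompt) // 4  # crude chars-per-token estimate
    if est_tokens > CONTEXT_TOKENS:
        raise ValueError(f"repo too large: ~{est_tokens} tokens")
    return prompt
```

At 256,000 tokens the budget works out to very roughly a megabyte of source text, which covers many small-to-medium repositories without chunking.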

  • Google positions Gemma 4 as a stronger local AI option with major benchmark gains over Gemma 3.

    Google says Gemma 4 31B ranks third on the Arena AI text leaderboard for open models, behind GLM-5 and Kimi 2.5, and says the 26B model ranks sixth. It also reports that 31B scores 89.2% on AIME 2026, 80.0% on LiveCodeBench v6, a 2,150 Codeforces Elo rating, 76.9% on MMMU Pro, and 85.6% on MATH-Vision, while the 26B A4B model scores 88.3% on AIME 2026, 77.1% on LiveCodeBench, and 82.3% on GPQA Diamond.

  • Google says the smaller models also perform strongly for their size class.

    The company reports that E4B scores 42.5% on AIME 2026 and 52.0% on LiveCodeBench, while E2B scores 37.5% and 44.0% respectively. Google presents these as major gains relative to Gemma 3 despite the smaller deployment footprint.

  • The licensing change removes one of the biggest barriers around Gemma.

    Google has replaced its previous custom license with Apache 2.0, a more permissive open-source license. The earlier Gemma terms had frustrated developers: they included prohibited-use rules Google could update unilaterally, and their obligations extended to downstream Gemma-based projects. Apache 2.0 removes that legal ambiguity around redistribution and commercial deployment. Google says the shift is intended to give developers more control over their data, infrastructure, and deployment choices across on-premises and cloud environments.

  • Gemma 4 is also becoming the basis for the next Gemini Nano update and a broader ecosystem push.

    Google confirmed that Gemini Nano 4 will ship in 2B and 4B variants based on Gemma 4 E2B and E4B. The company says developers can start prototyping with these models in the AI Core Developer Preview, and that apps built against the preview will be forward-compatible with Gemini Nano 4. Google also says Gemma has now surpassed 400 million downloads and has grown into a community of more than 100,000 variants.

How to Access / Pricing

Google says developers can use the 31B and 26B A4B models in AI Studio and the E4B and E2B models in AI Edge Gallery. Model weights are also available from Hugging Face, Kaggle, and Ollama. Google also offers hosted access to the workstation-class models through Google Cloud, including serverless deployment via Cloud Run with GPU support.

What Is New

Gemma 4 brings several concrete changes at once: a broader model range, native audio in the edge models, deeper multimodal support, stronger built-in function calling, and a more permissive license. The release also adds Quantization-Aware Training checkpoints, a sparse 26B A4B MoE architecture built around 128 experts, support for more than 140 languages, and serverless cloud deployment options for the larger models.

Background / Context

Gemma is Google’s open-weight model family, separate from its closed Gemini products. Google says Gemini Nano, the local AI model used on some Android phones for features such as scam detection, note summaries, and call summaries, has always been derived from Gemma models. The company also says Gemma 4 draws on the same underlying research as Gemini 3, and highlights earlier projects such as INSAIT’s Bulgarian-first BgGPT and Yale-linked Cell2Sentence-Scale as examples of how Gemma models have been adapted for specialized use cases.

Why This Matters

Gemma 4 matters because it combines stronger local AI performance with fewer licensing restrictions and a wider range of deployment options. That gives developers and enterprises more flexibility to build offline, commercial, edge, and serverless AI applications without relying entirely on cloud services or navigating custom legal terms.


This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.
