A 2B open-source model that runs on mobile phones makes Google's Gemini 3 technology more accessible: it supports voice and video multimodality and is completely free for commercial use.
31B open-source model outperforms models dozens of times larger and ranks among the top three in the open-source arena.
Gemma 4, just released by Google, is remarkably capable.
It has beaten Qwen3.5-397B and DeepSeek v3.2-671B, models with 10 to 20 times as many parameters.
The only models still ahead of it are GLM-5 (745B) and Kimi K2.5 (1T), both of which are 2026 flagships.
For a 31B model, that is an impressive showing.
This time, the entire Gemma 4 series has a total of four sizes. Built on the same technology as Gemini 3, the whole series supports multi-modalities:
E2B, E4B, 26B MoE, 31B Dense.
The smallest 2B version can run on mobile phones and the Raspberry Pi, and it delivers results well beyond what a model this size is expected to achieve.
Even without an internet connection, a mobile phone can run multi-modal processing of voice and video.
The 31B model defeats opponents 20 times larger, and the 26B model only activates 3.8B parameters
The four models have different positions, but they share a common point: efficiency comes first.
The largest model, the 31B Dense, is well suited to fine-tuning precisely because it is a dense model.
The 26B MoE activates only 3.8B parameters during inference. It ranks 6th on the global open-source leaderboard and is built for speed: the least activation for the fastest inference.
Hardware requirements are modest.
The unquantized bfloat16 weights fit on a single 80GB H100, and the quantized versions run locally on ordinary consumer-grade GPUs.
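The H100 claim can be checked with back-of-envelope arithmetic. A minimal sketch, assuming bfloat16 stores 2 bytes per parameter and 4-bit quantization roughly 0.5 bytes per parameter (weights only; activations and KV cache need extra room):

```python
# Back-of-envelope weight-memory math for running a large model locally.
# bfloat16 = 2 bytes per parameter; 4-bit quantization ~0.5 bytes.
# This covers weights only; KV cache and activations add overhead.

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory footprint in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 31B dense weights in bfloat16 comfortably fit an 80GB H100:
print(f"31B @ bf16:  {weight_gib(31, 2):.1f} GiB")   # ~57.7 GiB

# A 4-bit quantized version lands in consumer-GPU territory:
print(f"31B @ 4-bit: {weight_gib(31, 0.5):.1f} GiB")  # ~14.4 GiB
```

The same arithmetic explains why the 26B MoE is fast: only 3.8B of its parameters are read per token, even though all 26B must sit in memory.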
According to Google's official blog, the Gemma 4 family has significantly enhanced six core capabilities:
Advanced reasoning: Supports multi-step planning and in-depth logic, with a significant improvement in mathematical and instruction-following benchmark tests.
Native support for Agent workflows: Built-in function calls, structured JSON output, and native support for system instructions, enabling direct construction of autonomous agents.
Code generation: Supports high-quality offline code generation, turning a workstation into a local AI code assistant.
Visual and audio processing: The whole series can natively process videos and images at variable resolutions, and handles OCR and chart understanding. The smaller versions also support native audio input.
Long context support: The client-side version has a context window of 128K, and the large-parameter version can reach up to 256K, capable of reading an entire code repository at once.
Support for over 140 languages: Natively trained on over 140 languages, no need for separate localization for global applications.
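The agent capabilities above boil down to the model emitting structured JSON that the host program validates and dispatches. A minimal sketch of the host side of that loop, using only the standard library; the tool-call schema here is my own assumption, not Gemma's documented format:

```python
import json

# Minimal host side of an agent loop: the model emits a tool call as
# structured JSON, and the host parses, validates, and dispatches it.
# The {"name": ..., "arguments": ...} schema is an illustrative
# assumption, not an official format.

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(raw: str) -> str:
    """Parse a model's JSON tool call and run the named tool."""
    call = json.loads(raw)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

# Pretend the model produced this structured output:
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(model_output))  # Sunny in Berlin
```

Because the model's output is constrained to JSON, the host never has to scrape free-form text, which is what makes autonomous agents practical to build on top of it.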
Multi-modal processing runs offline on mobile phones; even a Raspberry Pi can handle it
Let's focus on two small models designed specifically for the client side.
The "E" in E2B and E4B stands for Effective, and they also follow the low-activation MoE approach.
The E2B actually has far more than 2B parameters, but only 2B of them are activated during inference.
Their task is clear: to natively process audio and vision on mobile phones and IoT devices.
This means that a mobile phone can directly use the camera to see and the microphone to listen without an internet connection, and then give you a response.
Zero latency, zero cloud dependency.
Google has also optimized the entire chain, from chip to model to device, working with its own Pixel phone team as well as Qualcomm and MediaTek.
Looking back at the evolution path of the Gemma series, the value of this update becomes clearer.
Gemma 1 was released in February 2024, with two sizes, 2B and 7B, and could only process plain text.
Gemma 2 followed in June of the same year, with sizes of 2B, 9B, and 27B, still only for plain text.
Gemma 3 didn't start supporting multi-modalities until March 2025, but the 1B client-side version had limited capabilities.
Now, E2B and E4B have directly integrated multi-modalities into the client-side small models, and the capabilities are completely different.
In addition to multi-modalities, these two models also support a complete agent workflow, including function calls, structured JSON output, and system instructions.
A mobile phone can become a fully local AI code assistant, with low power consumption and no per-token costs.
Apache 2.0 license: the community's voice is heard
Gemma 4 fully adopts the Apache 2.0 license.
It can be summarized in three words: free to use.
In the past, the licenses of Google's open-source models were criticized as "not pure enough." Gemma 1 and 2 used custom license agreements that allowed commercial use, but the wording of the terms made legal departments nervous.
This time, you can use it to create commercial products without paying a single cent to Google. You can deploy the model in any environment, including public clouds, private data centers, and edge devices.
In the official blog, the Google DeepMind team wrote that over the past two years, the community has repeatedly called for one thing across GitHub issues, forums, and social media:
We want the Apache 2.0 license.
This time, Google listened.
The CEO of Hugging Face responded immediately, calling this not just a simple license change but a watershed for the open-source AI community:
The release of Gemma 4 under the Apache 2.0 license is a huge milestone. We are very excited to support the entire Gemma 4 series of models on Hugging Face on the first day.
As of now, the cumulative downloads of the Gemma series of models have exceeded 400 million times. There are over 100,000 model variants contributed by the community.
One More Thing
The value of open-source models is not just about saving developers' money.
A research team from Yale University has used Gemma as a base model to develop a project called Cell2Sentence-Scale.
They convert single-cell gene expression data into input sequences for the language model, allowing AI to directly "read" cell states.
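The conversion step can be sketched in a few lines. As I understand the Cell2Sentence idea (this is an illustrative sketch, not the project's actual code), genes are ranked by expression level and the top gene names are emitted as a text "sentence" a language model can read; the gene names and values below are made up:

```python
# Sketch of the Cell2Sentence conversion idea: rank a cell's genes by
# expression level and emit the top gene names as plain text, so a
# language model can "read" the cell state. Illustrative only; the
# gene symbols and expression values here are invented examples.

def cell_to_sentence(expression: dict[str, float], top_k: int = 5) -> str:
    """Order genes by descending expression; join the names into text."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:top_k])

cell = {"CD19": 0.1, "MS4A1": 8.2, "CD3E": 0.0, "CD79A": 6.5, "ACTB": 9.9}
print(cell_to_sentence(cell, top_k=3))  # ACTB MS4A1 CD79A
```

Once cell states look like sentences, the full machinery of a pretrained language model, fine-tuning included, applies to them directly.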
As a result, several new paths that were previously overlooked by traditional methods have been found in the discovery of cancer treatment targets.
Without Gemma, this project might have required millions of dollars in API calls.
Instead, a model with a few dozen billion parameters has driven real scientific discoveries.
The next time you hear a story about "what AI has changed," the starting point might be a small open-source model.
Reference links:
[1]https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/?utm_medium=social&utm_content=
[2]https://x.com/victormustar/status/2039739591276581118?s=20
[3]https://x.com/billtheinvestor/status/2039805141876871376?s=20
This article is from the WeChat official account "Quantum Bit", author: Meng Chen. Republished by 36Kr with permission.