Hands-on with Google Gemma 4: it works even with the phone offline, but logic questions still trip it up
To be honest, the large-model scene has been in a rather strange place lately.
Every company is quietly grinding away at commercially oriented applications, and several big tech firms are busy bolting the OpenClaw concept onto their own products, but genuinely eye-catching breakthroughs in underlying technology are nowhere to be seen.
Google scratched its head and thought this situation wasn't right.
So, just a few days ago, Google launched its new-generation open-source model Gemma 4, which comes in four sizes: E2B, E4B, 26B, and 31B. The two smaller models, E2B and E4B, can be deployed and run directly on devices such as phones and the Raspberry Pi, while the 26B and 31B models need only a consumer-grade graphics card.
(Image source: Lei Technology)
You know those AI phones that were all the rage over the past two years? After half a year of use, buyers found that more than 90% of the core features still relied on sending data to cloud servers over the Internet; once the network dropped, the features were useless. Underwhelming, to say the least.
Google says the release of Gemma 4 marks a significant step forward for on-device AI: it brings powerful multimodal capabilities to phones, tablets, laptops, and other edge devices, letting users experience processing performance that was previously available only from advanced cloud models.
Another case of punching above its weight? Interesting.
To see for myself, I downloaded Google's latest model for testing. Here are the highlights.
Google Aims to Punch Above Its Weight
Why did Google's move cause such a big stir this time?
To figure this out, we first need to understand what this model is.
Gemma 4 E2B/E4B is a lightweight on-device large model that Google built on the MatFormer architecture. Through PLE (per-layer embeddings) and a hybrid attention structure, it delivers long context at low memory cost: its footprint is comparable to traditional 2B and 4B models, and it can run in as little as 3.2GB of memory.
(Image source: Google)
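As a rough sanity check, that 3.2GB figure is plausible for a model with roughly 4B effective parameters. The numbers below are my own back-of-envelope assumptions, not Google's published breakdown:

```python
# Back-of-envelope memory estimate for a ~4B-parameter on-device model.
# Assumptions (mine, not Google's): weights quantized to 4 bits
# (0.5 bytes/param), plus ~1GB of headroom for KV cache, activations,
# and the runtime itself.
params = 4e9
bytes_per_param = 0.5                       # 4-bit quantization
weights_gb = params * bytes_per_param / 1024**3
total_gb = weights_gb + 1.0                 # assumed runtime overhead
print(f"weights ≈ {weights_gb:.2f} GB, total ≈ {total_gb:.2f} GB")
```

Under those assumptions the weights alone come to just under 2GB, which leaves the quoted 3.2GB minimum looking entirely reasonable.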
Secondly, we need to understand what this model can do.
In the past, most on-phone large models were simply cloud models with a big chunk of parameters cut away and stuffed onto the device. The result was lopsided: they could handle little more than simple text Q&A.
Gemma 4's E2B and E4B models take a completely different approach. As mentioned above, they adopt a native multimodal design at the architectural level.
Native multimodality means the model accepts images, audio, and video as input directly. It doesn't need to transcribe your speech into text before slowly parsing it; it can grasp tone and meaning directly. With images, it doesn't have to brutally compress high-resolution photos; it can see the details as they are.
(Image source: Google)
At least in theory.
Finally, how do you actually use Gemma 4?
A year ago, deploying an on-device large model on a phone was an extremely fiddly task, often requiring a Linux virtual machine; Lei Technology even published a tutorial on it. So it's perfectly reasonable to wonder about this.
But now, there's no need for that.
Google quietly launched a new application last year called Google AI Edge Gallery, which allows users to directly run open - source AI models from the Hugging Face platform on their mobile phones. This is Google's first attempt to bring lightweight AI inference to local devices.
(Image source: Google)
Currently, the app is available on Android; interested readers can grab it from the Play Store. After loading a model, you can use the app for AI chat, image understanding, and its Prompt Lab feature, and even import custom LiteRT-format models.
There's no need to connect to the Internet. You can directly use the local computing power of your mobile phone to complete tasks. It's that simple.
More Suitable for Mobile Devices
Next, it's time for the highly anticipated testing session.
As shown in the figure, the app ships with nine models by default, including Google's own Gemma series alongside open-source models from Qwen and DeepSeek. We picked the strongest of the bunch, Gemma 4-E4B, plus the previous-generation Gemma 3n-E4B, Qwen2.5-1.5B, and DeepSeek-R1-1.5B for testing.
First, a series of classic logic questions:
Q: How many letters "r" are there in the word "Strawberry"?
This question seems simple, but it has actually stumped many large AI models.
In actual testing, every model bundled with the app answered "2". A Qwen3-4B GGUF model I deployed separately did give the correct answer, "3", but its seemingly endless chain of thought took a full two and a half minutes to produce it, which is a real waste of time.
(Image source: Lei Technology)
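For reference, the correct count is trivial to verify in code; models tend to stumble here because they process subword tokens rather than individual characters:

```python
# An LLM sees "Strawberry" as a handful of subword tokens, not ten
# characters, which is why letter-counting questions trip it up.
# In plain code the answer is unambiguous:
word = "Strawberry"
print(word.lower().count("r"))  # 3
```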
Q: Two fathers and sons caught three fish, and each person got one. How is this possible?
This one was even more brutal: none of the models answered it correctly. Even among my real-life colleagues, at least half couldn't work it out. (The answer: a grandfather, a father, and a son are at once two fathers and two sons, so three people share three fish.) Word-play logic puzzles like this test the concentration of humans and large models alike.
(Image source: Lei Technology, from left to right: Gemma 4, Gemma 3n, DS R1, Qwen2.5)
Q: There are three people, A, B, and C. One of them is a knight (who only tells the truth), one is a knave (who only tells lies), and one is a spy (who can tell either the truth or lies).
A says: 'I am a knight.'
B says: 'What A says is true.'
C says: 'B is a spy.'
Given that the identities of the three people are different, please infer who A, B, and C are respectively and explain the reason.
This time, after a round of exhaustive case-by-case reasoning, Gemma 4 actually got the question right, in a total of 59 seconds, which isn't bad. As for the other three models, some confidently spouted nonsense and some got stuck in an endless loop of "thinking".
(Image source: Lei Technology, from left to right: Gemma 4, Gemma 3n, DS R1, Qwen2.5)
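For the curious, the puzzle has exactly one consistent assignment, which a short brute-force check confirms. This is my own sketch of the logic, not a reconstruction of how any of the models reasoned:

```python
from itertools import permutations

def consistent(roles):
    a, b, c = roles  # roles of A, B, C
    # Truth value of each statement under this assignment:
    truths = {
        "A": a == "knight",   # A: "I am a knight."
        "B": a == "knight",   # B: "What A says is true."
        "C": b == "spy",      # C: "B is a spy."
    }
    for person, role in zip("ABC", roles):
        if role == "knight" and not truths[person]:
            return False  # knights never lie
        if role == "knave" and truths[person]:
            return False  # knaves never tell the truth
    return True  # the spy may say anything

solutions = [r for r in permutations(["knight", "knave", "spy"])
             if consistent(r)]
print(solutions)  # [('knave', 'spy', 'knight')]
```

So A is the knave, B the spy, and C the knight, matching the conclusion Gemma 4 reached.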
From the results, small parameter counts really do blunt a model's logical reasoning. A thinking mode can reduce AI hallucinations to some extent, but it also stretches out generation time.
Next, a relatively simple literary trick question:
Q: What is the previous line of the poem "Planting beans under the southern hill"?
In fact, "Planting beans under the southern hill" is the opening line of Tao Yuanming's "Returning to the Farm to Dwell, No. 3"; there is no previous line. It's a good chance to see whether these small models would rather fabricate an answer than admit the question is flawed.
The result: all of them got it wrong, with some inventing lines that would practically make Tao Yuanming a modern poet.
Next, a simple text - processing task.
Specifically, I provided an article of about 2,500 words and hoped that they could give a corresponding summary of the article.
Only Gemma 3n-E4B and Gemma 4-E4B completed the task. The former took nearly two minutes and its summary missed the point, while the latter's answer was more concise.
As for DeepSeek-R1-1.5B, the smallest of the four, it failed to produce a reply at all.
(Image source: Lei Technology, from left to right: Gemma 4, Gemma 3n, DS R1, Qwen2.5)
Across these four rounds of testing, Gemma 4-E4B shows only a slight improvement in text processing and logical reasoning, but it leads clearly in generation speed and response success rate. Evidently, deep "thinking" is a poor fit for local models.
That said, Gemma 4 is not just a plain text model; it is that rare thing, a small-parameter multimodal model.
First, I tested the Ask Audio feature exclusive to Gemma by importing a 21-minute WAV file. It turns out the feature currently only accepts clips of up to 30 seconds, and the transcription it produced had little to do with the original audio. For now, its usability is middling.
(Image source: Lei Technology)
Next is Ask Image, which lets me put questions to Gemma 4 about a photo I take or upload.
In testing, Gemma 4 is noticeably more accurate at identifying elements in pictures and can reproduce a scene's contents almost in full. It still doesn't understand anime characters at all, though, and use cases like flower identification are unreliable; only common subjects such as food and hardware are recognized well.
(Image source: Lei Technology)
As for Agent Skills... apart from two word games, most of the features currently require an Internet connection, which has little to do with on-device models anyway.