
Hands-on test of Gemma 4 on a budget phone: a single generation can take 5 minutes. Local AI seems destined to be exclusive to flagship phones.

Lei Technology, 2026-04-17 09:44
Don't think too highly of local models.

In April this year, Google released its new-generation open-source large model, Gemma 4, launching four versions at once that cover everything from phones to workstations. The two smallest versions are designed specifically for mobile devices and are meant to run fully offline. Releasing open models is nothing new in itself; what matters more is that Google now clearly wants phones to run local models.

You may have already seen plenty of hands-on tests of Gemma 4, but most of the tests online so far were run on the latest iPhones or Android flagships. With top-tier performance and computing power, it's no surprise they perform well.

That made me wonder: on an ordinary Android phone costing a few hundred to a bit over a thousand yuan, with a mid-range processor and modest computing power, is a local model still usable? And how big is the gap compared with those flagships?

(Image source: Captured by Lei Technology)

Digging deeper: is local AI destined to be an exclusive feature of flagship phones? We wanted to find out, so we took a thousand-yuan Android phone with a mid-range chip, ran Gemma 4 on it, and saw how it performs.

Running a local model on a thousand-yuan phone is a total letdown

The phone we used for this test is the vivo Y500 Pro, a typical thousand-yuan Android phone. It's not an old model, but its SoC is only average, which is to be expected at this price. It uses the MediaTek Dimensity 7400, built on TSMC's 4nm process, with a CPU of 4 big cores at 2.6GHz plus 4 little cores at 2.0GHz, and a Mali-G615 MC2 GPU.

This configuration is normal for the thousand-yuan price range and fine for daily use, but in raw computing power it's nowhere near today's flagship chips. On the AI side, the Dimensity 7400 uses the MediaTek NPU 655, which MediaTek says is 15% faster than the previous generation.

Google has released an app called Google AI Edge Gallery for the mobile versions of Gemma 4; you can find it by searching the app store. After downloading and opening it, select Gemma 4 E4B. Once the model file finishes downloading, it's ready to use, entirely offline, with no network connection or configuration required. Google clearly put real effort into the installation experience. Without further ado, let's start the test.

(Image source: Created by Lei Technology)

For the first question, we asked something practical: recommend three movies suitable for a long high-speed-rail trip, with reasons. Gemma 4 recommended "Forrest Gump", "Inception", and "La La Land". The picks are fine, all three are classics, and the reasons make sense. The problem is that the answer ran to nearly 500 words and came with a "tip", such as reminding you to bring headphones when watching movies on the train.

(Image source: Created by Lei Technology)

On the vivo Y500 Pro, it took a full 2.8 minutes to generate these 500 words. To be honest, I realized after reading that the second half wasn't really necessary.

This is actually a common problem with small-parameter models. They often don't know when to stop answering and sometimes give some "suggestions" to fill up the word count. If you read carefully, you'll find that it can actually be summarized in just two or three sentences.

Next, we chose a classic multi-step logical reasoning question: five people are sitting in a row. A is not on the leftmost seat, B sits to the right of C, D sits to the left of E, and E is not on the rightmost seat. Who is in the middle? Although the model carefully listed the conditions and worked through the permutations step by step, it still failed to give the correct answer. It also took 3.3 minutes, and during that time the app couldn't be minimized to run in the background; we had to keep the screen on the whole time. In other words, those 3.3 minutes were completely wasted.
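For reference, this kind of puzzle can be checked mechanically. A minimal Python sketch (the function and variable names are ours, not anything from the test app) that enumerates every seating satisfying the conditions as quoted above:

```python
from itertools import permutations

def valid(seating):
    """Check the puzzle's four conditions for one left-to-right seating."""
    pos = {person: i for i, person in enumerate(seating)}
    return (pos["A"] != 0            # A is not on the leftmost seat
            and pos["B"] > pos["C"]  # B sits to the right of C
            and pos["D"] < pos["E"]  # D sits to the left of E
            and pos["E"] != 4)       # E is not on the rightmost seat

solutions = [s for s in permutations("ABCDE") if valid(s)]
middles = {s[2] for s in solutions}  # who can occupy the middle seat
```

Worth noting: as quoted here, the four conditions admit more than one arrangement, so the middle seat isn't uniquely determined; the original Chinese prompt may have carried an extra constraint lost in translation.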

(Image source: Created by Lei Technology)

Of course, we can't pin this on the Y500 Pro's limited performance alone. We couldn't get the correct answer on the X300 Pro, a flagship model, either. The X300 Pro was simply much faster: it delivered its wrong answer in just 1.6 minutes. Even when wrong, it was at least decisive.

(Image source: Created by Lei Technology)

Similarly, I also tried the classic question that has stumped many large models: should you drive or walk to get your car washed? Surprisingly, the two phones, running the same model, came to different conclusions. The Y500 Pro took 2.5 minutes and told us, "If you're going for a 'car wash', you should choose to walk." This kind of answer is really ridiculous.

(Image source: Created by Lei Technology)

The X300 Pro took a bit of a roundabout way. It seemed to be repeatedly confirming whether the "car wash" action really required a car. But in the end, it still mentioned that if you're going to get your car washed, you should drive there.

After running these three questions, the overall impression of Gemma 4 E4B on the Y500 Pro is that it's very slow and full of nonsense, but it doesn't get very hot.

The slowness is the most obvious impression. On average, you wait two to three minutes per question to see the complete answer, which is unbearable in daily use; nobody is willing to stare at the screen for three minutes just to read a reply. But one detail is worth noting: the slowness isn't because the model fails to run. The NPU compute of the Dimensity 7400 is simply limited, so it can only decode a certain number of tokens per second, and no amount of trying changes that ceiling.
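A back-of-the-envelope calculation from the timed answers makes the ceiling concrete. The token estimate here is our assumption (roughly 1.3 tokens per English word is a common rule of thumb; the real ratio depends on the tokenizer and language):

```python
def decode_speed(words, minutes, tokens_per_word=1.3):
    """Rough decode throughput implied by a timed answer.

    tokens_per_word ~1.3 is a rule-of-thumb estimate for English text,
    not a measured figure for Gemma's tokenizer.
    """
    seconds = minutes * 60
    tokens = words * tokens_per_word
    return tokens / seconds  # tokens per second

# The ~500-word movie answer that took 2.8 minutes on the Y500 Pro:
y500 = decode_speed(500, 2.8)  # roughly 4 tokens per second
```

At around 4 tokens per second, a long answer inevitably stretches into minutes, regardless of how the prompt is phrased.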

The error rate is also relatively high, but that's understandable. When handling complex logic, a model needs to "think" through intermediate steps, and the more compute and capacity it has, the more complete that process can be. On a thousand-yuan phone, limited compute can force the model to jump to a conclusion before the reasoning is finished, which makes hallucinations more likely.

Gemma 4 E4B is a multi-modal model, so we also decided to see how well the Y500 Pro can recognize images. We first uploaded a photo of a night-time shopping mall and asked it what information was in the picture.

To be honest, its answer was okay. It described the building scale, roof structure, and night-time atmosphere, all in the right direction. But there was one glaring problem: it completely ignored the large Apple Store sign in the picture and only mentioned a "modern large-scale shopping mall". Brand recognition demands a lot from a model: it has to match what it sees against the brand knowledge behind it, and E4B's parameter count is clearly not enough. It can see the outline but can't recognize what it is.

(Image source: Created by Lei Technology)

For the second picture, we casually photographed a green plant and asked what it was. The app just kept spinning: for a full five minutes there was no answer, only the loading animation. What's even more frustrating is that the entire app was unresponsive the whole time; we couldn't interrupt it and could only wait. The picture itself showed nothing more than a simple ground-spike sprinkler for watering flowers, hardly rare equipment.

(Image source: Created by Lei Technology)

So could the X300 Pro do better? Somewhat. It answered the question that stalled the Y500 Pro in just 32 seconds, but it still couldn't identify the device and merely guessed that it was a small sensor.

(Image source: Created by Lei Technology)

After these three rounds of tests, Gemma 4 E4B on the Y500 Pro wasn't as useless as we had feared. On the contrary, there were a few small surprises: the phone didn't get very hot or laggy, and the model could still answer some simple questions correctly. The problem is that, as a local model, its answering speed is far too slow. The Google AI Edge Gallery's permissions are also still limited: apart from turning on the flashlight, it can't perform any other system-level operations.

This is quite embarrassing. If it can only achieve this level, with such a slow answering speed and a high error rate, why would users continue to use it? To put it bluntly, unless you're in a situation where there's absolutely no network, an online large model is actually better.

Can ordinary phones really use local models?

Based on the previous tests, currently, Gemma 4 can only reach a "passing" standard on flagship phones. Although there are still cases of errors, at least the speed isn't disappointing, unlike on a thousand-yuan phone, which is both slow and inaccurate.

But looking back, what kind of strategy is Google implementing with this app?

The Google AI Edge Gallery has a function called Mobile Actions, which can directly convert your natural language instructions into operations on the Android system. For example, "Help me create a lunch calendar event" or "Turn on the flashlight". After the model understands your intention, it directly calls the system tools to complete the task.
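The general pattern behind features like this is that the model emits a structured intent and the app maps it onto a system call. A toy Python sketch of that plumbing; the intent schema and tool names here are entirely illustrative, not Google's actual Mobile Actions API:

```python
import json

# Hypothetical tool registry: the app exposes a small set of system actions.
# Each entry maps a tool name to a function that performs the action.
TOOLS = {
    "create_calendar_event": lambda args: f"event '{args['title']}' at {args['time']}",
    "toggle_flashlight":     lambda args: f"flashlight {'on' if args['on'] else 'off'}",
}

def dispatch(model_output: str) -> str:
    """Parse the model's structured intent and invoke the matching tool."""
    intent = json.loads(model_output)
    tool = TOOLS[intent["tool"]]
    return tool(intent.get("args", {}))

# e.g. after the model turns "Turn on the flashlight" into structured JSON:
result = dispatch('{"tool": "toggle_flashlight", "args": {"on": true}}')
```

The key property is that the language model only produces the structured intent; the actual system operation is performed by ordinary app code, which is why the app's permission set, not the model, decides what actions are possible.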

This path has actually started on flagship phones. The Samsung Galaxy S25 series has launched a cross-application execution chain. You can use just one sentence to make multiple apps work together. For example, saying "Help me navigate to the place where I'm having a meeting tonight", the AI will automatically read the address from the calendar and then directly send it to the map. The whole process doesn't require you to copy and paste or manually switch apps. And the previously popular Doubao phone has even achieved "mobile phone autopilot".

But there's an important fact that needs to be clarified. Most of these automated operations aren't actually run by real local models. This is true for Samsung, Apple, and even the Doubao phone.

(Image source: Doubao Phone Assistant)

In essence, the upper limit of the capabilities of local models is limited. The smaller the parameter quantity, the fewer things it can do. And users' expectations for AI are getting higher and higher. Relying solely on local models can't meet that demand. So, the cloud has become a backup solution. Local models mainly handle some lightweight and real-time tasks, such as notification summaries and voice recognition, which require high speed.

Therefore, Google's app is more like a trial to introduce local models to mobile devices and gradually open up the function permissions for automated phone operations. Then, it wants as many devices as possible to be able to run these models and wait for the computing power of the chips to catch up. But when will chip