Can everyone use local AI? We tested it on a budget phone, and the results were disappointing.
In April this year, Google released its new-generation open-source large model, Gemma 4, launching four versions at once that cover everything from mobile phones to workstations. The two smallest versions are designed specifically for mobile devices and built mainly for fully offline operation. Small on-device models aren't exactly rare in themselves; what matters more is that Google now clearly wants mobile phones to run local models.
You may have already come across plenty of hands-on installs and tests of Gemma 4. However, most of the tests online are run on the latest iPhones or flagship Android phones. Those flagships have top-tier performance and computing power, so it's no surprise that they do well.
At this point, Xiaolei can't help but ask: if you use an ordinary Android phone costing a few hundred to just over a thousand yuan, with a mid-range processor and unremarkable computing power, can a local model still be used normally? And how big is the gap compared with those flagships?
(Image source: Photographed by Lei Technology)
Digging deeper: is local AI destined to be a flagship-only feature? To find out, we took a thousand-yuan Android phone with a mid-range chip and put Gemma 4 through its paces.
A thousand-yuan phone running a local model is a complete "disaster"
The phone used for this test is the vivo Y500 Pro, a typical thousand-yuan Android phone. It's not an old model, but its SoC performance is middling, which is about what you'd expect at this price. It uses the MediaTek Dimensity 7400, built on TSMC's 4nm process, with a CPU of four 2.6GHz big cores and four 2.0GHz small cores, and a Mali-G615 MC2 GPU.
That configuration is normal for the thousand-yuan price range and works fine for daily use, but in raw computing power it's simply not in the same league as current flagship chips. On the AI side, the Dimensity 7400 uses the MediaTek NPU 655, which MediaTek says delivers a 15% improvement over the previous generation.
For the mobile version of Gemma 4, Google has released an app called Google AI Edge Gallery, which you can find directly in the app store. Download it, open it, select Gemma 4 E4B, and once the model file finishes downloading you can use it immediately; the whole process runs offline, with no internet connection or configuration required. Google has clearly put effort into the installation experience. Without further ado, on to the test.
(Image source: Drawn by Lei Technology)
For the first question, we asked something very practical: recommend three movies suitable for watching on a long-distance high-speed train, and give reasons. Gemma 4's answer was Forrest Gump, Inception, and La La Land. The picks themselves are fine, as all three are classics, and the reasons given were reasonable. The problem is that the answer ran to nearly 500 words and tacked on extra "tips", like reminding you to bring headphones when watching movies on the train.
(Image source: Drawn by Lei Technology)
On the vivo Y500 Pro, generating those 500 words took a full 2.8 minutes. To be honest, Xiaolei found the second half of the answer wasn't worth reading.
This is a common problem with small-parameter models: they often don't know when to stop, padding out the word count with extra "suggestions". Read carefully and you'll find the whole thing could have been summarized in two or three sentences.
Next, we chose a fairly classic multi-step logical reasoning question: five people sit in a row; A is not in the leftmost seat; B sits to the right of C; D sits to the left of E; E is not in the rightmost seat. Who sits in the middle? The model carefully listed the conditions and worked through the permutations step by step, but in the end it couldn't give a correct answer. Worse, it took 3.3 minutes, and during that time we couldn't send the app to the background and wait; the screen had to stay on the whole time. Those 3.3 minutes were completely wasted.
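For readers who want to check the puzzle themselves, the search space is tiny and easy to brute-force. A minimal Python sketch (our own sanity check, not part of the phone test) enumerates all 120 seatings; notably, the four constraints admit more than one valid arrangement, which is part of why a small model has room to wander while reasoning through it:

```python
from itertools import permutations

people = "ABCDE"
valid = []
for seating in permutations(people):
    pos = {p: i for i, p in enumerate(seating)}  # 0 = leftmost, 4 = rightmost
    if (pos["A"] != 0                 # A is not in the leftmost seat
            and pos["B"] > pos["C"]   # B sits to the right of C
            and pos["D"] < pos["E"]   # D sits to the left of E
            and pos["E"] != 4):       # E is not in the rightmost seat
        valid.append(seating)

# Who can occupy the middle (index 2) across all valid seatings?
middles = {seating[2] for seating in valid}
print(len(valid), "valid seatings; possible middle occupants:", sorted(middles))
```

Running this shows the constraints alone don't pin down a unique middle occupant, so any confident single answer requires assumptions beyond the stated conditions.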
(Image source: Drawn by Lei Technology)
Of course, we can't pin this entirely on the Y500 Pro's lack of performance. On the X300 Pro, a flagship model, we also failed to get a correct answer to this question. The X300 Pro's speed, however, was in another class: it produced its wrong answer in just 1.6 minutes. Wrong, but at least decisive.
(Image source: Drawn by Lei Technology)
Similarly, Xiaolei tried the super-classic question that has tripped up plenty of large AI models: should you drive or walk to get your car washed? Surprisingly, the two phones, running the same model, came to different conclusions. The Y500 Pro took 2.5 minutes and told us, "If you're going for a 'car wash', you should choose to walk." A ridiculous answer.
(Image source: Drawn by Lei Technology)
The X300 Pro took a bit of a detour, seemingly double-checking whether getting a "car wash" really required a car, but in the end it did say that if you're getting your car washed, you should drive there.
After running these three questions, the overall impression that Gemma 4 E4B on the Y500 Pro left on us was that it was very slow and full of nonsense, but it didn't get very hot.
The slowness is the most immediate impression. On average, each question takes two to three minutes to answer in full, which is genuinely painful in daily use; nobody wants to stare at a screen for three minutes just to see a reply. One detail is worth noting, though: the slowness isn't because the model isn't running. It's that the Dimensity 7400's NPU computing power is genuinely limited. It can only process so many tokens per second, and no amount of effort pushes it past that ceiling.
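A rough back-of-the-envelope calculation from the first test makes the ceiling concrete. Assuming (loosely, on our part) about one token per word of output, the 500-word answer in 2.8 minutes works out to only a few tokens per second:

```python
# Rough throughput estimate from the movie-recommendation test.
# Assumption (ours, not measured): ~1 token per word of output.
words = 500             # approximate length of the answer
seconds = 2.8 * 60      # 2.8 minutes of generation on the vivo Y500 Pro
tokens_per_second = words / seconds
print(f"~{tokens_per_second:.1f} tokens/s")

# For comparison: at a modest 10 tokens/s, the same answer would take
minutes_at_10 = words / 10 / 60
print(f"{minutes_at_10:.1f} minutes at 10 tokens/s")
```

Roughly 3 tokens per second, in other words; even a threefold speedup would still leave near-minute waits for long answers, which is why verbosity hurts so much on this hardware.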
The error rate is also relatively high, but that's understandable. When handling complex logic, a model needs to "think" through intermediate steps, and the more computing power it has, the more complete that process can be. On a thousand-yuan phone, limited compute can force the model to a conclusion before its reasoning is finished, which makes hallucinations more likely.
Gemma 4 E4B is a multi-modal model, so we also decided to let the Y500 Pro try out its image recognition effect. We first uploaded a photo of a night-time shopping mall and asked it what information was in the picture.
To be fair, its answer was okay: it described the building's scale, the roof structure, and the night-time atmosphere, and the general direction was right. But there was one glaring problem. The picture contained a huge Apple Store sign, which the model never mentioned, describing only "a modern large-scale shopping mall". Brand recognition is demanding for a model; it has to match the visual information it sees against the brand knowledge behind it. Evidently E4B's parameter count isn't up to it: it can see the outline but can't recognize what it's looking at.
(Image source: Drawn by Lei Technology)
For the second picture, we casually photographed a green plant and asked what it was. The app just kept spinning: a full five minutes with no answer, only the endlessly looping loading animation. More frustrating still, the entire app was unresponsive the whole time; we couldn't interrupt it and could only wait. The picture, in fact, showed nothing more exotic than an ordinary ground-spike sprinkler for watering plants.
(Image source: Drawn by Lei Technology)
So, could the X300 Pro recognize it? It could at least respond: the X300 Pro took only 32 seconds on the question that stumped the Y500 Pro. The pity is that it couldn't say what the device actually was, guessing only that it was some kind of small sensor.
(Image source: Drawn by Lei Technology)
After three rounds of testing, Gemma 4 E4B on the Y500 Pro didn't perform as badly as we expected; there were even a few small surprises. It didn't get very hot, it didn't lag much, and it could still answer some simple questions correctly. The problem is that, as a local model, it answers far too slowly. On top of that, Google AI Edge Gallery's permissions are still limited: apart from toggling the flashlight, it can't perform other system-level operations.
This is rather awkward. If this is the ceiling, with answers this slow and an error rate this high, why would users keep using it? Bluntly, unless you're in a completely offline scenario, it's simply not as good as an online large model.
Can ordinary phones really use local models?
Judging from the tests above, Gemma 4 currently reaches only a "passing" grade even on flagship phones. Errors still happen, but at least the speed is acceptable, unlike on the thousand-yuan phone, which is both slow and inaccurate.
But looking back, what kind of game is Google playing with this app?
Google AI Edge Gallery includes a function called Mobile Actions, which converts your natural-language instructions directly into operations on the Android system, for example "create a lunch calendar event for me" or "turn on the flashlight". Once the model understands your intent, it calls system tools directly to complete the task.
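The general pattern behind features like this is usually "function calling": instead of free text, the model emits a structured action, and the app maps it to a real system call. Here's a simplified Python sketch of that dispatch loop; the action names and handlers are our own illustration, not Google's actual API:

```python
import json

# Hypothetical handlers standing in for real system calls (our illustration).
def toggle_flashlight(on: bool) -> str:
    return f"flashlight {'on' if on else 'off'}"

def create_calendar_event(title: str, time: str) -> str:
    return f"event '{title}' created at {time}"

HANDLERS = {
    "toggle_flashlight": toggle_flashlight,
    "create_calendar_event": create_calendar_event,
}

def dispatch(model_output: str) -> str:
    """Parse the model's structured action and invoke the matching handler."""
    action = json.loads(model_output)
    handler = HANDLERS[action["name"]]
    return handler(**action["args"])

# For "Turn on the flashlight", the model would emit something like:
result = dispatch('{"name": "toggle_flashlight", "args": {"on": true}}')
print(result)  # flashlight on
```

The key point is that the model itself never touches the system; it only produces the structured request, and the app decides what it's actually allowed to execute, which is exactly where the Gallery's currently narrow permissions bite.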
This path has actually already started on flagship phones. The Samsung Galaxy S25 series has launched a cross-application execution chain. With just one sentence, multiple apps can work together. For example, saying "Help me navigate to the place where I'm having a meeting tonight", the AI will automatically read the address from the calendar and directly pass it to the map. The whole process doesn't require you to copy and paste or manually switch. There's also the previously popular Doubao phone, which has even achieved "mobile phone autopilot".
However, there's an important fact to clarify: most of these automated operations aren't actually run by real local models. That's true of Samsung, Apple, and even the Doubao phone.
(Image source: Doubao Phone Assistant)
In essence, local models have a hard capability ceiling: the fewer the parameters, the less they can do. Meanwhile, users' expectations of AI keep rising, and local models alone can't meet them. So the cloud becomes the fallback: local models mainly take on lightweight, latency-sensitive tasks such as notification summaries and voice recognition, where speed matters most.
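The hybrid split described above amounts to a simple router: quick, latency-sensitive tasks stay on the device, and everything else falls back to the cloud. The task categories in this Python sketch are our own guess at how vendors draw the line, not any vendor's actual policy:

```python
# Tasks the on-device model handles well: lightweight and latency-sensitive.
# (Illustrative categories, not a real vendor's task list.)
LOCAL_TASKS = {"notification_summary", "voice_recognition", "flashlight"}

def route(task: str, offline: bool = False) -> str:
    """Pick a backend: local for lightweight/real-time work, cloud otherwise.
    When the device is offline, local is the only option left."""
    if offline or task in LOCAL_TASKS:
        return "local"
    return "cloud"

print(route("notification_summary"))                # local
print(route("multi_step_reasoning"))                # cloud
print(route("multi_step_reasoning", offline=True))  # local
```

The last case is the sobering one: offline, everything lands on the local model, ceiling and all, which is precisely the scenario our thousand-yuan phone struggled with.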
So Google's app looks more like a trial run: bring local models to mobile devices, gradually open up the permissions needed for automated phone operations, get these models running on as many devices as possible, and wait for chip computing power to catch up. But when chip makers will be willing to give thousand-yuan phones truly sufficient AI computing power remains an open question.