Analysis of Arm's new IP: Power consumption deserves attention, and the future of small cores is worrying
In recent days, Arm held the Arm UNLOCKED Summit in Shanghai and officially launched the Arm Lumex Compute Subsystem (CSS) for mobile devices.
What is the Lumex CSS? Put simply, it is essentially a “marketing concept” from Arm: a “package” that bundles at least Arm's brand-new C1 series CPU designs, G1 series GPU designs, the C1-DSU inter-core bus design, and several other peripheral system IP designs.
It should be noted that the Lumex CSS is not equivalent to a complete SoC architecture license, because it omits many essential components of common mobile platforms, such as NPUs, basebands, power management, and ISPs. This is why, when the concept of “Arm CSS” first caught public attention this year, many people rebutted the surrounding rumors by pointing out that “even if you buy a complete Arm CSS, you can't directly make a mobile phone SoC”.
Of course, judging from information circulating online, the leading manufacturers basically all do their own secondary development based on Arm's architecture licenses or even instruction set licenses. So, in essence, Lumex as the “official public version” serves the same purpose as before: it mainly targets small and medium-sized chip makers that lack sufficient in-house design capabilities.
However, this does not mean that the Lumex CSS is meaningless to readers who follow the mobile phone industry and are curious about next-generation mobile platforms. Dig into its contents and you can still find plenty worth talking about.
The product naming logic has been greatly changed, but the actual effect remains to be seen
First of all, as mentioned earlier, the Arm Lumex CSS includes brand-new CPU and GPU IPs this time. Moreover, unlike Arm's past practice of “only updating the big cores without replacing the small cores”, this time Arm not only replaced every CPU and GPU architecture but also substantially reworked the entire product naming system.
Specifically, Arm released four new CPU IPs: C1-Ultra, C1-Premium, C1-Pro, and C1-Nano.
In terms of positioning, C1-Ultra is the “super-big core” for flagship SoCs and replaces the current Cortex-X925.
C1-Premium is the “second-tier flagship big core”. Arm explains that it shares the super-big core's architecture but has a smaller cache and is built with a density-optimized library (that is, the clock frequency is lower, but so is the energy consumption). In other words, it effectively “legitimizes” the Cortex-X4m that MediaTek used before, officially establishing it as an independent CPU product line.
The remaining C1-Pro and C1-Nano are easy to understand: they inherit the product positioning of Cortex-A725 and Cortex-A520 respectively.
Meanwhile, Arm also released a GPU IP family named “Mali G1”. There is no architectural difference between the most entry-level version and the highest-end model; they are distinguished only by the number of shader cores. The core count can be customized from 1 to 24, and the family is divided into at least three tiers by core count: G1-Ultra, G1-Premium, and G1-Pro.
However, Arm neither stated the specific core-count boundaries between the tiers nor explained whether there will be an entry-level “G1” or “G1-Nano”.
The performance of the new IPs has generally improved, but power consumption is worth noting
In fact, the naming logic of this round of products clearly aims to signal that they are “brand-new” and “different from the past”. In addition, compared with the previous three-digit naming scheme, the new rule is clearly meant to make generational differences more obvious after future updates. When G2 and G3 arrive, for example, people will immediately know that they are newer than the current G1. That really is easier to read as a “new-old” relationship than current names like X925, A710, and A520.
However, Arm's new naming scheme also brings a problem: it is harder for consumers to intuitively judge relative core performance within the same generation. Yes, C1-Ultra is definitely much faster than C1-Nano, but it is harder to judge “how much faster”.
Moreover, even against the previous-generation products, Arm did not fully spell out how much the new IPs improve.
For example, Arm claims that, compared with the previous-generation “super-big core” Cortex-X925, C1-Ultra's IPC (instructions per clock, that is, per-cycle performance) has increased by 12%, and its micro-architecture performance has increased by 26%.
However, Arm gave no specific figures for how much C1-Premium improves over the previous-generation “second-tier flagship super-big core”. This may be because the Cortex-X925 itself has no second-tier flagship variant, so the only available comparison is the X4m from two generations back. It is also possible that the new second-tier flagship architecture's gains are not that significant, and the comparison would not look good.
By contrast, Arm gave specific figures for C1-Pro. According to Arm, compared with Cortex-A725, C1-Pro consumes 26% less power at the same performance level; at the same power consumption, its performance can be 11% higher. And if both are set to the same clock frequency, C1-Pro's performance can be up to 16% higher than that of Cortex-A725.
From these three sets of data, we can infer that at the same clock frequency the new C1-Pro big core actually draws slightly more power than Cortex-A725: the same-frequency gain (up to 16%) exceeds the same-power gain (11%), so the extra performance has to be paid for with extra power. But because the performance improvement is greater than the power increase, the energy-efficiency ratio still rises.
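The reasoning about C1-Pro can be checked with a rough calculation. The sketch below treats energy efficiency simply as performance divided by power, both expressed relative to Cortex-A725 (1.0 = unchanged). Arm did not publish the same-frequency power figure, so the power ratios in the loop are purely hypothetical values chosen for illustration, and the function name is likewise an assumption of this example.

```python
def efficiency_ratio(perf_ratio, power_ratio):
    """Performance per watt relative to the baseline core (1.0 = unchanged)."""
    return perf_ratio / power_ratio

# Figures quoted by Arm for C1-Pro versus Cortex-A725.
iso_perf = efficiency_ratio(1.00, 1.00 - 0.26)   # same performance, 26% less power
iso_power = efficiency_ratio(1.11, 1.00)         # same power, 11% more performance

print(f"same performance: {iso_perf:.3f}x efficiency")
print(f"same power:       {iso_power:.3f}x efficiency")

# Same clock frequency: performance is up to 16% higher, power is not published.
# The power ratios below are HYPOTHETICAL; efficiency improves as long as
# the power increase stays below the 16% performance increase.
ISO_FREQ_PERF = 1.16
for hypothetical_power in (1.05, 1.10, 1.16):
    ratio = efficiency_ratio(ISO_FREQ_PERF, hypothetical_power)
    print(f"same frequency, power x{hypothetical_power:.2f}: {ratio:.3f}x efficiency")
```

Under these assumptions, any same-frequency power increase smaller than 16% still leaves C1-Pro ahead of Cortex-A725 in energy efficiency, which matches the article's conclusion.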
Next is the “small core” C1-Nano. Arm gave relatively detailed performance data for it, but unfortunately these data show that C1-Nano's improvements are not focused on performance. Instead, they lie mainly in reduced area, lower power consumption, and support for the latest instruction set.
According to Arm, C1-Nano's overall SPECint2017 score is about 5.5% higher than that of Cortex-A520, and its energy efficiency when running the same program is 26% higher.
Finally, there is the Mali G1-Ultra GPU. Arm said that its performance in benchmarking software and games has increased by 20% compared with the previous generation (G925), and the power consumption per frame has decreased by 9%. Rendering 20% more frames at 91% of the previous per-frame cost gives 1.20 × 0.91 ≈ 1.092, so its overall power consumption has actually increased by about 9.2%. Fortunately, the ray-tracing performance of G1-Ultra can reach twice that of the previous generation. For future “ray-tracing-intensive” mobile games, it can still be expected to deliver a frame-rate increase far exceeding 20%.
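The GPU power estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic, assuming that total power scales as frames per second multiplied by the cost per frame; the function name is made up for this example.

```python
def relative_total_power(fps_gain, per_frame_change):
    """Total power relative to the previous generation (1.0 = unchanged).

    Assumes total power = frames per second x cost per frame.
    """
    return (1.0 + fps_gain) * (1.0 + per_frame_change)

# Arm's figures for G1-Ultra: 20% more performance, 9% less power per frame.
ratio = relative_total_power(0.20, -0.09)
print(f"overall power change: {(ratio - 1.0) * 100:+.1f}%")
```

The same function also shows the break-even point: with a 20% frame-rate gain, the per-frame cost would have to fall by roughly 16.7% for total power to stay flat.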
The new flagship mobile phones are destined to be faster, but the future of the entry-level models is uncertain
After talking so much, what does this batch of new architectures from Arm mean for the upcoming new-generation smartphones and SoCs?
First of all, it must be noted that, both in Arm's plan and in current rumors, the new flagship mobile platforms will not use the C1-Nano “small core”. According to Arm, top-level flagships may pair two C1-Ultra cores with six C1-Pro cores. We do not even rule out the possibility that manufacturers will cut the number of “medium cores” (C1-Pro) and add more “second-tier big cores” (C1-Premium) to achieve higher benchmark scores.
Based on the analysis above and the industry trends of recent years, manufacturers will probably push the peak clock frequency of flagship SoCs even higher. So unless TSMC's N3P process “performs miraculously” again, the peak power consumption of new flagship platforms built on the new CPU and GPU is more likely to rise than to fall.
Of course, there is no need to be nervous. The IPC of the new architecture has indeed increased, which means that in scenarios other than benchmarking, including heavy-load games, the new flagships are bound to run at lower actual frequencies than current platforms as long as no new “performance killer” appears. As a result, energy efficiency in daily use will definitely improve significantly, and even power consumption in heavy-load games may fall further.
What is more worrying, by contrast, is the low-power devices built solely on C1-Nano. Current evidence shows that the new architecture is objectively better than Cortex-A520, but on the one hand its performance gain is clearly much smaller than that of the other “big cores”. On the other hand, as more and more flagship and second-tier flagship platforms “abandon” small CPU cores, software developers, chip designers, and even Arm itself may gradually lose the “motivation” to improve them.
Bear in mind that in Apple's Apple Watch, the CPU architecture driving the watch has long been the “Sawtooth” solution derived from the energy-efficient core of the A16, which is a scaled-down “medium core” (positioned closer to the A725 or this generation's C1-Pro). For mainstream consumer electronics, the exit of pure low-power “small cores” may only be a matter of time.
This article is from the WeChat official account “3eLife” (ID: IT - 3eLife), author: 3eLife Jun. It is published by 36Kr with authorization.