Observations from a senior CPU architect
With the advancement of process technology, the potential for performance and transistor density improvement is increasingly limited by power consumption and heat dissipation. Although innovations in materials, interconnects, and device structures remain crucial, they must now be closely integrated with architectural strategies to fully achieve system-level efficiency. Meanwhile, the explosive growth of artificial intelligence computing requirements has exceeded the traditional scaling curve, intensifying the pressure on architecture and process technology to achieve unprecedented performance under strict power and heat dissipation constraints.
This paper focuses on how the co-design of microarchitecture and process technology can address the growing challenges of heat density, power consumption, and performance requirements. It also urges process researchers to consider the impact of architecture in their scaling roadmaps.
Introduction
Moore's Law is not dead, but it is undergoing profound changes. Driven by new research in fields such as atomic-level materials engineering, conductive metal layers, three-dimensional transistor layers, backside power delivery, and high-density three-dimensional packaging, transistors continue to shrink. However, the traditional benefits of scaling are increasingly challenged by power density and heat dissipation limits. As transistors shrink and three-dimensional structures become more prevalent, integration levels rise and the performance bottleneck shifts: today's systems are limited less by the switching speed or number of transistors than by their ability to manage energy and dissipate heat effectively.
Meanwhile, the explosive growth of artificial intelligence workloads - characterized by massive models, intensive training processes, and low-latency inference - has increased the computing requirements by an order of magnitude, further exacerbating the power consumption and heat dissipation pressure on data centers and edge devices.
In this new era, microarchitecture innovation is no longer a secondary optimization; it must develop in tandem with process technology. Power supply, heat dissipation management, and computing efficiency must be considered holistically at the device and system stack levels. This paper presents a collaborative perspective: how the evolving requirements of microarchitecture can guide the development of process technology, and how process breakthroughs can be fully considered at the architectural level to translate into actual performance improvements.
Heat Density
A. Higher integration amplifies heat density
Heat density is the power dissipated per unit area, so shrinking the area that dissipates a given amount of power raises heat density. Smaller feature sizes and higher integration levels can improve performance, but they also increase local heat generation. Fred Pollack pointed out in his keynote speech at MICRO-32 in 1999 (Figure 1) that power density had already exceeded that of a hot plate and was on course to reach that of a nuclear reactor.
Debbie Marr showed in her keynote speech at MICRO56 in 2024 (Figure 2) that the power density of Intel's Core processors has now exceeded this level. Although the comparison with nuclear reactors is often disputed, there is no doubt that today's silicon can reach critical temperatures in a very short time.
Silicon heats up from a safe temperature to a critical temperature so quickly that thermal sensors and heat dissipation measures must be designed in from the very beginning. Heat dissipation challenges that were once encountered only in high-performance systems now also affect mainstream and mobile devices.
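The definition of heat density above can be illustrated with a small calculation. This is a minimal sketch: the function name and the 10 W and 20 mm^2 figures are hypothetical values chosen for illustration, not measurements from the article.

```python
def power_density_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Return power density in W/cm^2 for power_w dissipated over area_mm2."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / (area_mm2 / 100.0)  # 100 mm^2 = 1 cm^2


# Hypothetical block: the same 10 W before and after its area is halved.
before = power_density_w_per_cm2(10.0, 20.0)
after = power_density_w_per_cm2(10.0, 10.0)
print(f"before shrink: {before:.0f} W/cm^2, after shrink: {after:.0f} W/cm^2")
```

Holding power constant while halving the area doubles the density, which is why scaling raises local heat generation unless power falls at least as fast as area.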
B. Limitations of traditional heat dissipation management
Traditional heat dissipation strategies such as heat sinks and fans are no longer sufficient on their own. Liquid cooling, vapor chambers, and new phase-change materials help, but these solutions are limited by cost, reliability, and size. Microarchitecture and chip layout have therefore become primary tools for heat dissipation management.
C. Architectural strategies for heat dissipation management
Microarchitects now use a variety of techniques to disperse and avoid hot spots. These techniques include:
- Thermal-aware layout planning: Place low-activity logic near high-activity modules to achieve heat diffusion.
- Hot spot mitigation through replication: Duplicate critical heat-generating logic and rotate activities for local cooling.
- Sensor-driven control: Embed temperature sensors to dynamically and quickly adjust workloads and voltage/frequency settings.
- Utilize area for heat dissipation: Instead of just minimizing the area, use the area to spatially disperse power and reduce peak temperatures.
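As an illustration of the sensor-driven control item in the list above, the following is a minimal sketch of a throttling governor with hysteresis. The class name, frequency levels, and temperature thresholds are all hypothetical; real designs implement this in hardware and firmware with far finer granularity.

```python
from dataclasses import dataclass


@dataclass
class ThermalGovernor:
    """Toy frequency governor driven by one temperature sensor (hypothetical)."""
    levels_mhz: tuple = (1200, 2000, 2800, 3600)  # assumed operating points
    throttle_c: float = 95.0   # step down at or above this temperature
    recover_c: float = 80.0    # step up at or below this temperature
    index: int = 3             # start at the highest level

    def step(self, temp_c: float) -> int:
        """Consume one sensor reading and return the new frequency in MHz."""
        if temp_c >= self.throttle_c and self.index > 0:
            self.index -= 1
        elif temp_c <= self.recover_c and self.index < len(self.levels_mhz) - 1:
            self.index += 1
        return self.levels_mhz[self.index]


gov = ThermalGovernor()
trace = [gov.step(t) for t in (70, 96, 98, 90, 78)]
print(trace)
```

The gap between the two thresholds keeps the governor from oscillating when the temperature hovers near a single trip point.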
Energy-Efficient Performance
A. Performance vs. Power Consumption: The Voltage Scaling Curve
Figure 3 shows the performance vs. power consumption curve of a CPU design.
Performance and power consumption are governed by the following relationships, given here in their standard first-order form:
$$\text{Performance} \propto \text{IPC} \times f, \qquad P_{\text{dynamic}} \approx C \, V^{2} f$$
Here, IPC is the average number of instructions per cycle, that is, the rate at which the CPU core executes instructions, and f is the clock frequency. C is the average dynamic capacitance that the transistors in the design switch during program execution, and V is the voltage applied to the transistors. The voltage scaling region of the curve is where most CPU cores operate. Voltage scaling shows how performance improves with increasing voltage (due to higher frequency), while power consumption rises superlinearly, roughly as the cube of voltage when frequency tracks voltage. This highlights the need for process technologies that reduce leakage and capacitance. Figure 4 shows a thermal image of the chip, with its hot and cold spots.
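The voltage scaling behavior described above can be sketched numerically. This minimal model takes performance as IPC times frequency and dynamic power as C times V squared times frequency, and it assumes, purely for illustration, that frequency rises linearly with voltage. The 4 GHz/V slope, the IPC of 2.0, and the 1 nF capacitance are hypothetical values, and leakage is ignored.

```python
def frequency_ghz(voltage_v: float, ghz_per_volt: float = 4.0) -> float:
    """Assumed linear frequency-voltage relation (illustrative only)."""
    return ghz_per_volt * voltage_v


def performance_gips(ipc: float, voltage_v: float) -> float:
    """Performance as IPC x frequency, in billions of instructions per second."""
    return ipc * frequency_ghz(voltage_v)


def dynamic_power_w(c_nf: float, voltage_v: float) -> float:
    """Dynamic power C * V^2 * f; nF * V^2 * GHz yields watts."""
    return c_nf * voltage_v ** 2 * frequency_ghz(voltage_v)


for v in (0.8, 1.0, 1.2):
    perf = performance_gips(2.0, v)
    power = dynamic_power_w(1.0, v)
    print(f"V={v:.1f} V  perf={perf:.2f} GIPS  power={power:.2f} W")
```

Under these assumptions a 20% voltage increase buys 20% more performance but costs about 73% more dynamic power, which is the steep upper part of the curve.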
B. Process Technology Advancements
As shown in Figure 5, process technology advancements have enabled higher performance at constant power consumption (e.g., through faster transistors and reduced capacitance) and lower power consumption at constant performance (e.g., through low-leakage materials and stacked devices).
However, aggressive size reduction may exacerbate heat density, so architectural countermeasures are needed. Process researchers must recognize that material and layout innovations that can improve thermal conductivity and support non-uniform voltage domains are key drivers for next-generation architectures.
C. Microarchitecture Performance Characteristics
As shown in Figure 6, adding microarchitectural performance features, such as larger structures or more layered structures, can achieve higher performance. However, these features usually also increase capacitance.
As shown in Figure 7, simplifying the microarchitecture (smaller structure sizes, less speculation), reducing the area, and lowering the target frequency can further reduce capacitance and leakage, which matters when capacitance and leakage are crucial in the overall system design.
Combining high-performance and low-power CPU cores is an effective way to achieve the required performance and optimize the overall system power consumption.
System-Level Scaling
A. Amdahl's Law and Multiprocessor Scalability
Figure 8 shows the limitation of Amdahl's Law on the scalability of multiprocessor performance. Parallel programs usually contain serial and parallel execution regions. Amdahl's Law states that the performance of a parallel program will asymptotically approach a limit determined by the serial part of the program.
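Amdahl's Law as stated above can be written as speedup = 1 / (s + (1 - s) / N), where s is the serial fraction and N is the number of cores. The following minimal sketch evaluates it; the function names and the 95% parallel fraction are illustrative choices, not figures from the article.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core for a program with the given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Asymptotic speedup as the core count grows without bound."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial


for n in (1, 4, 16, 64, 256):
    print(f"{n:4d} cores -> speedup {amdahl_speedup(0.95, n):.2f}")
print(f"limit -> {amdahl_limit(0.95):.2f}")
```

Even with 95% of the work parallelizable, the speedup can never reach 20, no matter how many cores a process node makes room for.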
Figure 9 shows the limits on multiprocessor scalability imposed by shared hardware and software resources (such as locks, caches, memory, and network latency and bandwidth). Although newer process nodes allow each chip to accommodate more cores, Amdahl's Law and multiprocessor scalability limit the performance that actual workloads can achieve. In fact, as shown in Figure 9, multiprocessor scalability rarely exceeds 0.97 for integer workloads and 0.90 for floating-point workloads.
Another key consideration is how many cores are actually active under typical workload conditions. Measurements across a variety of workloads usually produce the distribution shown in Figure 10: the most common case is that only one core is active, the next most common is that all cores are active, and these are followed by two active cores, three, and so on.
B. Impact on Processor Design
Power and bandwidth are shared among the active cores, and the number of active cores may change dynamically. This affects the number of cores, the types of cores, and the microarchitectural optimizations applied to each type. The thermal and power constraints and solutions described in the Heat Density section can also be applied to the entire system to optimize for a variety of workload scenarios.
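To make the sharing of a power budget concrete, the sketch below divides a fixed package budget evenly among the active cores and estimates the frequency each core can sustain. It is a rough illustration under stated assumptions: dynamic power per core is taken to grow with the cube of frequency (frequency tracking voltage), leakage and bandwidth are ignored, and the 64 W budget, the 0.5 W/GHz^3 coefficient, and the 5 GHz cap are hypothetical.

```python
def per_core_frequency_ghz(
    package_w: float,
    active_cores: int,
    w_per_ghz3: float = 0.5,   # assumed coefficient: power = w_per_ghz3 * f^3
    f_max_ghz: float = 5.0,    # assumed upper frequency limit of the design
) -> float:
    """Sustainable per-core frequency when the budget is split evenly."""
    if active_cores < 1:
        raise ValueError("active_cores must be at least 1")
    budget_w = package_w / active_cores
    unconstrained = (budget_w / w_per_ghz3) ** (1.0 / 3.0)
    return min(unconstrained, f_max_ghz)


for n in (1, 2, 8, 16):
    print(f"{n:2d} active cores -> {per_core_frequency_ghz(64.0, n):.2f} GHz each")
```

In this toy model a single active core runs at the frequency cap, while sixteen active cores each settle near 2 GHz, which suggests why the one-core and all-core cases pull core design in different directions.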
Key Process Research Directions Aligned with Architectural Requirements
To support architectural goals, the following process research areas are crucial:
- Low-leakage, low-capacitance materials: Support frequency scaling while avoiding runaway power consumption.
- Thermal-aware 3D integration: Manage vertical heat flow in stacked chips.
- Fine-grained power gating: Enable power consumption control for each module.
- On-chip thermal sensors: Enable real-time architectural thermal management.
- Heterogeneous integration: Support high-performance and high-efficiency cores on the same chip.
Process and architecture teams used to work independently in their respective design phases. Today, feedback loops are crucial: architectural thermal maps must guide device layout and packaging, and process limitations must guide architectural layout planning and performance goals. Collaborative optimization enables more informed trade-offs and faster roadmap planning.
Conclusion
Advanced semiconductor process technologies are unlocking excellent performance - but without architectural awareness, their advantages will be limited by power consumption and heat dissipation. A new co-design model of architecture and process must emerge. Next-generation computing requires not only smaller transistors but also smarter systems. By treating energy efficiency and heat dissipation constraints as a shared responsibility, we can extend the trajectory of Moore's Law into a sustainable, high-performance future.
This article is from the WeChat official account "Semiconductor Industry Observation" (ID: icbank), author: Debbie Marr, published by 36Kr with permission.