ASIC, the great savior
The growing demand for artificial intelligence (AI) has exposed a severe "computational crisis," characterized by unsustainable energy consumption, high training costs, and the approaching limits of traditional complementary metal-oxide-semiconductor (CMOS) scaling technology. "Physics-based application-specific integrated circuits (ASICs)" offer a transformative paradigm that directly harnesses inherent physical dynamics for computation, rather than expending resources to enforce idealized digital abstractions.
By relaxing the constraints required by traditional ASICs, such as enforced statelessness, unidirectionality, determinism, and synchronicity, these devices are designed to operate as precise realizations of physical processes, thereby achieving significant improvements in energy efficiency and computational throughput. This approach enables novel co-design strategies that align algorithmic requirements with the inherent computational primitives of physical systems.
Physics-based ASICs can accelerate critical AI applications, such as diffusion models, sampling, optimization, and neural network inference, as well as traditional computational workloads like materials and molecular science simulations. Ultimately, this vision points towards a future of heterogeneous, highly specialized computational platforms that can overcome current scaling bottlenecks and open new frontiers in computational power and efficiency.
Introduction: The Computational Crisis
Over the past decade, the rapid expansion of artificial intelligence (AI) applications has significantly increased the demand for computational infrastructure, exposing key limitations in the underlying hardware paradigm. The infrastructure supporting AI models was never designed to accommodate today's scale, complexity, or energy requirements. As a result, the current computational stack makes severely inefficient use of the physical computational capabilities inherent in today's hardware.
Traditional scaling is facing multiple limits:
1. The energy demand of AI is increasing unsustainably, as shown in Figure 1(a). Data centers, which are at the core of AI operations, consumed approximately 200 terawatt-hours (TWh) of electricity in 2023. Projections indicate that this figure could increase to 260 TWh by 2026, accounting for about 6% of the total electricity demand in the United States.
Figure 1. Projected computational energy consumption and the supply-demand situation of computational power. Although the "computational crisis" has multiple aspects, two key aspects are: (a) the continuous increase in computational energy consumption; (b) the widening gap between the supply and demand of computational power (here exemplified by AI model training). In the past few years, both of these issues have been largely driven by the AI revolution. Figures (a) and (b) are adapted from references [3] and [4], respectively.
2. Computational costs are rising sharply, centralizing access. The development of cutting-edge AI models has led to a significant increase in training costs. It is estimated that by 2027, the cost of the largest-scale training runs will exceed $1 billion. This is naturally related to the supply-demand gap shown in Figure 1(b).
3. As transistor sizes shrink to the nanoscale, the long-standing scaling laws, Moore's Law and Dennard scaling, are reaching their limits. Effects of miniaturization such as stochastic variability, leakage currents, and device-to-device variation make reliable operation difficult at these scales. We can no longer scale down the threshold voltage as we did in the past, so power density rises, causing heating that limits clock speeds and operating times.
These limitations not only impede performance improvements but also reveal deeper inefficiencies: today's general-purpose architectures fail to fully utilize the physical potential of the hardware itself. The abstraction layers designed to manage complexity have now become bottlenecks, especially in terms of energy efficiency and computational throughput. Without changing the computational paradigm, we face the risks of stagnant innovation, rising energy costs, and the potential concentration of AI capabilities in the hands of a few large companies and government agencies.
Physics-based application-specific integrated circuits (ASICs) offer a transformative approach by harnessing physical phenomena for computation, rather than suppressing them. By aligning hardware design with the intrinsic properties of physical systems, these ASICs can improve efficiency, reduce energy consumption, and make AI and computational resources more accessible.
What are Physics-based ASICs?
A. Motivation
If we want to improve computational efficiency (e.g., reduce energy consumption or runtime), we can design more efficient algorithms for idealized general-purpose hardware, build faster or more efficient hardware (general-purpose or specialized), or co-design algorithms and hardware to maximize the effective computation obtained. Although there are many exceptions across contemporary computer science and engineering, over roughly the past fifty years explicit efforts to improve computation have focused mainly on the first two approaches: general-purpose computational hardware paired with highly abstracted software development strategies, which have enabled the continuous expansion of software applications and the modern digital economy.
Even so, more specialized hardware, such as GPUs, has been a key driving force behind recent advances in computing. The implicit algorithmic preferences of hardware have long been a guiding force behind algorithmic success.
Is it a coincidence that the most popular algorithms in machine learning consist mainly of matrix multiplications, and that GPUs are particularly efficient at exactly that operation? Of course not: these algorithms achieve an excellent match between software and hardware, allowing them to scale well and outperform approaches that fail to use GPUs effectively. This general trend, in which the co-optimization of algorithms is unconsciously guided by the characteristics of existing hardware, is known as the "hardware lottery" [5]. The prominence of the hardware lottery indicates that the co-design of software and hardware is inevitable, whether it happens consciously or not.
The idea of physics-based ASICs essentially transforms this mainly unintentional trend into a fully intentional and principled approach: it aims to deliberately co-design algorithms and hardware starting from the lowest physical level of available, scalable hardware infrastructure. Similar to how the intensive matrix multiplications in transformers are cleverly adapted to the preferences of GPUs, can we similarly design algorithms and electronic chips to leverage the deeper preferences in the physics of silicon electronic circuits (and thereby unlock greater scalability)?
Of course, this is not a free lunch: it will require new algorithms and hardware, different from those most modern computer scientists design today, that take each other's details into account. On the other hand, this path may let us use modern computational hardware far more efficiently than we do today. How large could the gains be? It is hard to say, but we can get some intuition by considering a related question: how the level of abstraction affects the cost of digitally simulating an analog circuit.

For example, a physical device implementing a simple CMOS NOT gate, when abstracted as a binary logic gate, performs one binary operation per clock cycle. If we instead simulate the transient (and analog) dynamics of the circuit that realizes it, typical numerical methods (e.g., those used in SPICE) may require millions of floating-point operations. If we model each transistor in detail (as is often done during the design phase), we must solve a system of partial differential equations in 3 + 1 dimensions, requiring billions or even trillions of floating-point operations, all for a single clock cycle. Clearly, the level at which we abstract a physical system determines how many digital logic operations it is equivalent to.

However, this is only part of the challenge: just because simulating a physical system at a certain abstraction level is expensive does not mean that we can use the same physical system and abstraction to perform other interesting computations. This is the core challenge of physics-based ASICs: to design abstractions, algorithms, and hardware architectures that better respect the physical laws of the underlying hardware, and thereby exploit more fully the physical computational capabilities offered by today's highly scalable electronic circuits.
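To make the abstraction-level argument concrete, consider the deliberately crude sketch below. It is a toy with made-up component values, not how SPICE actually works (SPICE uses implicit integrators and detailed nonlinear device models and is far more expensive), but it already shows the gap: the same NOT gate costs one operation at the logic level and thousands of floating-point updates once we resolve even a minimal analog transient.

    # Crude toy (made-up values; not a real SPICE model): the cost of one NOT gate
    # at two abstraction levels.

    VDD = 1.0                    # supply voltage (V)
    R, C = 10e3, 1e-15           # effective pull resistance (ohms) and load capacitance (F)
    dt, t_end = 1e-13, 1e-10     # 0.1 ps Euler steps over one 100 ps "clock period"

    def logic_not(x):
        return 1 - x             # abstraction level 1: a single binary operation

    def transient_not(v_in):
        # abstraction level 2: crude RC model of the inverter's output node
        v_out, steps = 0.0, 0
        for _ in range(int(t_end / dt)):
            target = VDD if v_in < VDD / 2 else 0.0    # rail the output is pulled toward
            v_out += dt / (R * C) * (target - v_out)   # forward-Euler relaxation
            steps += 1
        return v_out, steps

    v_final, n_steps = transient_not(0.0)
    print("logic level :", logic_not(0))
    print(f"analog level: {v_final:.3f} V after {n_steps} Euler steps "
          f"(roughly {5 * n_steps} floating-point operations for one gate)")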
B. Definition
Broadly speaking, physics-based ASICs are ASICs that rely on the natural physical dynamics of a system to perform non-trivial operations on data. This definition is somewhat vague; since all circuits follow physical laws, all computation is, in a sense, accomplished through the natural evolution of the computational system.
However, traditional ASIC design deliberately suppresses or abstracts away certain physical effects to achieve an idealized, symbolic computational model. By doing so, it relies on a set of approximations that allow the construction of complex systems from simple, idealized components.
Some of the most important approximations are:
1. Statelessness: In traditional ASICs, there is usually a clear separation where memory and computation are handled by independent components in different locations. Components not responsible for storing information are assumed to have their outputs depend only on the current input, rather than on previous history. For example, a NOT gate should invert the current value of its input, regardless of past values.
2. Unidirectionality: The basic components of traditional ASICs are designed to propagate information in a single direction; they have specified input and output ports. For example, a NOT gate should respond to changes at the input end, but its output should not affect the input. For this reason, creating feedback loops in traditional ASICs requires explicitly connecting the output of a certain module to its input.
3. Determinism: Given the same input and initial conditions, the circuit is expected to produce the same output every time.
4. Synchronicity: Usually, the signals in different parts of traditional ASICs are synchronized with each other according to a centralized clock.
These properties cannot be realized exactly in physical hardware: real components exhibit memory effects, feedback, noise, and thermal fluctuations. Enforcing these idealized behaviors incurs costs in energy, delay, or complexity, and the costs grow as the approximations are made more accurate.
Physics-based ASICs, on the other hand, are designed to operate without relying on these properties (or at least not on some of them). Unlike traditional ASICs, these devices are designed to leverage (or at least tolerate) statefulness, bidirectionality, non-determinism, and asynchronicity, as shown in Figure 2. Therefore, computation on physics-based ASICs is not an approximation of non-physical processes but the realization of physical processes.
Figure 2. Traditional ASICs vs. physics-based ASICs. As shown in the figure, traditional ASICs separate storage and computation and assume that computational components are stateless. A single logic gate transfers information in a unidirectional manner, with dedicated input and output ends. To build a feedback loop, the output must be explicitly connected back to the input. Physics-based ASICs may contain stateful computational components, with bidirectional information flow across their couplings.
Because they lack the simplifying assumptions of traditional ASICs, physics-based ASICs usually behave in ways that are more complex and harder to analyze. However, their circuit components also have a much richer repertoire of behaviors to draw on when performing operations. As a result, physics-based ASICs can often perform significantly more computation with fewer components. For example, a scalar multiplication may require dozens to hundreds of transistors in a traditional ASIC, while in a physics-based ASIC it requires only a handful of components.
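As an idealized illustration of this component-level leverage (a toy sketch, not a specific fabricated device; it ignores wire resistance, device nonlinearity, and noise), consider a resistive crossbar: Ohm's law performs the multiplications and Kirchhoff's current law performs the additions, so an array of conductances computes an entire matrix-vector product in one physical settling step.

    import numpy as np

    rng = np.random.default_rng(0)
    G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # crossbar conductances (siemens), one device each
    v_in = np.array([0.2, 0.5, 0.1, 0.8])      # input voltages on the rows (volts)

    # Ideal physics: the current collected on column j is sum_i G[i, j] * v_in[i],
    # i.e. Ohm's law does the multiplies and Kirchhoff's current law does the adds.
    i_out = G.T @ v_in

    # Digital reference: the same result as explicit multiply-accumulate operations.
    reference = np.array([sum(G[i, j] * v_in[i] for i in range(4)) for j in range(3)])
    assert np.allclose(i_out, reference)
    print("column currents (A):", i_out)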
C. Platforms
Many existing unconventional computational paradigms can be regarded as examples of physics-based ASICs. Despite the great diversity among these different approaches, physics-based ASICs are distinguished from other physics-based platforms (e.g., computing with soap bubbles [6]) by their scalability. Scalability and manufacturability are key elements in this exciting new field. Now, we present some examples of these scalable platforms, some of which are shown in Figure 3.
Figure 3. Common building blocks of physics-based ASICs. Although not exhaustive, the figure shows several basic physical structures that can be used as building blocks for physics-based ASICs. For each component, the physical laws it follows can be mapped to some computational primitive operations.
As mentioned earlier, physics-based ASICs differ from traditional ASICs in that they relax requirements that are usually expected to hold at least approximately: statelessness, unidirectionality, determinism, and synchronicity. We can roughly classify physics-based ASIC devices according to which subset of these requirements they relax.
Several paradigms have been proposed in which circuit components are deliberately designed to be stateful, sometimes depending on long-term history. Circuits using memristors are a typical example: a memristor's resistance depends on the total charge that has passed through it. Other components can also exhibit memory effects when used in analog circuits, thereby removing the assumption of statelessness.
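A minimal sketch of this kind of statefulness (a simplified linear-drift toy with made-up parameters, not a calibrated device model): the device's present current-voltage response depends on how much charge has flowed through it, so the same probe input can yield different outputs depending on the drive history.

    import numpy as np

    R_on, R_off = 100.0, 16e3    # resistance bounds (ohms)
    k = 1e6                      # state-change rate per unit charge (made up for the toy)
    dt = 1e-6                    # time step (s)

    def step(v, w):
        """Apply voltage v for one step; return the current and the updated state w in [0, 1]."""
        R = R_on * w + R_off * (1.0 - w)              # resistance interpolates with the internal state
        i = v / R                                     # Ohm's law
        w = float(np.clip(w + k * i * dt, 0.0, 1.0))  # state drifts with the charge i*dt
        return i, w

    # Two different drive histories, followed by the same small probe voltage...
    for name, n_pulses in [("short drive", 100), ("long drive", 2000)]:
        w = 0.1
        for _ in range(n_pulses):
            _, w = step(1.0, w)                 # drive with 1 V pulses
        i_probe, _ = step(0.1, w)               # ...give different probe currents: memory
        print(f"{name}: probe current = {i_probe * 1e6:.2f} uA")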
Bidirectional coupling is common in ASICs implementing Ising machines (both digital and analog), as well as in analog devices designed to solve linear and nonlinear algebraic and (possibly stochastic) differential equation problems. Interactions between physical degrees of freedom are also used in platforms based on nonlinear photonics and self-adjusting resistor networks.
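The following toy (a generic Hopfield-style relaxation with an arbitrary random problem, not a description of any particular Ising-machine product) illustrates what bidirectional coupling buys: every node simultaneously influences and is influenced by its neighbors through symmetric couplings, and the network settles into a low-energy spin configuration without any explicitly wired feedback loops.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1)
    J = J + J.T                                 # symmetric couplings: i acts on j and j acts on i

    def ising_energy(s):
        return -0.5 * s @ J @ s

    x = 0.01 * rng.standard_normal(n)           # continuous internal states (e.g., node voltages)
    dt = 0.05
    for _ in range(2000):
        s = np.tanh(4.0 * x)                    # soft spin values in (-1, 1)
        x += dt * (J @ s - x)                   # each node relaxes under the summed pull of its neighbors

    spins = np.sign(x).astype(int)
    print("final spins :", spins)
    print("Ising energy:", ising_energy(spins.astype(float)))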
Since suppressing stateful behavior and bidirectional information flow requires dissipation, we can expect that higher energy efficiency may be achieved when these requirements are relaxed. If we take this idea to the extreme, reversible computation attempts to significantly reduce energy loss by avoiding any information erasure. Notably, quantum computation, as a subset of reversible computation, exhibits bidirectional information flow between interacting qubits.
In recent years, there has also been growing interest in non-deterministic ASICs, both analog and digital. On the digital side, p-bits have been studied extensively; these are binary variables that evolve according to a continuous-time Markov chain (CTMC). Magnetic tunnel junctions (MTJs) exhibit bistable, stochastic voltage behavior and can be used as a source of analog or digital randomness. Similarly, thermodynamic computers use analog circuits whose continuous variables follow stochastic dynamics (i.e., Brownian motion).
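As a sketch of this kind of non-determinism (a discrete-time caricature of the continuous-time Markov dynamics above, on a made-up random network), each p-bit below is updated asynchronously and flips at random with a probability biased by its neighbors; over many updates the network preferentially visits low-energy configurations, which is exactly the behavior that sampling and optimization workloads can exploit.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    J = rng.normal(0.0, 0.5, size=(n, n))
    J = np.triu(J, 1)
    J = J + J.T                                  # symmetric couplings
    h = rng.normal(0.0, 0.2, size=n)             # local biases
    beta = 1.0                                   # inverse "temperature"

    m = rng.choice([-1, 1], size=n)              # current p-bit states
    counts = {}
    for _ in range(20000):
        i = rng.integers(n)                      # asynchronous: pick one p-bit at random
        I_i = J[i] @ m + h[i]                    # input from the rest of the network
        m[i] = 1 if rng.random() < 0.5 * (1.0 + np.tanh(beta * I_i)) else -1
        key = tuple(m)
        counts[key] = counts.get(key, 0) + 1

    for config, c in sorted(counts.items(), key=lambda kv: -kv[1])[:3]:
        print(config, f"visited {c} times")      # low-energy configurations dominate the samples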
In some physics-based ASIC technologies, including p-bits, there is no central clock, and different signals within a single device change asynchronously. Other ASICs use multiple local clocks in place of a single central clock, with the local clocks not fully synchronized with one another.
D. Intuition for Performance Advantages
As mentioned earlier, traditional ASICs incur time and energy costs to ensure that the requirements of statelessness, unidirectionality, determinism, and synchronicity are approximately met. These costs are usually worthwhile because they allow computational systems to be designed in a highly modular way and reused for many purposes. However, for specific classes of problems there are often algorithms or solution methods that do not rely on these properties. In such cases it can be advantageous to design an ASIC for that specific class of problems and relax the design constraints associated with enforcing statelessness, unidirectionality, and/or determinism.
From a practical perspective, this might mean raising the clock frequency beyond the range in which stateless or deterministic behavior can be relied upon. Similarly, the supply voltage can be reduced, accepting non-deterministic behavior in exchange for lower power consumption. Indeed, a common feature of physics-based ASICs is that they save power and energy by relaxing the constraints above.
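The standard first-order model of dynamic power in CMOS, P ≈ α·C·V²·f, gives a feel for the size of this trade (the numbers below are made up for illustration): because power scales with the square of the supply voltage, modest voltage reductions, paid for with reduced determinism or timing margin, yield large savings.

    # First-order dynamic-power model P ~ alpha * C * V^2 * f (illustrative numbers only).
    alpha, C, f = 0.1, 1e-9, 1e9      # activity factor, switched capacitance (F), clock (Hz)
    for V in (0.9, 0.7, 0.5):
        P = alpha * C * V**2 * f
        print(f"V = {V:.1f} V -> dynamic power ~ {P * 1e3:.0f} mW")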
Interestingly, we often observe that when the natural dynamics of a system are harnessed for computation, many operations can be fused into one operation. That is, we can see that, in a sense, the physical dynamics "automatically" perform part of the computation (e.g., solving linear algebraic or optimization problems). This provides some intuition for the potential sources of time and energy savings.
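A toy example of this "fusion" (a forward-Euler stand-in for a dissipative analog circuit settling to steady state; the matrix and vector are arbitrary): the relaxation dx/dt = b - Ax comes to rest exactly at the solution of Ax = b when A is symmetric positive definite, so the many multiply-accumulate operations of a linear solve are absorbed into the system's natural time evolution.

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite, well conditioned
    b = rng.standard_normal(4)

    x = np.zeros(4)
    dt = 0.05
    for _ in range(2000):              # crude digital stand-in for the physical relaxation
        x += dt * (b - A @ x)          # dx/dt = b - A x  ->  fixed point at A x = b

    print("relaxed solution  :", np.round(x, 4))
    print("numpy.linalg.solve:", np.round(np.linalg.solve(A, b), 4))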
Although there is still much work to be done in scaling various physics-based ASIC approaches, there are already indications of the potential for significant advantages in terms of time and energy costs.
Design Strategies
A. Top-down vs. Bottom-up
Designing physics-based ASICs is challenging. A principled strategy usually involves working at the intersection of top-down and bottom-up perspectives, as shown in Figure 4. In the top-down approach, one starts with a key application A of broad interest or significant impact (e.g., generative AI for images or materials), and then maps this application into algorithm space, i.e., lists a set of algorithms L(A) that could run the application (e.g., diffusion models, transformers, etc.).