
Breaking free from GPU dependence: Nature publishes a review of "physical neural networks" for large-scale, efficient AI training and inference

Academic Headlines | 2025-09-08 09:04
Faster, more energy-efficient, and more practical.

In recent years, AI has profoundly changed our lives through tools such as chatbots and has been applied in fields such as healthcare, meteorology, and materials design. This progress mainly relies on the computing power of GPUs and the growth of data scale. However, as the model scale continues to expand, the limitations of traditional digital GPUs become increasingly obvious. To break through this bottleneck, AI needs to reduce the latency and energy consumption of training and inference while ensuring accuracy and throughput.

A highly regarded research direction is "physical neural networks" (PNNs), which use physical systems such as light, electricity, and vibration for computation. They are expected to free AI from its dependence on traditional digital chips and enable more efficient, larger-scale training and inference.

Recently, a research team from the Swiss Federal Institute of Technology in Lausanne (EPFL) and its collaborators published a review in the journal Nature that surveys the development of physical neural networks from a training perspective and examines general-purpose training methods from the ground up.

Paper link: https://www.nature.com/articles/s41586-025-09384-2

The research team said that, "as long as there is sufficient research investment", future physical neural networks may change the way AI computation is done.

Faster, more energy-efficient, and more practical

Physical neural networks are neural-network-like systems that use analog physical processes for computation. They can exploit analog physical computation more directly and flexibly than traditional computing hardware, and they may change the feasibility and practicality of AI systems. They currently fall into two categories:

  • Isomorphic PNNs: the hardware is designed so that its operations are strictly isomorphic to a predefined mathematical transformation. A typical example is the electronic crossbar array, designed to perform matrix-vector multiplication directly: the conductance of each cross-point corresponds one-to-one to an element of the matrix being multiplied (a minimal numerical sketch follows this list).
  • Broken-isomorphism PNNs: the physical transformations of the hardware are trained directly. These transformations should be roughly similar to the mathematical operations of conventional neural networks, but they need not correspond to them in a precise one-to-one manner.
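
To make the isomorphic case concrete, here is a minimal numerical sketch, not taken from the review, of how a crossbar array can realize matrix-vector multiplication: signed weights are mapped onto a differential pair of conductances, input voltages drive the rows, and each column current sums the products according to Ohm's and Kirchhoff's laws. The conductance range and noise model are illustrative assumptions.

```python
import numpy as np

def crossbar_matvec(weights, x, g_min=1e-6, g_max=1e-4, noise_std=0.02, rng=None):
    """Idealized crossbar: map signed weights to a differential pair of conductances,
    apply input 'voltages' x to the rows, and read the summed column 'currents'."""
    rng = np.random.default_rng() if rng is None else rng
    w_max = np.max(np.abs(weights)) + 1e-12
    # Positive and negative weight parts go to two conductance arrays.
    g_pos = g_min + (g_max - g_min) * np.clip(weights, 0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0, None) / w_max
    # Device-to-device variation modeled as multiplicative conductance noise.
    g_pos = g_pos * (1 + noise_std * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1 + noise_std * rng.standard_normal(g_neg.shape))
    # Column currents (Kirchhoff's current law), then rescale back to weight units.
    i_out = (g_pos - g_neg) @ x
    return i_out * w_max / (g_max - g_min)

W = np.random.randn(4, 8)
x = np.random.randn(8)
print(np.allclose(crossbar_matvec(W, x, noise_std=0.0), W @ x))  # exact when noiseless
print(crossbar_matvec(W, x))                                     # noisy analog result
```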

Figure | Physical neural networks

Although physical neural networks are still in the laboratory stage, they have shown great potential. They can utilize physical laws more directly, are in theory more energy-efficient and faster than traditional hardware, and can ultimately be applied in data centers and edge-computing scenarios. They can both drive the operation of large-scale generative models and assist local inference or intelligent sensors.

Regardless of the application scenario, neural networks need to be trained, but the specific constraints vary depending on the application field. The main training techniques include:

1. In silico training

The most direct way to train PNNs is in a computer-simulation environment. This approach uses a digital-twin model of the PNN to compute weight gradients and carry out backpropagation. Digital twins are usually constructed in one of two ways: either by directly modeling the characteristics of the PNN, or by a data-driven route in which input-output samples are collected from the PNN and the digital twin is fitted to these data. During training, gradients are computed and parameters updated in the digital domain, and the results are then applied to the physical hardware.
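
A minimal sketch of the idea, with a toy one-parameter "physical" layer and a hand-written digital twin (both hypothetical stand-ins, not the review's models): all gradients are computed on the twin, the trained parameter is then deployed to the hardware, and the residual simulation-reality gap shows up as a higher error on the hardware.

```python
import numpy as np

rng = np.random.default_rng(0)

def hardware(x, theta):
    """Stand-in for the real device: close to tanh(theta*x) but with an
    imperfection that the digital twin does not capture (hypothetical)."""
    return np.tanh(theta * x) + 0.03 * np.sin(3 * x)

def twin(x, theta):
    """Digital twin used for all gradient computation during in silico training."""
    return np.tanh(theta * x)

# Toy regression task: make the layer output match a target mapping.
x_train = rng.uniform(-1, 1, 256)
y_train = 0.5 * x_train

theta, lr = 0.1, 0.5
for _ in range(500):
    pred = twin(x_train, theta)
    err = pred - y_train
    # Analytic gradient through the twin: d tanh(theta*x)/d theta = x * (1 - tanh^2)
    grad = np.mean(2 * err * x_train * (1 - pred**2))
    theta -= lr * grad

# Deploy the digitally trained parameter to the (simulated) hardware.
mse_twin = np.mean((twin(x_train, theta) - y_train) ** 2)
mse_hw = np.mean((hardware(x_train, theta) - y_train) ** 2)
print(f"theta={theta:.3f}  twin MSE={mse_twin:.4f}  hardware MSE={mse_hw:.4f}")
```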

2. Physics-aware training (PAT)

Physics-aware training (PAT) rests on a core idea: as long as an approximate predictive model of the physical system is available, gradients can be extracted reliably. Its central mechanism is that the physical system performs the forward pass, while backpropagation is carried out by a differentiable digital model, so the forward and backward passes are deliberately mismatched. As with most training algorithms, all that is required is that the estimated gradients produced by the digital model be roughly aligned with the true gradients. Compared with the strict requirement of a perfect digital model, this looser condition lets PAT replace purely in silico training in most scenarios while retaining many of the advantages of in situ training algorithms.

This method has been verified in optical, mechanical, and electronic systems. It reduces the influence of physical noise while preserving the accuracy of backpropagation. Its drawback is that training slows down when the physical parameters can only be updated slowly.
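
The sketch below illustrates PAT's mismatched forward/backward structure under illustrative assumptions: the forward pass runs through a noisy, imperfect stand-in for the physical layer, while gradients come from an approximate differentiable digital model of it.

```python
import numpy as np

rng = np.random.default_rng(1)

def physical_forward(x, w):
    """Stand-in for the hardware forward pass: imperfect and noisy."""
    y = np.tanh(x @ w)
    return y + 0.03 * np.sin(2 * y) + 0.02 * rng.standard_normal(y.shape)

def digital_model(x, w):
    """Approximate differentiable model of the hardware, used only for gradients."""
    return np.tanh(x @ w)

def digital_backward(x, w, grad_y):
    """Backpropagate dL/dy through the digital model to obtain dL/dw."""
    y_model = digital_model(x, w)
    return x.T @ (grad_y * (1 - y_model**2))

# Toy task: a single physical layer trained to match targets from a random teacher.
x = rng.standard_normal((64, 8))
y_target = np.tanh(x @ rng.standard_normal((8, 4)))

w = 0.1 * rng.standard_normal((8, 4))
for step in range(300):
    y_phys = physical_forward(x, w)             # forward pass on the "hardware"
    grad_y = 2 * (y_phys - y_target) / len(x)   # loss gradient uses the physical output
    w -= 0.5 * digital_backward(x, w, grad_y)   # backward pass on the digital model
print("final MSE on hardware:", np.mean((physical_forward(x, w) - y_target) ** 2))
```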

3. Feedback alignment (FA/DFA)

In physical neural networks, weights are embodied directly in hardware components rather than stored in conventional memory. Backpropagation requires multiplying error signals by the transpose of the weight matrix; whereas transposition is a trivial computational operation in digital systems, it does not arise naturally in physical neural networks, and extracting or applying the transpose usually requires additional hardware modules or reconfiguration of the physical structure.

Feedback alignment (FA) and direct feedback alignment (DFA) allow physical neural networks to be trained without transferring the forward-propagation weights into the backward pass, improving efficiency, usually at some cost in performance. They still rely on the derivatives of the activation functions and on the activations of each layer, and they suffer some loss of accuracy. The core idea of FA is to use fixed random feedback weights and to train by propagating error signals backward layer by layer; DFA instead broadcasts the error signal to all layers simultaneously through fixed random feedback matrices, enabling efficient training of deep networks.
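
A minimal numpy sketch of direct feedback alignment on a toy regression task (network size, data, and hyperparameters are illustrative): the output error is sent to each hidden layer through its own fixed random matrix rather than through the transposed forward weights.

```python
import numpy as np

rng = np.random.default_rng(2)

n_in, n_h, n_out = 20, 64, 5
W1 = 0.1 * rng.standard_normal((n_in, n_h))
W2 = 0.1 * rng.standard_normal((n_h, n_h))
W3 = 0.1 * rng.standard_normal((n_h, n_out))
B1 = rng.standard_normal((n_out, n_h))   # fixed random feedback matrices, never trained
B2 = rng.standard_normal((n_out, n_h))

act  = np.tanh
dact = lambda z: 1 - np.tanh(z) ** 2

# Toy dataset: targets produced by a random teacher network.
X = rng.standard_normal((512, n_in))
T = act(act(X @ rng.standard_normal((n_in, n_h))) @ rng.standard_normal((n_h, n_out)))

lr = 0.05
for epoch in range(200):
    z1 = X @ W1;  h1 = act(z1)
    z2 = h1 @ W2; h2 = act(z2)
    y  = h2 @ W3
    e  = (y - T) / len(X)                # output error (mean-squared-error gradient)
    # DFA: each hidden layer receives the output error through a fixed random matrix,
    # so the transpose of the forward weights is never needed.
    d2 = (e @ B2) * dact(z2)
    d1 = (e @ B1) * dact(z1)
    W3 -= lr * h2.T @ e
    W2 -= lr * h1.T @ d2
    W1 -= lr * X.T @ d1
print("final MSE:", np.mean((y - T) ** 2))
```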

4. Physical local learning (PhyLL)

PhyLL learns from the cosine similarity between two passes of the data, one with positive and one with negative samples, eliminating the layer-normalization operation that is challenging to implement physically. The method has been experimentally validated on three physical neural-network platforms, acoustic, microwave, and optical, achieving both supervised and unsupervised training without requiring detailed knowledge of the nonlinear physical layer's characteristics.
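
The sketch below shows the layer-local, cosine-similarity-based objective in a loose, simplified form (not the exact loss or setup of the PhyLL experiments): one trainable layer receives a positive pass (input with the correct label appended) and a negative pass (wrong label), and its parameters are updated to push the two responses apart. For simplicity the "physical" layer here is a differentiable tanh so the local gradient can be written in closed form; the appeal of the actual scheme is that it avoids needing such knowledge of the physical layer.

```python
import numpy as np

rng = np.random.default_rng(3)

n_in, n_cls, n_h = 16, 2, 32
W = 0.1 * rng.standard_normal((n_in + n_cls, n_h))    # one trainable layer

def embed(x, label):
    """Append a one-hot label: correct label = positive pass, wrong label = negative pass."""
    return np.concatenate([x, np.eye(n_cls)[label]], axis=1)

# Toy binary data from a random linear rule.
X = rng.standard_normal((256, n_in))
y = (X @ rng.standard_normal(n_in) > 0).astype(int)

lr = 0.5
for step in range(300):
    x_pos, x_neg = embed(X, y), embed(X, 1 - y)
    h_pos, h_neg = np.tanh(x_pos @ W), np.tanh(x_neg @ W)   # stand-in "physical" layer
    norm_p = np.linalg.norm(h_pos, axis=1, keepdims=True)
    norm_n = np.linalg.norm(h_neg, axis=1, keepdims=True)
    s = np.sum(h_pos * h_neg, axis=1, keepdims=True) / (norm_p * norm_n)
    # Layer-local objective: minimize the mean cosine similarity between the
    # positive-pass and negative-pass responses (no layer normalization needed).
    ds_dhp = h_neg / (norm_p * norm_n) - s * h_pos / norm_p**2
    ds_dhn = h_pos / (norm_p * norm_n) - s * h_neg / norm_n**2
    grad = (x_pos.T @ (ds_dhp * (1 - h_pos**2)) +
            x_neg.T @ (ds_dhn * (1 - h_neg**2))) / len(X)
    W -= lr * grad
print("mean pos/neg cosine similarity after training:", float(np.mean(s)))
```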

5. Zeroth-order gradient and gradient-free training

These algorithms fall into two broad categories. The first is the perturbation approach, which estimates the gradient by sampling the objective (loss) function at different points in weight space and then updates the weights with conventional gradient descent. The second, gradient-free, category uses population-based sampling strategies: rather than approximating gradients, it iteratively generates better candidate solutions. Genetic algorithms, evolution strategies, and swarm-based algorithms follow heuristic rules, while reinforcement learning iteratively optimizes the strategy that generates candidates.
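
As one example of the perturbation family, the sketch below uses simultaneous perturbation stochastic approximation (SPSA): each step evaluates the black-box loss at two randomly perturbed parameter settings and turns the difference into a gradient estimate, so no derivatives of the physical transformation are needed. The "physical" loss function and the gain schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def physical_loss(theta, X, y_target):
    """Black-box objective: loss of a (simulated) noisy physical layer whose
    internal transformation we can only evaluate, not differentiate."""
    W = theta.reshape(8, 4)
    out = np.tanh(X @ W) + 0.01 * rng.standard_normal((len(X), 4))   # hardware noise
    return np.mean((out - y_target) ** 2)

X = rng.standard_normal((128, 8))
y_target = np.tanh(X @ rng.standard_normal((8, 4)))

theta = 0.1 * rng.standard_normal(32)
for k in range(3000):
    a_k = 0.5 / (k + 50) ** 0.602      # decaying step size (standard SPSA schedule)
    c_k = 0.1 / (k + 1) ** 0.101       # decaying perturbation size
    delta = rng.choice([-1.0, 1.0], size=theta.shape)     # random +/-1 perturbation
    g_hat = (physical_loss(theta + c_k * delta, X, y_target)
             - physical_loss(theta - c_k * delta, X, y_target)) / (2 * c_k) * delta
    theta -= a_k * g_hat               # gradient-descent step with the SPSA estimate
print("final loss:", physical_loss(theta, X, y_target))
```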

6. Gradient descent training through physical dynamics

Gradient-descent optimization is the core technique behind today's state-of-the-art machine-learning systems. Researchers have proposed four physical training methods that achieve gradient descent without a digital twin:

  • Matrix-vector multiplication through linear reciprocal physical systems: the goal is to map conventional neural networks and backpropagation onto analog hardware. The core idea is that the matrix-vector multiplications required for the forward pass (inference) and the backward pass (training) can both be carried out by linear reciprocal physical systems.
  • Nonlinear computation based on linear wave scattering: input data are encoded in fixed (non-trainable) physical parameters, while other parameters are optimized during training, and the output is read from the system's scattering response. The gradient update is computed directly from the transmission signal between the output resonator and the point being updated.
  • Equilibrium propagation (EP): this method applies to energy-based systems. The input is imposed as a boundary condition, and physical laws drive the system to an energy minimum (the equilibrium state), which produces the response (output). In the original formulation of EP, weights are updated by a local contrastive rule that compares two equilibrium states corresponding to different boundary conditions (a toy sketch follows this list). Compared with other contrastive learning algorithms, the main advantage of EP is that it can compute weight gradients for an arbitrary cost function.
  • Hamiltonian echo backpropagation (HEB): instead of explicitly extracting weight gradients, it uses physical dynamics directly to generate the correct weight updates without any feedback mechanism. During training, in the forward pass the signal wave and the trainable parameter wave travel through a nonlinear medium together and interact; the error signal is then superimposed on the signal wave, and both waves pass through the medium again under a time-reversal operation. After this backward pass, the trainable parameter wave has automatically been updated in the direction of the cost-function gradient.
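
As a concrete instance, the toy sketch below implements equilibrium propagation on a small energy-based network, following the standard free-phase/nudged-phase recipe (architecture, task, and hyperparameters are illustrative): the state is relaxed to equilibrium twice, once without and once with a weak nudge of the output toward the target, and every weight is updated from the difference of local correlations between the two equilibria.

```python
import numpy as np

rng = np.random.default_rng(5)

rho  = lambda s: np.clip(s, 0.0, 1.0)                    # hard-sigmoid activation
drho = lambda s: ((s >= 0.0) & (s <= 1.0)).astype(float)

n_in, n_h, n_out = 8, 32, 2
W1 = rng.uniform(-0.5, 0.5, (n_in, n_h)) / np.sqrt(n_in)
W2 = rng.uniform(-0.5, 0.5, (n_h, n_out)) / np.sqrt(n_h)

def relax(x, y, beta, steps=60, dt=0.2):
    """Let the hidden and output states settle to a minimum of the (nudged) energy."""
    h = np.zeros((len(x), n_h)); o = np.zeros((len(x), n_out))
    for _ in range(steps):
        dh = -h + drho(h) * (rho(x) @ W1 + rho(o) @ W2.T)
        do = -o + drho(o) * (rho(h) @ W2)
        if beta > 0:
            do -= beta * (o - y)                          # weak nudge toward the target
        h = np.clip(h + dt * dh, 0.0, 1.0)
        o = np.clip(o + dt * do, 0.0, 1.0)
    return h, o

# Toy task: classify whether the inputs sum to more than half their maximum.
X = rng.uniform(0.0, 1.0, (128, n_in))
Y = np.eye(n_out)[(X.sum(axis=1) > n_in / 2).astype(int)]

beta, lr = 0.5, 0.05
for epoch in range(100):
    h0, o0 = relax(X, Y, beta=0.0)                        # free phase
    hb, ob = relax(X, Y, beta=beta)                       # nudged (weakly clamped) phase
    # Local contrastive rule: compare neuron correlations in the two equilibria.
    W1 += lr / beta * (rho(X).T @ rho(hb) - rho(X).T @ rho(h0)) / len(X)
    W2 += lr / beta * (rho(hb).T @ rho(ob) - rho(h0).T @ rho(o0)) / len(X)

pred = relax(X, Y, beta=0.0)[1].argmax(axis=1)
print("training accuracy:", float(np.mean(pred == Y.argmax(axis=1))))
```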

Figure | Training methods for physical neural networks. Each sub-figure shows the computational requirements and learning characteristics of a different method, compared on three core indicators: (1) the ability to perform gradient descent on the cost function; (2) the amount of digital computation required; (3) performance on large-scale datasets. The trained physical system is shown in light gray and the fixed physical system in dark gray; forward and backward passes are indicated by green and red arrows, respectively.

Commercial feasibility?

Hardware capable of running large-scale AI models is inevitably physically large, but this does not mean that physical neural networks have no application prospects.

In fact, for computation at this scale, any hardware inevitably occupies a large physical space. This points to what may be the most important scalability consideration for future large-scale physical neural-network AI systems: if the hardware is properly designed, its underlying physics may give it energy-scaling characteristics different from those of digital electronics.

This means that once models are large enough, analog-hardware implementations of physical neural networks may hold an efficiency advantage over digital systems, despite their many overheads.

Figure | Simulating large-scale models

It should be emphasized that the growth of computing power does not depend on hardware upgrades alone. The Transformer architecture became mainstream not only because of its algorithmic breakthroughs but also because of its synergy with scalable hardware. Looking ahead, ultra-large-scale physical neural networks may be held back by adherence to existing algorithmic frameworks; new, co-designed combinations of software and hardware will have to be built.

Given the path dependence of existing infrastructure and the rapid progress of efficient digital large-scale models, for physical neural networks to be commercially viable their energy efficiency would have to be thousands or even millions of times higher than that of digital electronics. Achieving this requires designing physical computers that address the challenges of scale holistically, with hardware-software co-optimization at the core and the efficient exploitation of physical computation as the primary goal.

Future challenges

In addition to training, physical neural networks face several prominent challenges that require in-depth research:

A severe challenge faced by physical neural networks is the noise in the computation process and its cumulative effect. Noise sources include internal random processes, manufacturing defects, and parameter drift. Although neural network computation has a higher tolerance for noise than traditional computation, when multiple types of noise coexist, how to maintain computational accuracy becomes a key bottleneck for practical applications. In addition, to minimize power consumption, physical neural networks often need to operate under conditions close to the level of internal noise, which further exacerbates the difficulty of maintaining accuracy.

Another major challenge is the mismatch between modern neural-network architectures and analog physical hardware. Most current architectures have not been optimized for the operations that analog physical hardware performs naturally. Although broken-isomorphism physical neural networks offer a way to use the native transformations of physical systems for machine learning, researchers still need time-consuming, case-by-case evaluations to determine whether a specific piece of hardware's transformations are suitable for neural-network computation.

A further core challenge is striking the balance between neuromorphic inspiration and physical reality. Hardware-specific designs and training algorithms, for example for complementary metal-oxide-semiconductor (CMOS), electronic, or photonic physical neural networks, may differ from the human brain in key characteristics. Drawing inspiration from neuromorphic principles while fully respecting the physical characteristics of the actual hardware is the key to resolving this tension.

In this review, the research team focused mainly on inference for large-scale models, which is the most practical and promising application direction for physical neural networks. In other words, neural networks driven by physical systems may not only hold an energy-consumption advantage over conventional approaches but may also bring further gains in computational scale and speed. Although physical neural networks have mostly been studied in analog electronic or photonic systems, their greatest appeal is that the platform is almost unrestricted: as long as a physical system is reconfigurable, it can be used to build a physical neural network.

In terms of application, the challenge facing physical neural networks is not to find a single "best" training method, but to select the most suitable scheme for each scenario and to understand the trade-offs among the various methods. Future breakthroughs are likely to come from training methods that are at once general, efficient, and robust, enabling physical neural networks to truly enter practical application scenarios.

This article is from the WeChat public account “Academic Headlines” (ID: SciTouTiao), author: Academic Headlines, published by 36Kr with authorization.