
Optical chips, some opinions

Semiconductor Industry Observation, 2026-01-07 11:34
AI energy consumption crisis: Silicon photonics technology is the key to sustainability.

In recent years, the rapid development of generative artificial intelligence has driven an unprecedented acceleration in the global deployment of hyperscale AI clusters. As Moore's Law slows, higher performance can only be achieved through parallel computing, so any improvement in data processing or transmission performance inevitably increases energy consumption. In other words, the rapid growth of AI infrastructure has created a serious energy crisis. As shown in Figure 1, as data volume grows exponentially, the required energy supply grows exponentially as well. In this sense, the only effective way to solve the energy problem is to develop a technology that decouples energy growth from data growth.

Photonics has great potential because the propagation and interference of light waves do not consume energy. Scalable functions can therefore be achieved through engineering design without increasing energy consumption. Silicon photonics has been developed extensively over the past two decades and now offers an almost ideal platform for realizing this potential. Specifically, silicon photonics can provide efficient high-density interconnects for high-bandwidth, long-distance links; it can switch optical paths at low energy regardless of signal bandwidth; and it can implement photonic neural networks that compute at the speed of light, thereby accelerating AI computing.

In this article, we will review the development trends and progress of these photonic technologies. We will argue that in order for these photonic technologies to become an important part of the sustainable infrastructure in the era of artificial intelligence, hardware and software, as well as electronics and photonics, need to be developed in a complementary manner.

Optical Transceivers and Switches

A. Energy Consumption Scalability

Figure 2 plots the energy efficiency (in pJ/bit) over time of the optical transceivers and the electrical switch ASICs (application-specific integrated circuits) commonly used in hyperscale data centers. Comparing the two trends shows that switch ASICs scale worse than optical transceivers, which indicates that the bottleneck lies in the switches rather than the transceivers. Notably, the energy efficiency of optical transceivers has kept pace with Moore's Law, and near-package/co-packaged optics based on silicon photonics have passed the 5 pJ/bit mark, whereas the energy efficiency of switch ASICs has improved far more slowly.

In fact, the power consumption of switch ASICs grows with throughput: at 100 Tbps, each chip will draw more than 1000 W. The power consumption of optical switches, by contrast, is extremely low and stays flat as throughput increases (Figure 3). Therefore, the more electrical switches are replaced by optical switches, the higher the system efficiency will be. Some practical issues are discussed below.
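The ASIC figure quoted above can be converted into the same pJ/bit unit used for transceivers. The sketch below is only a back-of-the-envelope check, assuming that chip power divided by throughput is a fair proxy for energy per bit; the function name is hypothetical and not from the source.

```python
def energy_per_bit_pj(power_watts: float, throughput_tbps: float) -> float:
    """Return energy per bit in pJ/bit for a given power draw and throughput."""
    bits_per_second = throughput_tbps * 1e12   # 1 Tbps = 1e12 bit/s
    joules_per_bit = power_watts / bits_per_second
    return joules_per_bit * 1e12               # 1 J = 1e12 pJ


# Figures from the text: more than 1000 W per chip at 100 Tbps,
# so the result is a lower bound of about 10 pJ/bit.
asic_pj_per_bit = energy_per_bit_pj(1000.0, 100.0)
print(f"{asic_pj_per_bit:.1f} pJ/bit")
```

Under this assumption, a 100 Tbps switch ASIC sits at roughly 10 pJ/bit or more, which can be compared directly with the transceiver figures in Figure 2.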

B. System Application Cases of Optical Switches

A key drawback of optical switches is that they cannot process packets, which is the core function of switch ASICs. Optical switches operate only as optical circuit switches (OCS), so they cannot simply be dropped in as replacements for switch ASICs. An OCS needs a control plane: the orchestrator or operating system must know the state of the OCS and send commands through the control plane to reconfigure the optical switches as the system requires. Such a system is completely different from a traditional packet network built on switch ASICs. Adopting OCS therefore means rebuilding the entire system from scratch and optimizing the architecture as a whole. At present, apparently no company other than Google is able to do this. After Google announced that it had deployed OCS at scale in its data centers and AI infrastructure, development of optical switches took off.
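The difference between circuit switching and packet switching can be illustrated with a toy model. The sketch below is not any vendor's or AIST's interface; the class and method names are hypothetical. It assumes only what the text states: the switch holds port-to-port light paths, a control plane sets them up, and the data itself is never inspected.

```python
from typing import Dict, Optional, Tuple


class OpticalCircuitSwitch:
    """Toy model of an N x N optical circuit switch (hypothetical API)."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self._circuits: Dict[int, int] = {}  # input port -> output port

    def connect(self, in_port: int, out_port: int) -> None:
        """Set up a light path; called by the control plane, never by traffic."""
        for port in (in_port, out_port):
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} out of range")
        if in_port in self._circuits:
            raise RuntimeError(f"input {in_port} already in use")
        if out_port in self._circuits.values():
            raise RuntimeError(f"output {out_port} already in use")
        self._circuits[in_port] = out_port

    def disconnect(self, in_port: int) -> None:
        """Tear down the light path that starts at in_port, if any."""
        self._circuits.pop(in_port, None)

    def state(self) -> Dict[int, int]:
        """Snapshot of active circuits that an orchestrator can read back."""
        return dict(self._circuits)

    def forward(self, in_port: int, signal: bytes) -> Optional[Tuple[int, bytes]]:
        """Pass the signal through untouched; no header is ever inspected."""
        out_port = self._circuits.get(in_port)
        if out_port is None:
            return None  # no circuit configured for this input
        return out_port, signal


# The orchestrator decides the path in advance; traffic then follows it.
ocs = OpticalCircuitSwitch(num_ports=32)
ocs.connect(0, 17)
result = ocs.forward(0, b"payload")
```

Note that `forward` takes no destination address: where the signal goes is decided entirely by earlier `connect` calls, which is why the surrounding software stack has to be designed around the switch.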

Long before Google launched its OCS system, Japan's National Institute of Advanced Industrial Science and Technology (AIST) had already begun developing large-scale silicon photonic switches. Figure 4 shows a silicon photonic switch blade developed by AIST. The switch provides 32 x 32 strictly non-blocking connectivity and has a digital control interface. Configured as a 9-stage Clos network, it can be scaled to 131,072 x 131,072 connections. Experiments have shown that in a composable disaggregated infrastructure these switches can cut network power consumption by 75%.

These large-scale silicon photonic switches are fabricated on AIST's in-house pilot production line, which is based on standard CMOS technology. The line uses a 45 nm design rule and achieves uniformity and yield high enough to mass-produce large-scale photonic integrated circuits containing thousands of devices, such as Mach-Zehnder interferometers (MZIs).

Photonic Neural Networks

Silicon photonic devices built with standard CMOS manufacturing technology offer high uniformity and high yield, both of which are crucial for realizing photonic neural networks (PNNs). In a PNN, a large number of Mach-Zehnder interferometers (MZIs) are integrated into a mesh topology that performs matrix-vector multiplication (MVM) in the optical domain. MVM on a PNN is extremely fast and does not consume energy, which can significantly improve AI computing power. PNNs are therefore expected to take over part of the workload of power-hungry digital processors such as GPUs. However, PNNs lack a good nonlinear activation function, which is another essential ingredient of AI computing.
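How an MZI mesh carries out a matrix-vector product can be sketched numerically. The model below assumes ideal, lossless components (50:50 beam splitters and pure phase shifters) and an arbitrary mesh layout; the phase settings and function names are illustrative and are not taken from the AIST chip.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def phase_shifter(angle):
    """Phase shift applied to the upper waveguide only."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]


def mzi(theta, phi):
    """Transfer matrix: phase(phi) -> splitter -> phase(theta) -> splitter."""
    s = 1 / math.sqrt(2)
    splitter = [[s, 1j * s], [1j * s, s]]  # ideal 50:50 beam splitter
    inner = matmul2(splitter, phase_shifter(phi))
    inner = matmul2(phase_shifter(theta), inner)
    return matmul2(splitter, inner)


def apply_mzi(vec, mode, theta, phi):
    """Apply one MZI to waveguides `mode` and `mode + 1` of the field vector."""
    t = mzi(theta, phi)
    out = list(vec)
    a, b = vec[mode], vec[mode + 1]
    out[mode] = t[0][0] * a + t[0][1] * b
    out[mode + 1] = t[1][0] * a + t[1][1] * b
    return out


def mesh_forward(vec, settings):
    """Propagate a field vector through a list of (mode, theta, phi) MZIs."""
    for mode, theta, phi in settings:
        vec = apply_mzi(vec, mode, theta, phi)
    return vec


def power(vec):
    """Total optical power carried by the field vector."""
    return sum(abs(x) ** 2 for x in vec)


# Hypothetical 4-waveguide mesh; each tuple is (mode, theta, phi).
settings = [(0, 0.3, 1.1), (2, 1.7, 0.2), (1, 2.4, 0.9),
            (0, 0.8, 2.0), (2, 1.2, 0.5), (1, 0.6, 1.4)]
x = [1 + 0j, 0.5j, -0.25 + 0j, 0.1 + 0.2j]
y = mesh_forward(x, settings)
```

Each MZI is a 2x2 unitary, so the whole mesh is a unitary matrix acting on `x`: the output `y` is a linear function of the input and, in this idealized model, carries the same total power. The phase settings play the role of the trainable weights.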

To solve this problem, we propose using the electro-optic (EO) nonlinear effect so that the AI computation is completed purely through signal propagation, with no digital processing at intermediate stages. This is easy to achieve with a Mach-Zehnder interferometer (MZI), which takes an electrical signal as input and outputs a modulated optical signal. The EO nonlinearity has a sinusoidal transfer function, which is completely different from conventional activation functions such as ReLU, sigmoid, and hyperbolic tangent. A new AI model suited to photonic neural networks (PNNs) therefore has to be found.
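The contrast between a sinusoidal transfer function and a conventional activation can be made concrete. The sketch below uses the textbook intensity response of an ideal MZI modulator, 0.5 * (1 + cos(pi * V / V_pi + bias)); the parameter names `v_pi` and `bias_phase` are assumptions for illustration, and the source does not give the exact form used on the chip.

```python
import math


def mzi_transmission(voltage, v_pi=1.0, bias_phase=0.0):
    """Normalized output intensity of an ideal MZI modulator.

    v_pi is the voltage that shifts the phase by pi; bias_phase sets the
    operating point. Both are illustrative parameters.
    """
    phase = math.pi * voltage / v_pi + bias_phase
    return 0.5 * (1.0 + math.cos(phase))


def relu(x):
    """Conventional ReLU activation, for comparison."""
    return max(0.0, x)


# Sample both functions over two periods of the modulator response.
voltages = [0.5 * k for k in range(9)]  # 0.0, 0.5, ..., 4.0
mzi_samples = [mzi_transmission(v) for v in voltages]
relu_samples = [relu(v) for v in voltages]
```

Unlike ReLU, the modulator response is bounded and periodic: it returns to its starting value every 2 * `v_pi`, so a larger input does not necessarily give a larger output. Changing `bias_phase` slides the curve sideways, which is the knob referred to as the operating point.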

A. Photonic Neural Networks Based on Electro-Optic Nonlinearity

So far, we have proposed and demonstrated several AI models based on EO nonlinearity. The first model uses a nonlinear projection that maps the input parameter space into a higher-dimensional space. The EO transfer function is trained by adjusting the operating point of the Mach-Zehnder interferometer (MZI). The nonlinearly mapped data in the resulting optical complex space can then be separated by finding a hyperplane, much as in a support vector machine.
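Why a sinusoidal projection helps can be seen on XOR, which no straight line separates in the original two-dimensional input space. The sketch below is a simplified stand-in for the model described above, not the authors' implementation: it assumes real-valued intensities instead of complex optical fields, a sin^2 response, hand-picked drive signals, and a hand-picked hyperplane rather than a trained one.

```python
import math


def eo_response(drive):
    """Sinusoidal intensity response, sin^2(pi * drive / 2) (assumed form)."""
    return math.sin(math.pi * drive / 2) ** 2


def project(x1, x2):
    """Map a 2-D input to three features, one per hypothetical modulator."""
    return [eo_response(x1), eo_response(x2), eo_response(x1 + x2)]


def classify(features, weights, bias):
    """Linear decision rule: which side of the hyperplane the point lies on."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# Hand-picked hyperplane in the 3-D feature space (illustrative only).
weights = [0.0, 0.0, 1.0]
bias = -0.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [classify(project(a, b), weights, bias) for a, b in inputs]
xor_labels = [a ^ b for a, b in inputs]
```

Because the response falls back to zero when the drive reaches 2, the inputs (0, 0) and (1, 1) land on one side of the plane and (0, 1) and (1, 0) on the other, so `predictions` matches `xor_labels`. In the model described in the text, the operating point and the hyperplane would be learned rather than fixed by hand.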

Figures 5(a) and 5(b) show the silicon photonic chip we developed and the experimental setup, respectively. We trained on the chip with two algorithms: bacterial foraging optimization (BFO) and forward difference. Figure 5(c) shows their effectiveness in classifying several Boolean logic functions, and Figure 5(d) shows high-accuracy classification of the Iris dataset. This PNN completes the computation purely through the physical propagation of signals in a passive photonic circuit, which ensures low-power, low-latency computing.
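Forward-difference training treats the chip as a black box: each parameter is nudged in turn, the loss is measured again, and the change approximates the gradient. The sketch below shows the general technique only; the step size, learning rate, and the quadratic `measure_loss` standing in for an on-chip measurement are all assumptions, not details from the experiment.

```python
def forward_difference_gradient(loss, params, step=1e-3):
    """Estimate the gradient with one extra loss evaluation per parameter."""
    base = loss(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += step                     # perturb one parameter
        grad.append((loss(shifted) - base) / step)
    return grad


def train(loss, params, learning_rate=0.1, iterations=200, step=1e-3):
    """Plain gradient descent driven by forward-difference estimates."""
    params = list(params)
    for _ in range(iterations):
        grad = forward_difference_gradient(loss, params, step)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


def measure_loss(params):
    """Hypothetical stand-in for a loss measured on the chip."""
    targets = [1.0, -2.0]
    return sum((p - t) ** 2 for p, t in zip(params, targets))


trained = train(measure_loss, [0.0, 0.0])
```

With N parameters, each update costs N + 1 loss measurements and needs no model of the hardware, which is what makes the approach practical for on-chip training. The estimate is biased by an amount proportional to `step`, so the result lands close to, but not exactly at, the true minimum.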

The second model we discuss here is a cascaded version of the model above, the "vertically layered EO PNN" shown in Figure 6. In this model, the length of every optical path stays the same as layers are added, which makes deeper learning models possible.

Figure 7 shows the test accuracy on the MNIST, Fashion-MNIST, and KMNIST datasets. The three-layer model is more accurate than the two-layer model.

Last but not least, the third model we introduce here is the EO Hopfield network. Figure 8(a) shows the architecture we proposed, in which the Mach-Zehnder interferometer (MZI) acts as a nonlinear neuron that encodes the input data and the feedback signal onto single-frequency continuous-wave (CW) input light (denoted λ). Figure 8(b) shows that, after training, a stored pattern can be recalled even from a half-damaged input pattern, demonstrating the associative memory that is characteristic of Hopfield networks.
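Associative recall from a damaged input can be demonstrated with a classical Hopfield network. The sketch below deliberately swaps in the textbook version, with Hebbian weights and sign-threshold neurons, instead of the MZI-based neurons of Figure 8(a); the two 8-element patterns and the way half of the input is blanked out are illustrative assumptions.

```python
def train_hebbian(patterns):
    """Build the weight matrix from +/-1 patterns (Hebbian rule, zero diagonal)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, max_steps=10):
    """Update all neurons synchronously until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            if field > 0:
                new_state.append(1)
            elif field < 0:
                new_state.append(-1)
            else:
                new_state.append(state[i])  # no net input: keep current value
        if new_state == state:
            break
        state = new_state
    return state


# Two mutually orthogonal stored patterns (illustrative).
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([pattern_a, pattern_b])

# Half-damaged input: the second half of pattern_a is blanked to 0.
damaged = pattern_a[:4] + [0, 0, 0, 0]
recovered = recall(weights, damaged)
```

Here `recovered` equals `pattern_a`: the intact half overlaps with `pattern_a` and not with `pattern_b`, so the feedback pulls the blanked elements back to their stored values. The EO version replaces the sign threshold with the modulator's sinusoidal response, but the recall-by-feedback principle is the same.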

B. General Scheme of Streaming PNN

Because running a PNN incurs non-negligible overhead, the system must be evaluated thoroughly and optimized as a whole. At the same time, the inherent advantages of a PNN are low latency, high speed, and low energy consumption. To exploit these advantages fully, a PNN works best as a streaming processor with I/O in both the electrical and optical domains. The concept of the streaming PNN is shown in Figure 9. With this scheme, a PNN can stream-process data in both domains and thus be integrated seamlessly into the digital infrastructure.

Conclusion

Silicon photonics has made significant progress and now shows great potential to improve the sustainability of AI infrastructure in several ways, including high-density I/O, bandwidth-independent circuit switching, and light-speed AI accelerators. However, introducing photonic functional devices such as OCS and PNNs into the traditional digital infrastructure is not easy. More in-depth research on overall system design and implementation is therefore needed.

This article is from the WeChat official account "Semiconductor Industry Observation" (ID: icbank), author: AIST, published by 36Kr with authorization.