
Google Shares the Next Step in Optical Switching

半导体行业观察 (Semiconductor Industry Observation), 2026-01-04 13:05
Discussion on Google's optical circuit switch technology, focusing on the performance of data centers and machine learning systems.

In this article, Google discusses the device technologies for future optical circuit switches, focusing on data center networks and machine learning supercomputers. Device parameters, including insertion loss, crosstalk, number of ports, reconfiguration time, and polarization sensitivity, can all affect the performance and reliability of the final system.

Introduction

Large-scale systems rely on networks to transfer information from the source to the destination through switches. Currently, most large-scale data networks are built around electrical packet switches (EPS) and a fixed Clos topology. Although such networks can support arbitrary communication patterns, they do not scale well in terms of key system metrics such as cost, latency, and reconfigurability. It is these known scalability limitations that have prompted early research to explore the use of optical circuit switches (OCS) to dynamically adjust the network topology to match the required communication patterns.

These early efforts have driven the actual deployment of optical circuit switches in large-scale data center networks and machine learning systems. These optical switches have become a key technology for achieving high-performance, cost-effective, and reconfigurable networks. This article will briefly introduce the existing commercial optical circuit switches and explore the development directions of device technologies that future switches may adopt.

Background

Digital electrical packet switches queue packets in a shared memory and make local routing decisions based on the information contained in the packet headers to forward the packets to the corresponding output ports. End-to-end connections typically consist of multi-hop paths through multiple switches. Local routing decisions are made on a per-packet basis, which may cause packets from the same source to the same destination to experience different transmission delays.

Optical circuit switches establish an end-to-end optical path or circuit between the input and output ports. Packets entering the switch always remain in the optical domain for transmission and are routed to the output ports according to a pre-set path, rather than making local routing decisions by reading the packet headers. Therefore, all packets will propagate along the same optical path and experience the same delay, which is an ideal characteristic for synchronous machine learning workloads. In addition, many optical circuit switches are rate-insensitive, so the same switch can be used across multiple generations of optical transceivers with different data rates.

This simplicity comes at a price: because an OCS does not read packet headers, its optical paths must be set up by a centralized control plane. In large-scale OCS deployments, the effort to develop this control plane may even exceed the development work of the OCS hardware itself.
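
The contrast between per-packet routing and a preset circuit can be illustrated with a minimal sketch. The class below is a toy model, not Google's control plane: the name `CircuitSwitch` and its methods are hypothetical, and the only property it captures is that an OCS holds a one-to-one input-to-output mapping that a central controller sets in advance.

```python
from typing import Dict, Optional


class CircuitSwitch:
    """Toy model of an optical circuit switch: a one-to-one input -> output port map."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self._out_for_in: Dict[int, int] = {}

    def connect(self, in_port: int, out_port: int) -> None:
        """Set up a circuit. Called by a (hypothetical) central controller, never per packet."""
        for port in (in_port, out_port):
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} out of range")
        if in_port in self._out_for_in:
            raise ValueError(f"input {in_port} already has a circuit")
        if out_port in self._out_for_in.values():
            raise ValueError(f"output {out_port} already has a circuit")
        self._out_for_in[in_port] = out_port

    def disconnect(self, in_port: int) -> None:
        """Tear down the circuit that starts at in_port, if there is one."""
        self._out_for_in.pop(in_port, None)

    def forward(self, in_port: int) -> Optional[int]:
        """Return the preset output port; no packet header is inspected."""
        return self._out_for_in.get(in_port)
```

Every call to `forward` on a given input returns the same output until the controller reconfigures the switch, which mirrors the fixed-path, fixed-delay behavior described above.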

Future Optical Switching Technologies

Table I lists four key performance indicators for various device technologies used in commercial and research optical circuit switches (OCS). These indicators depend on whether the switching function is implemented based on space or wavelength, and whether the switching function is implemented in a three-dimensional structure in free space or a two-dimensional structure in a plane.

Table I: Key performance indicators for commercial and research optical circuit switches (OCS).

All devices used in existing commercial OCS are based on customized hardware and control schemes. No single switching device technology currently achieves the best performance in all application scenarios and across all performance indicators. Optical switches designed for today's large-scale systems mainly prioritize a large number of ports, low insertion loss, and low return loss.

Figure 1 shows an example of a device technology currently used in MEMS-based switches. MEMS mirrors are fabricated using a deep reactive ion etching process, which can produce large-diameter, flat, and highly reflective micromirrors. High-voltage signals control four comb drives around each mirror. These drives can rotate the mirror around two axes. Two sets of such devices can be used to construct a three-dimensional optical path from any input port to any output port.

Figure 1: Detailed view of a MEMS actuator with a mirror (shown in false color) and four comb drives for rotation around two axes.

The optical circuit switch based on the customized MEMS device shown in Figure 1 provides significant cost advantages in large-scale data center networks and improves system availability and performance when used in TPU supernodes.

New devices for three-dimensional free-space switching include (non-mechanical) two-dimensional digital liquid crystal (DLC) pixel arrays. This device uses polarization characteristics to digitally control the propagation direction of the light beam. A switch with 2^N ports can be constructed through a folded cascade structure consisting of N binary cascade units, as shown in Figure 2.

Figure 2: Schematic diagram of the prototype structure of a three-dimensional free-space optical switch using a liquid crystal pixel array.
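
The relationship between N binary stages and 2^N ports can be sketched as follows. This is only an illustration of the counting argument: the function names are hypothetical, and the choice that the first stage carries the most significant bit is an arbitrary assumption, not a detail from the article.

```python
from typing import List


def stage_settings(port: int, num_stages: int) -> List[int]:
    """Return the binary state (0 or 1) of each stage that selects `port`."""
    if not 0 <= port < 2 ** num_stages:
        raise ValueError("port out of range for this number of stages")
    # Assumed convention: the first stage carries the most significant bit.
    return [(port >> (num_stages - 1 - i)) & 1 for i in range(num_stages)]


def selected_port(settings: List[int]) -> int:
    """Inverse mapping: combine the per-stage binary states into a port index."""
    port = 0
    for bit in settings:
        port = (port << 1) | bit
    return port
```

With three stages, for example, the eight ports 0 to 7 correspond one-to-one to the eight possible combinations of stage states.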

Two-Dimensional Planar Switching Devices

Unlike three-dimensional free-space switches, most two-dimensional devices under research are based on a crossbar (cross-matrix) structure with N waveguides in each direction. A binary switching device is placed at each of the N² waveguide intersections to control the propagation direction of light at that intersection.
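
A small sketch can make the N² structure concrete. It assumes a particular layout that the article does not specify: input i enters row i from the left, output j leaves column j at the bottom, and exactly one element per connection is set to turn the light. The function names are hypothetical.

```python
from typing import List


def crossbar_states(perm: List[int]) -> List[List[bool]]:
    """states[i][j] is True when the element at (input i, output j) turns the light."""
    n = len(perm)
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..N-1")
    return [[perm[i] == j for j in range(n)] for i in range(n)]


def elements_on_path(i: int, j: int, n: int) -> int:
    """Count the switching elements light meets going from input i to output j.

    Under the assumed layout the beam passes j elements along row i,
    turns at element (i, j), then passes n - 1 - i elements down column j.
    """
    return j + 1 + (n - 1 - i)
```

Under this assumed geometry the number of elements on a path ranges from 1 to 2N - 1, which is one way to see why loss and crosstalk become harder to control as the port count grows.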

Many two-dimensional optical switches under research use some form of silicon photonics (SiP) technology, which is designed to be compatible with standard CMOS processes. Such devices have been studied extensively, and a variety of switching drive mechanisms have been reported. Compared with most commercial three-dimensional free-space switches, SiP-based planar optical switches are expected to offer lower per-port cost, faster switching, easier integration with electronic systems, and, because of their lower drive voltage, potentially higher reliability.

To date, these advantages have not been realized in mass-produced systems. Current challenges include high fiber-coupling and switching losses and a limited number of ports. Some of these drawbacks (such as insertion loss) apply to almost all two-dimensional switching architectures.

1) Interferometer-based devices: Two-dimensional planar switches based on interferometer devices have been widely studied. Such devices include Mach-Zehnder interferometers, which generate switching states through single-pass interference; and microring resonators, which generate switching states through multiple-pass interference in a ring resonator. In general, resonator-based switching devices can have lower drive voltages, but they have narrower bandwidths and are more difficult to control.

The drive mechanisms of both types of devices are based on changing the refractive index to produce constructive or destructive interference. Common methods include thermal tuning and the use of the electro-optic effect, where the refractive index changes with the applied electric field. The induced refractive index change is wavelength-dependent and affects the available bandwidth of the device. Thermal tuning is slow (microseconds, compared with nanoseconds for electro-optic tuning) and requires fine control to prevent thermal crosstalk between devices. Challenges faced by switches based on these two types of devices include reducing overall losses, the need for polarization-diverse designs, and increased signal crosstalk as the number of cascaded devices and switch ports increases.
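
How a phase change turns into a switching state can be shown with the textbook model of an ideal Mach-Zehnder interferometer. The sketch below assumes a lossless device with two perfect 50:50 couplers, which real devices only approximate; the function names and the per-element loss figure used in the test are hypothetical, not values from the article.

```python
import math
from typing import Tuple


def mzi_outputs(delta_phi: float) -> Tuple[float, float]:
    """Return (bar, cross) output power fractions of an ideal, lossless 2x2 MZI.

    delta_phi is the phase difference between the two arms in radians.
    With ideal 50:50 couplers, zero phase difference sends all light
    to the cross port and a phase difference of pi sends it to the bar port.
    """
    bar = math.sin(delta_phi / 2) ** 2
    cross = math.cos(delta_phi / 2) ** 2
    return bar, cross


def cascaded_loss_db(per_element_loss_db: float, num_elements: int) -> float:
    """Losses in dB add up along a chain of cascaded elements."""
    return per_element_loss_db * num_elements
```

Because the two fractions always sum to one in this idealized model, any incomplete phase shift leaves residual light in the unwanted port, which is the origin of the crosstalk that accumulates across cascaded devices.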

2) Heterogeneously integrated devices: An emerging application scenario for optical switches is photonic quantum computing. Optical switches are used to generate initial computing resources and perform feed-forward operations between various stages of quantum computing. This dependence means that the overall computing speed is determined by the switching speed of the optical circuit switch. Photonic quantum computing also places extremely strict requirements on losses and crosstalk.

To address these challenges, high-speed optical switches based on heterogeneous integration are being studied. These devices integrate thin films of materials with strong electro-optic effects with silicon photonics fabricated using foundry processes. This integration method can achieve optical switches with lower drive voltages and faster speeds. Other heterogeneous integration processes based on micro-transfer printing are also under development. All challenges faced by interferometer-based devices also apply to these switches, and the problem of realizing a practical heterogeneous integration process needs to be solved.

3) Silicon photonics MEMS devices: MEMS devices can also be used in two-dimensional silicon photonics switches. Figure 3 shows the layout structure of such a device. Input and output fiber array units (FAU) are connected to a two-dimensional cross-matrix structure composed of waveguides. A MEMS-driven coupler is used at each waveguide intersection to direct light in one of two directions. Subsequently, this MEMS photonic integrated circuit (PIC) is integrated with a control CMOS chip.

Figure 3: Layout structure of a research silicon photonics MEMS switch.

Compared with analog MEMS devices used in free-space switches, binary MEMS couplers can be 1000 times faster and have shown a relatively large number of ports. The switching speed of a research device is shown in Figure 4. Challenges faced by such devices include achieving low-loss packaging for 2N fiber-waveguide connectors, which is also a common problem for most two-dimensional silicon photonics switches.

Figure 4: Rise/fall time of a silicon photonics MEMS switch.

4) Wavelength switching devices: Wavelength switching uses a combination of tunable lasers, passive arrayed waveguide gratings (AWGs), and tunable filters. Compared with other device technologies, tunable lasers are usually more expensive and consume more power, while passive optical devices may have higher losses and operate in a fixed wavelength band. These characteristics limit the number of ports and the available bandwidth per port.
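
The idea that the destination is chosen by tuning the laser rather than by reconfiguring the switch can be sketched with an idealized cyclic N×N AWG. The routing rule below is a common textbook idealization and an assumption here, not a description from the article; real devices differ in sign convention and channel ordering, and the function names are hypothetical.

```python
def awg_output_port(input_port: int, channel: int, num_ports: int) -> int:
    """Idealized cyclic AWG: each step in wavelength channel shifts the output by one port."""
    return (input_port + channel) % num_ports


def channel_for(input_port: int, output_port: int, num_ports: int) -> int:
    """Wavelength channel a tunable laser at input_port must select to reach output_port."""
    return (output_port - input_port) % num_ports
```

In this model the passive device never changes state, so the number of reachable ports is bounded by the number of wavelength channels the laser can tune across, consistent with the port-count limitation noted above.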

Conclusion

As optical circuit switching technology becomes commercialized, research activities around future optical switch device technologies are rapidly increasing. As the application scenarios of optical switches continue to expand, it is expected that some of the device technologies in the research stage will be introduced into future computing and network systems and achieve mass production applications.

Appendix: The Origin of Google's OCS

Over the past few years, Google has been quietly transforming its data centers, replacing its network infrastructure with a radical in-house approach that the networking community has long dreamed of.

This project is called the "Apollo Project," and its core is to replace electrons with light and replace traditional network switches with optical circuit switches (OCS). At the end of 2023, Amin Vahdat, the head of Google's systems and services infrastructure team, explained in a media interview why this project is so important.

Keep Data in the "Light"

Data center communication faces a fundamental inefficiency that stems from spanning two worlds. Data processing is performed on electronic devices, so information at the server level resides in the electrical domain. However, transmitting information is faster and more convenient in the optical domain.

In traditional network topologies, signals are repeatedly converted between the electrical and optical domains. "It has always been done hop by hop, converting back to an electrical signal and then outputting it as an optical signal, and so on. Most of the work remains in the electrical signal transmission stage," said Vahdat. "This is very expensive in terms of both cost and energy consumption."

Through OCS technology, Google "keeps data in the optical domain for as long as possible," using micro-mirrors to redirect light beams from the source port and send them directly to the target port as an optical cross-connect.

"The application of this technology reduces communication latency because there is no need to transfer data so frequently within the data center now," Google said. "It eliminates the electrical switching stage, which has been the core part of most data centers, including our own." Google further supports

The traditional "Clos" architecture common in other data centers relies on a spine composed of electrical packet switches (EPS), which are built on silicon chips from companies such as Broadcom and Marvell and are connected to "leaf" or top-of-rack switches.

The EPS system is expensive and consumes a significant amount of power. When signals are transmitted in electronic form, they require high-latency per-packet processing and then need to be converted back to optical signals for subsequent transmission.

Google said that OCS requires less power: "With these systems, the power consumed by these devices is basically only the power required to maintain the position of the mirrors. Since these mirrors are very small, the required power is very low."

Light enters the "Apollo Project" switch through a fiber bundle and is reflected by multiple silicon chips, each containing a micro-mirror array. These mirrors are three-dimensional microelectromechanical systems (MEMS) that can be quickly and individually realigned, allowing each optical signal to be immediately redirected to a different fiber in the output fiber bundle.

Each array contains 176 micro-mirrors, but for yield reasons only 136 are used. "These mirrors are all customized and slightly different from each other. Therefore, it means that the total number of possible input-output combinations is 136 squared," he said.

This means there are 18,496 possible combinations between the two mirror arrays.

The maximum power consumption of the entire system is 108 watts (and usually much lower), which is far lower than the approximately 3000 watts that a similar EPS can reach.
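
The figures quoted above can be checked with a few lines of arithmetic. The constant names are illustrative, and the power ratio compares the stated maximum for the OCS with the approximate figure given for an EPS, so it is only a rough indication.

```python
TOTAL_MIRRORS = 176      # mirrors fabricated per array
USABLE_MIRRORS = 136     # mirrors actually used, for yield reasons
OCS_MAX_POWER_W = 108    # stated maximum power of the OCS
EPS_POWER_W = 3000       # approximate power of a comparable EPS

# Each of the 136 inputs can be steered to any of the 136 outputs.
combinations = USABLE_MIRRORS ** 2

# Mirrors held in reserve per array.
spare_mirrors = TOTAL_MIRRORS - USABLE_MIRRORS

# How many times more power the EPS figure is than the OCS maximum.
power_ratio = EPS_POWER_W / OCS_MAX_POWER_W

print(combinations, spare_mirrors, round(power_ratio, 1))
```

The result reproduces the 18,496 combinations stated in the text, shows 40 spare mirrors per array, and puts the EPS figure at roughly 28 times the OCS maximum.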

Over the past few years, Google has deployed thousands of such OCS systems. Google believes that this is the largest-scale OCS application globally, and the advantages are quite obvious. "We have been investing in this area for some time," Google said.

Custom-Built In-House

The development of the entire system requires many customized components and customized production equipment.

Producing the Palomar optical circuit switch (OCS) meant developing custom testers and alignment and assembly workstations for the MEMS mirrors, fiber collimators, the optical core and its components, and the complete OCS product. In addition, a custom automated alignment tool was developed that can place each two-dimensional lens array in position with sub-micron accuracy.

"We also manufactured transceivers and circulators," Google said. The latter can help light pass through different ports in one direction. "Did we invent the circulator? No, but is it a customized component that we designed, manufactured, and deployed on a large scale? Yes."

He added: "These optical circulators incorporate some very cool technologies that can reduce the number of fibers by half compared to any previous technology."

As for the transceivers used to send and receive optical signals in the data center, Google has jointly designed low-cost wavelength-division multiplexing transceivers that span four generations of optical interconnection speeds (40, 100, 200, 400 GbE) by combining the development of high-speed optics, electronics, and signal processing technologies.

"We invented transceivers with appropriate power and loss characteristics because one of the challenges faced by this technology is that we now introduce insertion loss in the path between two electrical switches."

Now, fiber channels are replaced by optical circuit switches, and light loses some of its intensity due to reflection when passing through the device. "We must design transceivers that can balance cost, power consumption, and format requirements to ensure that they can withstand moderate insertion losses," said Vahdat.

"We believe we have one of the most energy-efficient transceivers on the market. This really encourages us to ensure that we can conduct engineering design from start to finish to fully utilize this technology."
