The Age of AI Hyper-Connectivity: Is AI Speeding Toward "Light"?
Since ChatGPT emerged at the end of 2022, AI has driven one major semiconductor industry opportunity after another, from computing power (GPU) and storage capacity (storage) to command and scheduling power (CPU), and has created one trillion-dollar market-cap company after another.
If there is still a segment of AI infrastructure waiting for a potential trillion-dollar market-cap giant to emerge, Dolphin Research is most optimistic about hyper-connectivity in the AI era. If computing power solves AI's "intelligence" problem and storage capacity solves its "memory" problem, then transport capacity aims to solve how long-term and short-term memories can enter and exit the brain center at "rocket-like speed".
Or, to borrow the words of Jensen Huang, the "Pope of AI": as the bottlenecks of computing power and memory are gradually alleviated, energy has become a long-term and difficult challenge, and the next core bottleneck of the AI era is high-speed network interconnection. The reason is that network infrastructure built for the traditional cloud era cannot meet the bandwidth requirements of the Agentic AI era, with its trillion-parameter models, Mixture of Experts (MoE) architectures, and sparse activation.
In this article, we explore network transmission in the AI era through CPO, an optoelectronic transmission technology that is gradually emerging as AI pushes network speeds higher. Dolphin Research's research on CPO is divided into three questions:
1. What is CPO? Can it really replace traditional copper connections?
2. Can it completely replace the current mainstream pluggable optical modules?
3. Under this trend, how will the competitive landscape of upstream and downstream companies in the industry change?
In this article, we first sort out the basic issues of the industry chain.
The detailed analysis follows.
01 What is CPO?
In the traditional data center architecture, there is an important component called the "optical module". Its function is to convert the optical signal arriving over the fiber into an electrical signal for the equipment in the data center, or to convert the electrical signal generated by that equipment into an optical signal and send it down the optical fiber. It plays the role of a "bridge" and a "translator" in data transmission.
In terms of function, the CPO (Co-Packaged Optics) architecture covers what traditional optical modules do, but there are two obvious differences:
1. Different structures
Traditional optical modules are pluggable. They resemble the RJ45 plugs on the ends of a home network cable. CPO is completely different: it integrates the optical engine responsible for photoelectric conversion and the chip (mainly the switch's ASIC) directly on the same packaging substrate or interposer.
2. Different application scenarios
Optical modules are usually used between cabinets (i.e., Scale-out), while CPO can be used both between cabinets and inside cabinets (Scale-up). When used between cabinets, it replaces traditional optical modules; when used inside cabinets, it replaces the currently mainstream copper connections.
Figure: Schematic diagram of the traditional pluggable mode and the CPO solution
Source: GTC 2025, Dolphin Research
Recently, both NVIDIA and Broadcom have been actively promoting their CPO switch solutions.
So why is CPO technology so highly regarded? As the demand for computing power in data centers continues to increase, the demand for data transmission bandwidth has also grown explosively. Moreover, data centers are developing towards ultra-large-scale computing clusters. In this process, traditional data transmission technologies pose several obstacles:
1. Bandwidth bottleneck
In the scenario between cabinets, due to the limited space on the traditional switch panel and the difficulty in reducing the size of traditional pluggable optical modules, the number of ports that a single switch can provide is limited, making it unable to support the increasing bandwidth requirements.
Currently, the highest single-module bandwidth of pluggable modules is 1.6Tbps, and the maximum bandwidth that a single switch panel can support is 51.2Tbps. In the future, 3.2Tbps modules may be introduced, raising the highest bandwidth a switch can support to 102.4Tbps, which is almost the limit of pluggable optical modules.
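As a rough check on the figures above, the number of modules needed to fill a switch is simply the switch bandwidth divided by the per-module bandwidth. The following is a minimal sketch of that arithmetic, not a description of any vendor's product. The helper name `modules_to_fill_switch` is hypothetical, and it assumes every port carries a module of the same rate.

```python
def modules_to_fill_switch(switch_gbps: int, module_gbps: int) -> int:
    """Return how many identical modules add up to the switch bandwidth.

    Hypothetical helper for illustration. Assumes every front-panel port
    carries a module of the same rate. Rates are given in Gbps so that the
    arithmetic stays in integers.
    """
    if module_gbps <= 0:
        raise ValueError("module_gbps must be positive")
    # Ceiling division: a partially used module still occupies a port.
    return -(-switch_gbps // module_gbps)


# Figures quoted in the article, converted from Tbps to Gbps.
today = modules_to_fill_switch(51_200, 1_600)
future = modules_to_fill_switch(102_400, 3_200)
print(today, future)
```

With these inputs both cases come out to 32 modules. In this simple arithmetic, the doubling of switch bandwidth comes entirely from faster modules rather than from more ports, which is consistent with panel space being the constraint.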
2. Signal integrity bottleneck
In the scenario inside the cabinet, as the transmission rate increases, if traditional copper cables are used, the electrical signal will face serious signal attenuation and distortion during long-distance transmission, and the transmission distance will also be increasingly limited.
Currently, the highest bandwidth that copper cables can support is 1.8TB/s (such as NVIDIA's NVLink copper cable), and the distance is strictly limited to within 2 meters. However, the bandwidth demand of a single GPU is approaching 3.6TB/s.
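Note that these copper figures are quoted in terabytes per second, while the module figures earlier are in terabits per second. A small sketch of the conversion and of the gap between demand and supply follows. The function names and constants are illustrative assumptions, and the only inputs are the two numbers quoted above.

```python
def tbytes_to_tbits(tbytes_per_s: float) -> float:
    """Convert terabytes per second to terabits per second (8 bits per byte)."""
    return tbytes_per_s * 8


def demand_to_supply_ratio(demand_tbytes: float, supply_tbytes: float) -> float:
    """Return how many times larger the demand is than the supply."""
    if supply_tbytes <= 0:
        raise ValueError("supply_tbytes must be positive")
    return demand_tbytes / supply_tbytes


# Figures quoted in the article (TB/s).
COPPER_TBYTES = 1.8       # highest bandwidth cited for copper cabling
GPU_DEMAND_TBYTES = 3.6   # per-GPU demand the article says is approaching

print(tbytes_to_tbits(COPPER_TBYTES))
print(demand_to_supply_ratio(GPU_DEMAND_TBYTES, COPPER_TBYTES))
```

On these numbers, 1.8TB/s corresponds to roughly 14.4Tbps, and the per-GPU demand is about twice what the cited copper link supports.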
3. Heat dissipation and power consumption bottleneck
As the transmission rate increases, the power consumption of traditional communication links has increased significantly, and heat dissipation has also become increasingly difficult. We know that the construction of data centers in the United States is facing great energy obstacles, so the power consumption problem will bring significant cost pressure.
In theory, CPO can better solve the above problems. According to NVIDIA, after applying CPO, the power efficiency can be increased by 3.5 times.
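To make the 3.5x figure concrete, the sketch below reads it as "the same traffic carried with 1/3.5 of the power". That reading is an assumption, as are the function names and the 1,000 W baseline, which is a made-up number used only for illustration.

```python
def power_after_gain(baseline_watts: float, efficiency_gain: float) -> float:
    """Power needed for the same traffic after an efficiency gain.

    Assumes 'efficiency improved N times' means the same bandwidth is
    delivered with 1/N of the original power.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return baseline_watts / efficiency_gain


def fraction_saved(efficiency_gain: float) -> float:
    """Share of the original power that is no longer needed."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return 1 - 1 / efficiency_gain


BASELINE_WATTS = 1000.0   # hypothetical optical-link power budget
NVIDIA_GAIN = 3.5         # figure cited in the article

print(round(power_after_gain(BASELINE_WATTS, NVIDIA_GAIN), 1))
print(round(fraction_saved(NVIDIA_GAIN), 3))
```

Under this reading, a 3.5x efficiency gain removes a little over 70% of the power spent on the links in question, whatever the actual baseline turns out to be.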
02 Specifically, what are the data transmission scenarios in data centers?
Here, we break down the data transmission technology routes in different scenarios and links in data centers:
Figure: Examples of Scale-out and Scale-up
Source: NADDOD, Dolphin Research
1. Scale-up, mainly involving interconnection inside the cabinet
It mainly involves the interconnection of hardware inside the cabinet, especially inside the server, including but not limited to the interconnection between CPU, GPU, network card, DDR memory, and hard disk.
Currently, copper is the main connection medium for this part, including the PCIe slots and memory slots (PCB copper traces) used to connect the CPU, GPU, and network card, and various copper cables such as SATA cables. CPO may disrupt the current mainstream solution.
2. Scale-out, mainly involving interconnection between cabinets
It mainly involves the interconnection between cabinets, servers, and switches.
For this part of the connection, light needs to be used as the connection medium. Currently, optical fibers and pluggable optical modules are the main solutions. Similarly, CPO is an important development trend and is progressing faster than the scenario inside the cabinet.
3. Furthermore, there is also the interconnection between data centers and between data centers and the outside world. This part is not the focus of this article.
Judging from the plans of industry giants, at this stage CPO is mainly targeted at the scenario between cabinets, but it may be extended to the scenario inside the cabinet in the future.
03 CPO is still in the initial promotion stage. What are the main bottlenecks it faces?
1. Maturity of advanced packaging technology
From the perspective of underlying technology, CPO is completely different from traditional solutions such as pluggable optical modules. Traditional optical modules are produced with processes that differ little from those used for optoelectronic components and modules in general. CPO, however, needs to package the optical engine onto the substrate or interposer, relying mainly on advanced packaging technologies such as CoWoS.
Meanwhile, CPO also differs from the advanced packaging we usually think of, because it must integrate photonic integrated circuits as well as electronic integrated circuits. This heterogeneous integration requires hybrid bonding through technologies such as TSMC's COUPE.
The problem is twofold. First, the advanced packaging technology described above is extremely difficult in terms of process. Both NVIDIA and Broadcom rely on TSMC's production capacity, which is limited. In addition, there may be supply obstacles for the required materials and equipment, such as optical couplers, hybrid bonding equipment, testing equipment, and ABF substrates.
Second, at this stage the production yield of this advanced packaging technology, especially heterogeneous integration, still has a lot of room for improvement, which makes the cost much higher than that of the pluggable solution. TSMC is working hard to improve the yield of advanced packaging, but this will take some time.
2. Maintenance and repair issues
For traditional pluggable solutions, since they are "pluggable", maintenance and repair are very convenient. However, CPO is completely different. Its optoelectronic module is directly packaged with the substrate, interposer, and even the chip, so the difficulty of maintenance and repair is significantly greater than that of traditional solutions.
However, these problems can be solved, for example by building a certain degree of fault tolerance into the design or by arranging a certain amount of redundancy at the operational level.
3. Thermal management issues
The high-density packaging of the optical engine and the chip causes a significant local temperature rise during operation, which can even exceed the tolerance limit of the laser. Thermal management is therefore also a major problem. Solving it requires a more efficient heat dissipation solution, which in turn adds cost.
4. Standardization issues
Currently, NVIDIA, Broadcom, and others are actively launching their own complete and independent CPO switch solutions to seize the market opportunity. At the same time, however, industry standards (interface standards, packaging standards, etc.) have not yet been formed. As a result, it is difficult for upstream and downstream companies to conduct R&D, production, and configuration based on unified standards, which is another obstacle to commercial promotion.
In short, we can see that solutions exist for the above problems, but they depend on the maturity of technology and the formulation of standards, which all take time.
On the other hand, fundamentally, CPO technology needs to form an advantage in terms of comprehensive cost.
This leads to a question: No matter what kind of solution, cost is always the core consideration factor. In addition to CPO, there are other more advanced or more conservative routes in progress. What is the relationship between them? Here, we first distinguish the differences between different technology routes.
04 Comparison of technology routes
1. CPO
The CPO we are discussing, that is, Co-Packaged Optics, refers, as mentioned above, to packaging the optical engine and the chip on the same substrate. The chip here can be a switching chip (ASIC) or a computing chip such as a GPU, but it usually refers to a switching chip.
2. NPO
NPO is Near-Packaged Optics. It is a less integrated approach than CPO: the optical engine is not packaged on the same substrate or interposer as the chip, but only mounted on the same PCB motherboard.
Domestic companies in China, including Alibaba and Huawei, are promoting the NPO solution. This can be regarded as a compromise solution in the absence of advanced packaging production capacity, but it may become the mainstream solution in the Chinese market for some time, which will affect the penetration of NVIDIA's solution in the Chinese market to a certain extent.
Figure: Demonstration of different integration methods: (from top to bottom are the pluggable method, NPO, CPO (integrated on the packaging substrate), CPO (integrated on the interposer), and OIO to be mentioned below)
Source: ASE, Dolphin Research
3. OIO
OIO (Optical I/O) can be regarded as an advanced version of CPO. Here, the switching chip is not involved. It is mainly related to the computing chip, referring to packaging the optical engine and the computing chip together, or even combining them directly at the chip level. This is completely targeted at the scenario inside the cabinet.
Figure: Demonstration of different integration methods: pluggable, CPO, OIO
Source: TSMC, Openlight, Dolphin Research
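The integration routes described so far differ mainly in where the optical engine sits relative to the chip. As a reading aid, the sketch below records that as a small data structure. The class name, field names, and the numeric `integration_level` ranking are illustrative assumptions, and the descriptions only paraphrase this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalRoute:
    """One optical integration route, as characterized in this article."""
    name: str
    engine_location: str
    integration_level: int  # illustrative rank only: higher = closer to the chip


ROUTES = [
    OpticalRoute("OIO", "packaged with, or built into, the computing chip", 3),
    OpticalRoute("Pluggable", "separate module plugged into the switch panel", 0),
    OpticalRoute("CPO", "same packaging substrate or interposer as the chip", 2),
    OpticalRoute("NPO", "same PCB motherboard as the chip", 1),
]


def by_integration_level(routes: list[OpticalRoute]) -> list[OpticalRoute]:
    """Order routes from least to most tightly integrated with the chip."""
    return sorted(routes, key=lambda route: route.integration_level)


for route in by_integration_level(ROUTES):
    print(f"{route.name}: optical engine on the {route.engine_location}")
```

Sorting by the assumed rank reproduces the top-to-bottom order of the first figure: pluggable, NPO, CPO, then OIO.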
Now, let's clarify the architecture of the data center:
A data center can be regarded as the interconnection of the following parts:
Servers focus on computing tasks and are equipped with computing chips such as GPU and CPU, memory, hard disks, etc.;
Switches are responsible for network communication between servers and between servers and the outside world, and realize data exchange through ASIC chips;
In addition, there is also a storage system. In the current mainstream data center architecture, storage devices are mainly distributed in server nodes and placed inside the servers, combined with the servers.
Based on the above architecture, we can imagine the application scenarios of CPO. On this basis, let's discuss why CPO starts with the switching chip.
Here, we offer an analogy for the role of the switch: it can be regarded as an overpass inside the data center. It follows that the switch bears the greatest pressure in terms of data transmission bandwidth, port density, and the accompanying power consumption. Its need for CPO is therefore more urgent.
4. CPC
CPC, Co-Packaged Copper, refers to directly integrating high-speed copper connectors on the packaging substrate.
The cost advantage of this technology route is very obvious, but it still cannot solve the bandwidth bottleneck and attenuation problems of the copper medium. Its application scenarios are therefore relatively limited, and it can be applied in part to the connections between GPU/CPU nodes and switches and storage chips inside the cabinet. Currently, NVIDIA's in-cabinet solution still uses copper connections, but it may switch to optical interconnection in the future.
5. LPO
LPO, Linear-Drive Pluggable Optics, is a slimmed-down version of pluggable optics. By removing the internal DSP/CDR chips and retaining and strengthening only the analog Driver and TIA chips (we will discuss the functions of these components later), it drives the signal directly.
Put simply, it directly removes the power-hungry DSP chip from