Jiliu Technology: Achieving World-Class Innovation in Long-Distance Networking and Joint Training for GPU Clusters
The following article is from "Z Plan Supports AI Model Startups".
Jiliu Technology completed its Pre-A round of financing at the beginning of this year, led by Lightspeed Photon.
Previously, Jiliu Technology had received investment from Qiji Venture, Weimeng Media, Zhuoyuan Asia, Tsinghua Alumni Fund, Zhipu AI, Zhuoyuan Capital, Fangxin Capital, TusStar Venture Capital, and well-known strategic investors.
Jiliu Technology is an open computing power network provider. Its products include the GPU-RDMA network communication framework and high-speed lossless network switches.
Jiliu Technology's network communication solution can improve performance by over 20%, saving tens of millions of yuan in a thousand-GPU environment and hundreds of millions of yuan in a ten-thousand-GPU environment.
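The article does not show how a 20% gain turns into savings. As a rough sketch, assuming "20% better performance" means each GPU delivers 1.2 times as much useful training throughput, the same workload needs only about 1/1.2 of the original GPUs, so roughly one GPU in six is freed. The function name below is hypothetical, and the sketch stops at GPU-equivalents because the article gives no per-GPU cost from which to compute yuan.

```python
def gpus_for_same_throughput(baseline_gpus: float, speedup: float) -> float:
    """GPUs needed to match a baseline cluster's throughput after a per-GPU speedup.

    `speedup` is a multiplier: 1.2 is the assumed reading of "20% better
    performance", i.e. each GPU does 20% more useful work.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_gpus / speedup


if __name__ == "__main__":
    for cluster in (1_000, 10_000):
        needed = gpus_for_same_throughput(cluster, 1.2)
        # The freed capacity is what the article's yuan savings would be based on.
        print(f"{cluster:>6} GPUs at 1.2x -> same output from ~{needed:,.0f} GPUs "
              f"(~{cluster - needed:,.0f} GPU-equivalents freed)")
```

Multiplying the freed GPU-equivalents by an actual per-GPU cost (hardware plus power and operations) would give a figure comparable to the savings quoted above.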
Build a High-Performance Computing Super System
In 1967, Gene Amdahl, a computer architect at IBM, presented what became known as Amdahl's law: the speedup a system can gain from parallelism is limited by the fraction of the work that cannot be parallelized. Even if the number of parallel processors grows without bound, the overall speedup can never exceed the reciprocal of that serial fraction.
Put simply, a computing power cluster does not get faster indefinitely as more GPUs are added. If one person can build a house in 10 days, 10 people may finish it in 1 day, but 100 people still need about 1 day: the other 90 may have to sit idle because they cannot all fit on the construction site.
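Amdahl's law is usually written as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can be parallelized and N is the number of processors. The minimal sketch below evaluates it; the 95% parallel fraction is a hypothetical value chosen for illustration, not a measurement of any real training job.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work (0..1) that can be spread across workers.
    n_workers: number of parallel processors (GPUs, builders, ...).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work takes the same time no matter how many workers there are;
    # only the parallel part shrinks as n_workers grows.
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the work parallelizes
    for n in (1, 10, 100, 1_000, 10_000):
        print(f"{n:>6} workers -> speedup {amdahl_speedup(p, n):6.2f}")
    # As n_workers -> infinity, the speedup approaches 1 / (1 - p).
    print(f"upper bound: {1 / (1 - p):.1f}")
```

Going from 1,000 to 10,000 workers in this example barely moves the speedup, which is the diminishing return the house-building analogy describes.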
The same goes for training large models. According to a report by Gartner, a high-performance computing power cluster of 10,000 NVIDIA A100 GPUs was used to train GPT-3.5, and this number rose to about 25,000 A100 GPUs for GPT-4. However, compute utilization was only 32% to 36%, a serious waste of computing power.
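To make the 32% to 36% figure concrete, the short sketch below converts a utilization rate into "GPUs' worth" of useful work and of idle capacity. It is simple arithmetic on the numbers quoted above; the function name is hypothetical.

```python
def utilization_breakdown(total_gpus: int, utilization: float) -> tuple[float, float]:
    """Split a cluster into GPUs' worth of useful work and GPUs' worth of idle capacity."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    useful = total_gpus * utilization
    return useful, total_gpus - useful


if __name__ == "__main__":
    # 25,000 A100s at the reported 32% and 36% utilization.
    for u in (0.32, 0.36):
        useful, idle = utilization_breakdown(25_000, u)
        print(f"utilization {u:.0%}: ~{useful:,.0f} GPUs useful, "
              f"~{idle:,.0f} GPUs idle-equivalent")
```

At these rates, roughly 8,000 to 9,000 of the 25,000 GPUs' worth of capacity does useful work, and the remaining 16,000 to 17,000 GPU-equivalents are lost to communication, waiting, and other overheads.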
Jiliu Technology's job is to design a system that can organize thousands or even tens of thousands of people to build more houses as quickly as possible.
Hu Xiaohe, the CEO of Jiliu Technology, said that the company's products target three areas: a computing power management and scheduling platform, a computing power optimization and operations and maintenance platform, and high-speed interconnection hardware. In addition to its complete computing power cluster construction solution, the company has productized, and is gradually deploying, products at three levels: cluster management, computing engine, and high-speed network. These help AI enterprises organize their GPUs sensibly and raise delivery efficiency and GPU utilization as far as possible.
Currently, Jiliu Technology's computing power cluster solution can improve the performance of GPU clusters by over 20%, helping customers save tens of millions of yuan in a thousand-GPU environment.
Focus on Groundbreaking Work and Specialize in the Construction of Large-Scale Computer Systems
Hu Xiaohe completed his undergraduate, doctoral, and postdoctoral studies at Tsinghua University. Under the guidance of Researcher Li Jun, he spent ten years researching high-performance network systems in the Network Security Laboratory.
During a visit to the University of California, Berkeley, he studied under Academician Scott Shenker, one of the pioneers of software-defined networking (SDN).
He is an expert in distributed computing and high-performance networks. Before founding the company, he had built the country's first carrier-grade Tbps programmable network product and run a domestic large model on a thousand GPUs in a supercomputing environment.
Focusing on the construction of large - scale computer systems was the goal Hu Xiaohe set at the beginning of his entrepreneurship. What Jiliu Technology is currently developing is a distributed GPU system designed for artificial intelligence, also known as a computing power cluster.
"In the past year and a half since we started the company, Jiliu Technology has built the largest privately owned single-site computing power cluster in China," Hu Xiaohe said. "We have broken many established consensuses in the industry. For example, we have shown that AI training is not latency-sensitive but bandwidth-sensitive. We have trained large models over a 30-kilometer wide-area network with no loss of computing power, and can retain 98% to 99% of computing power over a 50-kilometer distance." This is a world first.
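One way to see why distance can matter less than bandwidth is the textbook alpha-beta cost model for a ring all-reduce, the collective commonly used to synchronize gradients: 2(p - 1) steps, each paying a fixed latency, plus 2(p - 1)/p of the message crossing the slowest link. The sketch below is that generic model, not Jiliu Technology's method. The node count, gradient size, link speed, and local latency are hypothetical, it assumes roughly 5 microseconds of one-way delay per kilometer of fiber, and it pessimistically charges the long-distance delay on every step and ignores overlap of communication with computation.

```python
FIBER_DELAY_S_PER_KM = 5e-6  # light in glass travels ~200,000 km/s -> ~5 us per km


def ring_allreduce_cost(num_nodes: int, message_bytes: float,
                        link_bytes_per_s: float, step_latency_s: float):
    """Return (latency_term, bandwidth_term) in seconds for a ring all-reduce.

    The ring takes 2*(p-1) steps; each step pays one latency and sends
    message_bytes / p over the slowest link.
    """
    steps = 2 * (num_nodes - 1)
    latency_term = steps * step_latency_s
    bandwidth_term = steps * (message_bytes / num_nodes) / link_bytes_per_s
    return latency_term, bandwidth_term


def step_latency(distance_km: float, local_latency_s: float = 5e-6) -> float:
    """Hypothetical per-step latency: local NIC/switch cost plus fiber delay."""
    return local_latency_s + distance_km * FIBER_DELAY_S_PER_KM


if __name__ == "__main__":
    # Hypothetical: 16 nodes, 1 GB of gradients, 400 Gb/s (50 GB/s) links.
    p, grad_bytes, bw = 16, 1e9, 50e9
    for km in (0, 30, 50):
        lat, band = ring_allreduce_cost(p, grad_bytes, bw, step_latency(km))
        print(f"{km:>2} km: latency {lat * 1e3:5.2f} ms, bandwidth {band * 1e3:5.2f} ms")
    _, half_bw = ring_allreduce_cost(p, grad_bytes, bw / 2, step_latency(0))
    print(f"halving bandwidth instead: bandwidth term {half_bw * 1e3:.2f} ms")
```

Under these assumptions, stretching the link to 30 km adds about 4.5 ms to a step that already spends 37.5 ms moving data, while halving the bandwidth adds another 37.5 ms. That is the sense in which large-model training is more sensitive to bandwidth than to latency.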
Overcome Technical Difficulties and Establish Core Advantages
With the explosive growth of the computing power market, Jiliu Technology has focused on delivering concrete projects. It actively participates in building, operating, and maintaining medium- and large-scale computing clusters, is working to turn the tools it has accumulated into more standardized products, and is exploring adaptation to domestic hardware and expansion overseas.
In the year and a half since its founding, Jiliu Technology's projects have gone into production at first-tier companies. It has designed, built, optimized, and maintained computing power clusters for multiple data centers, serving companies such as Zhipu AI, SenseTime, Yindun Cloud, and Century Internet, with multiple clusters ranging from a thousand to ten thousand GPUs. It has also launched a solution for hundred-thousand-GPU clusters.
"We hope to form a high-performance computing power network by building such a super system, ultimately supporting the deployment of artificial intelligence models and enterprises' IT upgrades."
"High - performance computing power infrastructure is the trend. In future competition, technology will be our core competitiveness."
In Hu Xiaohe's view, entrepreneurship and scientific research have a lot in common: "In scientific research, we need to follow a general direction and make breakthroughs at key points to win the recognition of reviewers.
In entrepreneurship, we also need to find a general direction, establish our own advantages in the field, and deliver the solutions and products that enterprises need, ultimately winning the recognition of customers and investors."
Jiliu Continues to Forge Ahead
"I'm very honored that many mentors and friends have given their full support on Jiliu's entrepreneurial journey, giving Jiliu the opportunity to participate in the construction of super systems and witness the implementation of general artificial intelligence in China. I'm very proud of the team's hard work. In the wave of the rapid development of artificial intelligence, we have left our mark," Hu Xiaohe said with emotion.
Hu Xiaohe summarized: "Whether in scientific research or entrepreneurship, 'Talk is cheap, show me the code' is what matters most.
This industry is just starting to develop. Our products and technologies are in a leading position in the domestic open market, but there are many challenges to be solved in the future. We need to expand and optimize the established computing power clusters, achieve 'backward compatibility', improve the automation capabilities of computing power scheduling, operation and maintenance, and fault location, and support the implementation of long - distance distributed computing power clusters.
We will forge ahead in the direction of high-performance computing power networks, contribute to domestic computing power, and support the deployment of domestic large models. We believe that Jiliu will definitely have a place in future hundred-thousand- and million-GPU clusters and will enter the era of artificial general intelligence alongside China's leading large-model companies."
This article is from the WeChat official account "Starlink Capital", reprinted by 36Kr with permission.