
Capital Bets on AI "Complete Industrial Chain": Zhexin Siwei Secures Seed Round Financing

Xiaoxi (晓曦), 2025-10-14 10:34
Shifang Ronghai has made a strategic investment in Zhexin Siwei, and the two will jointly build AI hardware products focused on smart home, sports, and embodied AI.

Recently, Shenzhen Shifang Ronghai Technology Co., Ltd. completed a strategic investment in Zhexin Siwei Intelligent Technology (Hangzhou) Co., Ltd., and the two parties formally entered into an in-depth strategic partnership.

Tan Xue, general manager of Zhexin Siwei, said that the cooperation between Shifang Ronghai and Zhexin Siwei forms a complete industrial chain from the system layer to the hardware end, with great development potential, and that this may be more valuable than investing in a company built around a single technology or product. Du Jun, marketing director of Xiaozhi AI at Shifang Ronghai, added that the cooperation rests not on any single-point capability but on complementary, comprehensive strengths built up over years in the industry.

This round of financing drew attention from many investment institutions that had previously invested in cutting-edge fields such as computing power, large models, and robotics, reflecting investors' optimism about voice interaction. The deal also reflects a new trend in the AI field: capital increasingly favors "scenario-based" enterprises that can form strong synergies with their own businesses.

How to Create a "Killer" AI Hardware Product

Shifang Ronghai was founded in early 2016. After three strategic upgrades, the company has evolved from a knowledge-sharing platform into an AI-centered technology enterprise focused on putting AI into practice.

In Du Jun's view, any "killer" product, whether AI hardware or an AI application, must have three characteristics:

  • It is tied to an application scenario and solves users' real needs. Whether those needs are practical or emotional, the product is strongly scenario-based.
  • The experience is seamless. Most optimization work is ultimately about lowering the effort of interaction. The shift from physical buttons to voice interaction, for example, makes the experience more seamless. Once multi-modality is added, AI may even understand what users need from a look or a gesture. The ultimate experience is precisely "seamless".
  • It shows humanistic care, with emotional intelligence at its core. People are paying more and more attention to their inner needs. Future products may not all be powerful in function, but excellent products must be capable of humanistic care, which will become a basic product attribute and give users a more comfortable experience.

Du Jun believes that not every device needs to be highly anthropomorphic, but every device needs strong humanistic and emotional care capabilities at specific stages of use. When a product has them, users are more willing to pay for it, and not simply for emotional value.

Initial Exploration: A Hardware-Native Interaction System

While software applications of large models attract more attention, Shifang Ronghai is more optimistic about the leap in hardware that comes once AI is connected to terminals. Building on its software strengths and on extensive technical experience in real-time voice interaction, emotion perception, and large models accumulated in its earlier education business, Shifang Ronghai launched the Xiaozhi AI project in August last year.

Xiaozhi AI is fundamentally different from mainstream large models such as the GPT series and Qwen series in the market.

First, Xiaozhi AI is a hardware-native interaction system that focuses on making hardware terminals intelligent. From the underlying architecture to feature development, it is built around making the large model perform better in hardware scenarios and integrate more tightly with products. Xiaozhi AI now achieves an average response time of 500 milliseconds, offers anthropomorphic interaction based on emotion recognition, and supports function expansion through the MCP protocol. The entire system is deployed in the cloud, which reduces the configuration requirements on the terminal side.

Second, Xiaozhi AI places more emphasis on application and on how large models perform in human-machine interaction. Its voice interaction process uses a three-stage architecture: ASR (speech-to-text), LLM (large language model), and TTS (text-to-speech). Xiaozhi AI also supports third-party models such as Tongyi and DeepSeek.

The team has made extensive optimizations to the interaction experience for consumer (C-end) users, especially in emotion and sentiment recognition, and has systematically tuned the entire pipeline to make human-machine interaction feel more authentic and approachable:

  • On the model side, 1024-dimensional numerical personality features are extracted from the user's full dialogue corpus, compressed into 26 key dimensions through clustering and induction, and embedded in the dialogue training set. During interaction, the system can then recognize the user's personality traits.
  • In the ASR stage, a separately deployed, optimized model is used. Beyond speech recognition, it captures emotional signals such as tone and intonation. The system also identifies keywords to infer likely emotions and passes this information to the large model, so that the generated content reflects both objective semantics and subjective emotion.
  • In the TTS stage, traditional "sound imitation" is upgraded to "human imitation". The AI blends emotional intonation into its spoken output, making its expression more natural and warm.
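The flow described above, in which ASR attaches an emotion hint, the large model sees both the words and the hint, and TTS chooses a matching delivery, can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword table, the style table, every function name, and the stubbed model call are hypothetical and are not Xiaozhi AI's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Transcript:
    text: str     # recognized words
    emotion: str  # coarse emotion label attached by the ASR stage


# Hypothetical keyword-to-emotion table; a real system would also use tone and intonation.
EMOTION_KEYWORDS: Dict[str, str] = {
    "tired": "fatigued",
    "exhausted": "fatigued",
    "annoyed": "frustrated",
    "great": "happy",
}

# Hypothetical mapping from the user's emotion to the speaking style used by TTS.
TTS_STYLES: Dict[str, str] = {
    "fatigued": "gentle",
    "frustrated": "calm",
    "happy": "cheerful",
    "neutral": "plain",
}


def asr_stage(utterance: str) -> Transcript:
    """Stand-in for ASR: the input is already text, so only keyword-based emotion inference is simulated."""
    lowered = utterance.lower()
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in lowered:
            return Transcript(utterance, emotion)
    return Transcript(utterance, "neutral")


def build_prompt(transcript: Transcript) -> str:
    """Pass both the objective semantics and the inferred emotion to the language model."""
    return f"[user emotion: {transcript.emotion}] {transcript.text}"


def stub_llm(prompt: str) -> str:
    """Placeholder for a real model call (for example a third-party LLM)."""
    return f"(reply to) {prompt}"


def tts_stage(reply: str, emotion: str) -> Dict[str, str]:
    """Stand-in for TTS: returns the text together with the speaking style to use."""
    return {"text": reply, "style": TTS_STYLES.get(emotion, "plain")}


def respond(utterance: str, llm: Callable[[str], str] = stub_llm) -> Dict[str, str]:
    """Run one turn through the three stages: ASR, then LLM, then TTS."""
    transcript = asr_stage(utterance)
    reply = llm(build_prompt(transcript))
    return tts_stage(reply, transcript.emotion)
```

Calling `respond("I am so tired today")` yields a reply whose prompt carried the `fatigued` hint and whose speaking style is `gentle`. Swapping in a different model only requires passing another callable as `llm`, which is one plausible way to support third-party models.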

As for context, a capability that matters greatly for delivering emotional value, Xiaozhi AI currently supports short-term memory. Long-term memory is in internal testing and is expected to launch soon. Du Jun said that "memory" often turns into something close to a philosophical problem: the deeper one thinks about it, the more subjective and abstract it becomes. The difficulty lies in continually re-understanding what memory is, when to retrieve it, and in what form it exists. The long-term memory function will therefore keep being iterated.

Du Jun was frank, however, that the industry is still in its early stage. Although Shifang Ronghai is one of the first companies in China to integrate the capabilities of large emotional models into terminal devices, it has been working on this for only one year. Good applications have already emerged in various industries, but marketization and commercialization will take time, and there is still a lot of room for exploration.

The "Soul Brain" Finds Its "Body"

For Shifang Ronghai, Xiaozhi AI is positioned as a complete interaction system for hardware terminals, and its value depends on hardware products. In other words, the "soul brain" of Xiaozhi AI must have a hardware "body". Combining software and hardware is therefore the key for Shifang Ronghai to optimize the system and bring products to the consumer market.

This is the core of the cooperation with Zhexin Siwei. Each party plays to its strengths: Xiaozhi AI is responsible for refining the system-level work, while Zhexin Siwei contributes its hardware and product R&D strengths and the experience it has accumulated over the years. Together, the two parties explore and build contemporary AI products.

"We are very glad to cooperate with a team like Zhexin Siwei, which has both technological vision and practical spirit. Their in-depth exploration of the end-side application of multi-modal emotional large language models is highly consistent with our judgment on the future development trend of AI technology. We believe that this strategic investment will strongly accelerate the iteration of Xiaozhi AI technology and the expansion of scenarios, and jointly open a new chapter in intelligent interaction," said Huang Guan, chairman of Shifang Ronghai and founder of Xiaozhi AI.

In fact, once the initial version was complete, Xiaozhi AI performed well in internal testing on interaction experience, anthropomorphism, and response speed. The team had not yet settled on a specific application direction at that point. Instead, it decided to build a community ecosystem through open source and to explore together with the community.

The community now has more than 60,000 developers. Many of them have carried out secondary development or product innovation there, and many good ideas have emerged. Shifang Ronghai and Zhexin Siwei also began exploring cooperation during that period.

Zhexin Siwei is an AI innovation enterprise founded by a team of PhDs jointly trained by Zhejiang University and Singapore Management University. It focuses on software and hardware applications of multi-modal emotional large models. Smart home is Zhexin Siwei's core scenario. Founder Qin Bing has more than 20 years of experience in the smart home industry and has smart manufacturing factories in both Zhejiang and Sichuan, with many leading smart home enterprises among their customers. Zhexin Siwei can therefore quickly take up the new needs and scenarios of leading customers and apply Xiaozhi AI to hardware products more effectively.

The development speed, efficiency, and performance of Xiaozhi AI have become one of Zhexin Siwei's core capabilities. Zhexin Siwei believes that next-generation smart home products will inevitably be deeply integrated with AI and will adopt voice interaction as infrastructure. Notably, over the past six months the two parties have connected a range of devices from leading home appliance brands.

"The embodied field represents the future", and it is also a direction Zhexin Siwei focuses on. At present, the embodied field mostly concentrates on developing robot bodies and intelligent brains. Zhexin Siwei believes the next step is to combine "intelligent brain + voice interaction" to serve the humanoid robot body, and even to work with the embodied robot's "cerebellum". Users will not be satisfied with products that are merely tools, and now is a good time to establish a position.

Another important scenario is sports, which carries an emotional dimension of its own. Many members of the founding team are long-time marathon enthusiasts who have completed nearly 100 marathons around the world, and sports are part of their daily lives. Drawing on this deep understanding of the sports industry, the Zhexin Siwei team moved quickly into the sports sector and pioneered an AI commentary agent.

Zhexin Siwei's industrial layout currently focuses on these three scenarios, but it will not be limited to them in the future.

Zhexin Siwei's current strategic choice is reportedly to target the B-end, with the aim of building the infrastructure for voice interaction. Tan Xue said that the key to the sustained, vigorous development of language interaction lies in the density of language use and the quality of language knowledge. Zhexin Siwei's first step is to raise the density of language use: deploying across a large number of smart home devices increases the number of users, of people interacting, and of interacting devices, and the resulting data and feedback flow back into the iteration of Xiaozhi AI.

Zhexin Siwei nevertheless plans to cover both the B-end and the C-end in the future. The time is simply not yet right for large models to target the C-end directly and grow rapidly. Zhexin Siwei also has the relevant talent in reserve. Tan Xue, for example, has worked in the consumer (To C) field for nearly 20 years, including in businesses related to emotional value. She revealed that elderly care companionship may be one of the directions explored in the future.

A Collective Upgrade of the Industrial Chain

"It is the market development choice and opportunity that have gradually clarified the development direction of hardware for us," Du Jun said.

In fact, as early as last August, when Xiaozhi AI had just launched, the two parties held in-depth exchanges, and these have since produced a number of shipped products. First, in less than half a year, Zhexin Siwei carried out in-depth secondary development on Xiaozhi AI for the sports industry and created the first AI commentary agent in China, which was used to commentate the Zhejiang BA games and drew widespread attention online. Second, drawing on the team's years of development and manufacturing experience in smart home, Zhexin Siwei applied its further developed and trained version of Xiaozhi AI to that field and has secured cooperation agreements and orders from several leading home appliance manufacturers. Third, Zhexin Siwei has achieved limb control and language interaction with multiple embodied robots based on Xiaozhi AI. The two parties are also developing other products, which will be launched gradually.

Throughout the cooperation, the two teams have worked together closely and efficiently, with almost no communication barriers. When customers raise requirements or hardware problems occur, Zhexin Siwei responds immediately, quickly locates the cause, and proposes innovative improvements to the large model's capabilities. The Xiaozhi AI team, in turn, offers specific suggestions for optimizing hardware performance.

Beyond R&D, the two parties also communicate frequently at the strategic level to stay aligned on direction. They have established a mature communication mechanism and can respond quickly to ad hoc project requirements. The scope of cooperation has also expanded from technology to market and brand, forming an end-to-end partnership from technology to market.

Shifang Ronghai, which previously focused on software, gradually came to realize that the hardware industry's industrial chain is very long, spanning upstream cloud providers and chip platforms, hardware developers, the companies that commission products, and finally channels and brands. In Du Jun's view, AI hardware is not simply hardware with AI added, but an overall upgrade and evolution of the entire industrial chain.

At the chip level, cloud-edge collaboration will become an important development direction. Edge-side chips handle local response and privacy protection, while the cloud takes on complex computing and reasoning tasks. Combining the two has become a clear trend. However, because edge-side chips are still relatively expensive, their large-scale adoption in the consumer market will take time.
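The division of labor described above can be illustrated with a small routing sketch. It is a hedged example only: the `Request` fields, the `route` function, and the fallback policy when no edge chip is present are assumptions made for illustration, not a description of any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    contains_private_data: bool  # assumption: the device can flag sensitive input
    needs_reasoning: bool        # assumption: a cheap local check estimates task complexity


def route(request: Request, edge_available: bool = True) -> str:
    """Decide where a request runs: 'edge' for local or private work, 'cloud' for heavy reasoning."""
    if edge_available and (request.contains_private_data or not request.needs_reasoning):
        # Private or simple requests stay on the device for fast local response.
        return "edge"
    # Complex requests, or any request on a device without an edge chip, go to the cloud.
    return "cloud"
```

Under these assumptions, a simple command such as turning on a light stays on the device, a planning task goes to the cloud, and a device without an edge-side chip sends everything to the cloud, which mirrors the cost constraint noted above.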

At the system level, terminal AIOS systems such as Xiaozhi AI build a complete closed loop from voice interaction to task execution, giving users a smooth and natural experience. Systems of this kind not only provide AI computing power interfaces but are also gradually reshaping the system architecture of intelligent devices.

The transformation of the application developer ecosystem is especially crucial. The interaction logic of AI applications differs significantly from that of traditional apps. Developers need to redesign application architecture, delivery methods, and user experience for an operating mode driven by natural language or voice. In the future, navigation, food delivery, reminder tools, and workflow applications alike will all be rebuilt around agent logic.

"In the chain, each link needs to perform its own duties. It is impossible for one party to do everything alone. Cooperation is definitely the best way to achieve win-win results," Du Jun said. This is not a "reshaping" of the original system but a re-evolution based on interaction logic, and the entire industry is actively adapting to the change.

"Don't Wait, Act Soon"

Although the prospects are broad, the implementation of multi-modal emotional large models still faces challenges.

Tan Xue pointed out that the problems center on the technical path and hardware adaptation. On the technical path, the traditional "R&D first, promotion later" approach may not apply. Zhexin Siwei instead starts from user needs and real scenarios and lets them drive technology iteration. This "scenario-driven" model improves market acceptance of its products and makes the company better able to cope with uncertainty. On hardware adaptation, optimization is not a fundamental obstacle, but it directly affects how smooth the user experience is. The system must therefore achieve a high degree of consistency in software-hardware collaboration, which places higher demands on the overall architecture.

Du Jun