Horizon Robotics delivers its "Made-in-China FSD": an exclusive first drive
What's the future of China's intelligent vehicles, especially intelligent assisted driving systems?
That depends on the industry's "common denominator". And Horizon, known as exactly that common denominator, has now handed in its answer.
The most powerful autonomous driving computing hardware, the Horizon J6P, is on board.
UniAD, the industry's first open-source end-to-end model, has evolved into HSD, the most advanced one-stage end-to-end system, and it is on board too.
Shipments of the Journey series have passed 10 million units, and Horizon's full engineering capabilities are finally serving its own self-developed full-stack system.
Only three players in the entire industry have their own computing hardware architecture, self-developed large AI driving models, and proven mass-production and delivery experience. The other two, Tesla and Huawei, have already worked wonders.
That is the core reason Horizon has drawn so much attention. What's more, Horizon's state of the art is set to raise the state of the art across a much wider range of mass-produced intelligent vehicles.
And now Horizon has made it official: HSD, the "Chinese version of FSD", is here.
The question is, how is the experience?
What's different about the complete HSD?
In short, it is no longer a test build but as close to mass production as it gets:
The underlying hardware is the Horizon J6P, with 560 TOPS of computing power; the software uses a one-stage end-to-end architecture and no longer falls back on rules.
The car is not a test vehicle converted from the Volkswagen ID series but Chery's Exeed Sterra ES.
In fact, the Sterra ES is the first project to bring Horizon's HSD into mass production and delivery, and the new model will be unveiled at the upcoming Chengdu Auto Show.
The biggest improvement in this version of HSD is longitudinal (speed) control, which shows when pulling away from traffic lights and when following other cars. The car launches briskly and is usually the first off the line, yet the ride never feels uncomfortable.
When crawling along in traffic or jostling for position with other vehicles, the speedometer sometimes reads 0 even though the car is still inching forward to adjust its position, so that the gap to the car ahead never grows large enough for someone to cut in.
The same finesse shows in left and right turns at intersections: whether the other party is a motor vehicle or a non-motorized vehicle, HSD does not come to a full stop but keeps moving forward steadily, and the steering wheel never jerks.
End-to-end systems have always driven like a seasoned driver, but the near-production HSD is clearly an even more seasoned one.
Before, it was "see first, then think"; now it starts turning the wheel the moment it sees. The path is shorter, reactions are faster, and there is more tolerance for error.
Players on the VLA (vision-language-action) route stress the "L": the language model's ability to understand the surrounding scene and use that understanding to guide trajectory output.
In HSD, that kind of scene understanding shows up clearly in the system's judgment:
After a large vehicle that had been blocking the view turned away, an elderly man came cycling toward us. An inexperienced driver would struggle to judge his distance accurately, but HSD was not timid at all and drove straight through, because he was still a considerable distance away.
HSD is also very good at getting around obstacles. When conditions allow, it will actively cross the lane line or borrow an adjacent lane to pass other vehicles, and on roads without lane markings its detours and evasive maneuvers are smoother still.
At this three-way intersection, HSD needed to turn left while the car in front wanted to make a U-turn. HSD seized the opening, completed its left turn, and yielded to a non-motorized vehicle at the same time.
Of course, we also ran into some problems during testing:
On a two-way road with a single lane in each direction, HSD mistook vehicles queuing normally at a red light for cars illegally parked at the roadside and swung straight into the oncoming lane, overtaking them on the wrong side of the road.
If nothing else, this proves that HSD really is end-to-end and barely relies on rules.
But to be honest, this bug is quite serious. It may stem from an efficiency-first development philosophy: when training data is selected, the system is not encouraged to "wait stupidly" behind slow cars.
Still, the way such problems get fixed, and how long a fix takes, is very different from before.
"100% data-driven", truly end-to-end.
HSD "is not a technological iteration"
We first tried Horizon's HSD back in January this year.
The chip was already the top-end J6P, but the software was still a two-stage end-to-end system with some rules as a fallback, and the vehicle was a test car.
That is, in fact, how most automakers currently put end-to-end systems into mass production.
But Horizon said that during R&D it ran into many problems with the two-stage system:
Unstable training: the model easily diverges or fails to converge, which makes it hard to learn effective policies reliably.
Causal confusion: the model struggles to grasp the causal relationship between past actions and current decisions, which can lead it to learn wrong associations.
Hard-to-learn safety behaviors: getting the model to naturally acquire defensive driving and emergency braking (AEB) is a huge challenge that requires several distinct training stages and a very difficult process.
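To make "causal confusion" concrete, here is a toy Python illustration of one well-documented form of it in imitation learning, the "copycat" shortcut. It is not Horizon's setup: the sine-wave "road", the noise level, and the two-feature linear model are assumptions chosen for clarity. When the previous action is available as an input and is almost identical to the current one, least squares learns to copy it and nearly ignores the noisy observation.

```python
import math
import random


def fit_two_feature_ls(rows, targets):
    """Least squares without intercept for two features, solved via the 2x2 normal equations."""
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * y for r, y in zip(rows, targets))
    b2 = sum(r[1] * y for r, y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


rng = random.Random(0)
# Hypothetical expert: the correct steering command equals a smooth road signal.
road = [math.sin(0.05 * t) for t in range(2001)]

rows, targets = [], []
for t in range(1, len(road)):
    noisy_observation = road[t] + rng.gauss(0.0, 0.3)  # what the sensors report this frame
    previous_action = road[t - 1]                      # the expert's own last command
    rows.append((noisy_observation, previous_action))
    targets.append(road[t])

w_obs, w_prev = fit_two_feature_ls(rows, targets)
print(f"weight on observation: {w_obs:.3f}, weight on previous action: {w_prev:.3f}")
```

The fitted model puts almost all of its weight on the previous action. Offline the fit looks excellent, but in closed loop such a model reacts late to what it actually sees, which is the kind of wrong association described above.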
In short, these shortcomings stem not from a flawed design concept but from technical barriers along the implementation path.
So within half a year, Horizon completed the switch to one-stage end-to-end. The core idea is that a single, unified deep learning model takes in raw sensor data (or preprocessed features) and directly outputs the final vehicle trajectory (or control commands).
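The difference between the two designs is largely about the interface between stages. The deliberately tiny Python sketch below contrasts them; the occupancy probabilities, the 0.5 detection threshold, and both planners are hypothetical, and the "one-stage" planner is a hand-written formula standing in for what a learned model could do, not HSD's network. A hard detection step throws away evidence that falls below its threshold, while a model that consumes the dense signal can still react to it.

```python
# Hypothetical occupancy probabilities for five road cells ahead of the ego car.
OCCUPANCY_AHEAD = [0.05, 0.10, 0.45, 0.40, 0.05]
CRUISE_SPEED_MPS = 15.0


def two_stage_target_speed(occupancy, threshold=0.5):
    """Stage 1 turns dense evidence into discrete detections; stage 2 only ever sees the detections."""
    detections = [i for i, p in enumerate(occupancy) if p >= threshold]
    if not detections:
        return CRUISE_SPEED_MPS  # nothing crossed the threshold, so downstream the road looks empty
    return 0.0                   # simplified rule: any detection means stop


def one_stage_target_speed(occupancy):
    """Stand-in for a model that consumes the dense signal directly."""
    # Probability that every cell is free, treating cells as independent (a simplifying assumption).
    p_all_clear = 1.0
    for p in occupancy:
        p_all_clear *= 1.0 - p
    return CRUISE_SPEED_MPS * p_all_clear  # scale speed by confidence that the road is clear


print(two_stage_target_speed(OCCUPANCY_AHEAD))            # 15.0
print(round(one_stage_target_speed(OCCUPANCY_AHEAD), 2))  # 4.02
```

The two-stage planner sees an empty road and cruises at 15 m/s, while the dense-input planner slows to about 4 m/s because two cells carry substantial, but sub-threshold, evidence of an obstacle.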
The core architecture has three main innovations:
Dense modal information processing: the model works on high-dimensional, lossless abstract features rather than on the simplified results a perception module produces (such as bounding boxes). This retains the uncertainty in the environment and provides a basis for human-like decision-making.
Joint lateral and longitudinal optimization: the model directly outputs raw trajectories that carry both lateral (direction) and longitudinal (speed) information. This fundamentally avoids the mechanical feel and jerkiness caused by decoupling lateral and longitudinal control in traditional architectures, such as "turn first, then accelerate" or "brake first, then turn".
Post-processing and safety verification: the raw trajectories pass through a lightweight post-processing layer for smoothing and are finally checked by a high-priority safety verification module, ensuring that the commands sent to the by-wire system are absolutely safe. As the model's capabilities keep improving, the aim is to rely less and less on post-processing and keep the code increasingly concise.
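The following minimal Python sketch illustrates the smooth-then-verify flow under stated assumptions: a trajectory is a list of timestamped (x, y) points at a fixed time step, smoothing is a simple moving average, and the safety check covers only an acceleration limit and clearance to known obstacle points. All names and thresholds are hypothetical. Because every point carries both position and time, one trajectory encodes lateral and longitudinal intent together, and speed and acceleration are derived from it rather than planned separately.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrajectoryPoint:
    t: float  # seconds from now
    x: float  # meters forward (longitudinal)
    y: float  # meters left (lateral)


def smooth(traj: List[TrajectoryPoint], window: int = 3) -> List[TrajectoryPoint]:
    """Moving-average smoothing of positions; the first and last points are kept unchanged."""
    half = window // 2
    out = [traj[0]]
    for i in range(1, len(traj) - 1):
        seg = traj[max(0, i - half):min(len(traj), i + half + 1)]
        out.append(TrajectoryPoint(traj[i].t,
                                   sum(p.x for p in seg) / len(seg),
                                   sum(p.y for p in seg) / len(seg)))
    out.append(traj[-1])
    return out


def is_safe(traj: List[TrajectoryPoint], obstacles: List[Tuple[float, float]],
            max_accel: float = 3.0, min_clearance: float = 1.0) -> bool:
    """Hypothetical safety gate: bounded acceleration plus clearance to known obstacle points."""
    dt = traj[1].t - traj[0].t  # assumes a fixed time step
    speeds = [math.hypot(b.x - a.x, b.y - a.y) / dt for a, b in zip(traj, traj[1:])]
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    if any(abs(a) > max_accel for a in accels):
        return False
    return all(math.hypot(p.x - ox, p.y - oy) >= min_clearance
               for p in traj for ox, oy in obstacles)


def post_process(traj, obstacles) -> Optional[List[TrajectoryPoint]]:
    """Smooth first, then verify; None signals that the caller must take a fallback action."""
    smoothed = smooth(traj)
    return smoothed if is_safe(smoothed, obstacles) else None


# Raw model output: 5 m/s forward with a slight lateral wobble, sampled every 0.5 s.
raw = [TrajectoryPoint(0.5 * i, 2.5 * i, 0.4 * (i % 2)) for i in range(5)]
print(post_process(raw, obstacles=[(30.0, 0.0)]) is not None)  # True
print(post_process(raw, obstacles=[(5.0, 0.3)]))                # None
```

Returning None rather than a hard-coded stop leaves the choice of fallback maneuver to the caller, which keeps the verification layer small.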
The perception, understanding, and control modules have all been upgraded.
Perception
Even in an end-to-end architecture, accurate perception remains the foundation of the system, and HSD brings innovations to the perception stage too.
For example, it uses a deep learning model for general obstacle detection (OCC, an occupancy network) to model potential hazards. Beyond standard vehicles and pedestrians, it can model unconventional obstacles such as soil mounds and fallen objects, and it outputs high-precision 3D occupancy grids.
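The OCC network itself predicts occupancy from camera images; the Python sketch below only illustrates the kind of output it produces and how a planner might query it. It voxelizes a hypothetical point set (a low soil mound on the road) into a set of occupied 3D cells and checks whether any occupied cell above the road surface lies in the ego corridor. The voxel size, corridor dimensions, and height filter are assumptions.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float]], voxel_size: float = 0.2) -> Set[Voxel]:
    """Map 3D points (meters; x forward, y left, z up) to the set of occupied voxel indices."""
    return {(int(x // voxel_size), int(y // voxel_size), int(z // voxel_size)) for x, y, z in points}


def corridor_blocked(occupied: Set[Voxel], voxel_size: float = 0.2, length: float = 20.0,
                     half_width: float = 1.0, min_height: float = 0.15, max_height: float = 2.0) -> bool:
    """True if any occupied voxel center lies in the ego corridor above the road surface."""
    for i, j, k in occupied:
        x, y, z = ((n + 0.5) * voxel_size for n in (i, j, k))
        if 0.0 <= x <= length and abs(y) <= half_width and min_height <= z <= max_height:
            return True
    return False


# Hypothetical points: flat road returns plus a 0.4 m-high soil mound 12 m ahead.
road = [(x * 0.5, y * 0.5, 0.02) for x in range(40) for y in range(-2, 3)]
mound = [(12.0 + dx * 0.1, dy * 0.1, dz * 0.1)
         for dx in range(5) for dy in range(-3, 4) for dz in range(1, 5)]

print(corridor_blocked(voxelize(road)))          # False
print(corridor_blocked(voxelize(road + mound)))  # True
```

Because the grid records occupied space rather than object classes, the mound blocks the corridor even though nothing in the sketch knows what a "soil mound" is.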
It also achieves very high-precision distance and position estimation through advanced vision algorithms and model training, providing reliable input for scenarios such as squeezing through narrow passages and parking in extremely tight spots.
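To see why distance precision is demanding, consider the classic flat-ground pinhole model, swapped in here purely for illustration (Horizon attributes its precision to vision algorithms and model training, not to this formula). With focal length f in pixels, camera height h, and an object's road-contact point (v - v0) pixels below the horizon, the distance is Z = f * h / (v - v0), so a one-pixel error shifts the estimate by roughly Z^2 / (f * h). The 1000-pixel focal length and 1.5 m camera height below are assumptions.

```python
def ground_distance(focal_px: float, cam_height_m: float, rows_below_horizon: float) -> float:
    """Flat-ground pinhole model: Z = f * h / (v - v0) to an object's road-contact point."""
    if rows_below_horizon <= 0:
        raise ValueError("the contact point must lie below the horizon")
    return focal_px * cam_height_m / rows_below_horizon


def one_pixel_error(focal_px: float, cam_height_m: float, distance_m: float) -> float:
    """How far the estimate moves if the contact point is mislocated by one pixel toward the horizon."""
    rows = focal_px * cam_height_m / distance_m
    return ground_distance(focal_px, cam_height_m, rows - 1) - distance_m


F_PX, H_M = 1000.0, 1.5  # assumed focal length (pixels) and camera height (meters)
print(ground_distance(F_PX, H_M, 50))             # 30.0
print(round(one_pixel_error(F_PX, H_M, 3.0), 4))  # 0.006
print(round(one_pixel_error(F_PX, H_M, 30.0), 3)) # 0.612
```

A one-pixel slip moves the estimate by about 6 mm at 3 m but about 0.6 m at 30 m, so the centimeter-level margins of narrow passages and tight parking sit in the range where geometry is most forgiving, yet every pixel still counts.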
Finally, there is long-horizon temporal fusion: rather than processing single frames, the model fuses continuous visual information over time, which is the key to defensive driving and to accurately predicting how other road users will move.
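A single frame gives position but not motion. The Python sketch below is a hand-written constant-velocity fit rather than the learned temporal fusion HSD uses, but it shows the basic idea: fitting positions across several frames recovers velocity, which is what makes a prediction possible. The 10 Hz frame rate, the pedestrian track, and the 2-second horizon are assumptions.

```python
from typing import List, Tuple


def fit_line(times: List[float], values: List[float]) -> Tuple[float, float]:
    """Least-squares line through (time, value) samples; returns (slope, intercept)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    slope = num / den
    return slope, v_mean - slope * t_mean


def predict_position(times, xs, ys, horizon_s):
    """Constant-velocity fit over a window of frames, extrapolated horizon_s past the last frame."""
    vx, bx = fit_line(times, xs)
    vy, by = fit_line(times, ys)
    t_future = times[-1] + horizon_s
    return (vx, vy), (bx + vx * t_future, by + vy * t_future)


# Hypothetical pedestrian 10 m ahead, crossing from right to left (y points left) at 1.5 m/s.
times = [0.1 * i for i in range(6)]  # six frames at 10 Hz
xs = [10.0] * 6
ys = [-3.0 + 1.5 * t for t in times]

velocity, future = predict_position(times, xs, ys, horizon_s=2.0)
print(tuple(round(v, 3) for v in velocity), tuple(round(p, 3) for p in future))  # (0.0, 1.5) (10.0, 0.75)
```

Fitting over a window also averages out per-frame noise, something a single-frame estimate cannot do.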
Cognitive decision-making
This is the system's "brain", responsible for understanding the scene and making human-like driving decisions. HSD's approach is a dual "fast and slow thinking" system.
Fast thinking is an end-to-end model that handles immediate responses and covers most driving scenarios. Working from the perceived dense features, it uses imitation learning to mimic human drivers and outputs smooth, continuous trajectories.
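In its simplest form, imitation learning (behavior cloning) is supervised regression from observed state to the expert's action. The toy Python example below, which is not HSD's model, clones a hypothetical human car-following habit: acceleration = k_gap * gap_error + k_rel * relative_speed. The demonstration grid, the "true" gains of 0.3 and 0.8, and the noise level are assumptions. A real end-to-end model regresses whole trajectories from dense features with a deep network, but the training signal is the same: match what the human did.

```python
import random

TRUE_K_GAP, TRUE_K_REL = 0.3, 0.8  # the hypothetical human driver's habits


def human_accel(gap_error_m, rel_speed_mps, rng):
    """Demonstrator: closes a too-large gap and matches the lead car's speed, with some noise."""
    return TRUE_K_GAP * gap_error_m + TRUE_K_REL * rel_speed_mps + rng.gauss(0.0, 0.05)


def clone_policy(demos):
    """Behavior cloning: least-squares regression of the demonstrated action on the state features."""
    s11 = sum(g * g for g, r, a in demos)
    s12 = sum(g * r for g, r, a in demos)
    s22 = sum(r * r for g, r, a in demos)
    b1 = sum(g * a for g, r, a in demos)
    b2 = sum(r * a for g, r, a in demos)
    det = s11 * s22 - s12 * s12
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


rng = random.Random(1)
# States: gap error in meters (positive = too far back) and lead speed minus ego speed.
demos = [(g, r, human_accel(g, r, rng)) for g in range(-10, 11, 2) for r in range(-5, 6)]
k_gap, k_rel = clone_policy(demos)


def cloned_policy(gap_error_m, rel_speed_mps):
    """The learned policy, usable on states the human never demonstrated exactly."""
    return k_gap * gap_error_m + k_rel * rel_speed_mps


print(round(k_gap, 2), round(k_rel, 2))
```

The recovered gains land very close to the demonstrator's, so the cloned policy reproduces the human's rhythm rather than a hand-tuned one.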
Slow thinking consists of a large language model (LLM) and a world model: it handles complex scenarios that require logical reasoning and common-sense understanding (such as understanding special traffic signs, judging whether the vehicle in front is "stalled" or "in a queue", and planning at complex intersections).
The LLM interprets symbolic information such as traffic rules and the meaning of signs, and performs cross-domain common-sense and logical reasoning. The world model captures the causal structure of the physical world, predicts the intentions of other road users, and reasons over longer time horizons.
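The division of labor can be sketched as control flow. In the Python example below, the fast path runs on every step, while the slow path is invoked only when a scene looks ambiguous and returns a semantic hint (such as "stalled" versus "queue") rather than a trajectory. Every function is a hypothetical stand-in: the hand-written rules merely take the place of the learned end-to-end model, the LLM, and the world model, and the trigger condition is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    lead_stopped_s: float      # how long the vehicle ahead has been stationary (0 if moving)
    lead_hazard_lights: bool
    red_light_ahead: bool
    vehicles_queued_ahead: int


def needs_slow_thinking(scene: Scene) -> bool:
    """Hypothetical trigger: only a lead vehicle that has been stopped for a while is ambiguous."""
    return scene.lead_stopped_s > 3.0


def slow_thinking(scene: Scene) -> str:
    """Stand-in for the LLM / world-model path: returns a semantic judgment, not a trajectory."""
    if scene.red_light_ahead or scene.vehicles_queued_ahead > 1:
        return "queue"
    if scene.lead_hazard_lights or scene.lead_stopped_s > 30.0:
        return "stalled"
    return "unknown"


def fast_thinking(scene: Scene, hint: Optional[str]) -> str:
    """Stand-in for the end-to-end model; the slow path's hint only biases its choice."""
    if scene.lead_stopped_s == 0.0:
        return "follow"
    return "detour" if hint == "stalled" else "wait"


def drive_step(scene: Scene) -> str:
    hint = slow_thinking(scene) if needs_slow_thinking(scene) else None
    return fast_thinking(scene, hint)


print(drive_step(Scene(10.0, False, True, 3)))  # wait   (queue at a red light)
print(drive_step(Scene(10.0, True, False, 0)))  # detour (stalled car with hazard lights)
```

Keeping the slow path off the per-step loop is what lets an expensive reasoner coexist with a fast controller.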
Finally, reinforcement learning ties the fast and slow systems together and strengthens both. By exploring on its own in a simulated environment, the AI learns to handle rare (corner case) and dangerous scenarios, continuously strengthening the model's reasoning and generalization abilities.
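Reinforcement learning in simulation can be illustrated with tabular Q-learning on a toy scenario, swapped in here as the simplest RL algorithm rather than HSD's actual method. The hypothetical simulator places a blind corner ahead; with probability 0.3 a pedestrian is hidden behind it, and arriving at full speed (2 cells per step) then means a collision. No rule tells the agent to slow down; it discovers that by exploring and being penalized for collisions. The grid size, rewards, and learning rates are assumptions.

```python
import random

ACTIONS = (-1, 0, +1)      # change in speed: slow down, keep, speed up (cells per step)
P_HIDDEN_PEDESTRIAN = 0.3
START = (6, 2)             # (cells to the blind corner, speed in cells per step)


def sim_step(state, action, rng):
    """Toy simulator: returns (next_state, reward, done)."""
    dist, speed = state
    speed = min(max(speed + ACTIONS[action], 0), 2)
    dist -= speed
    if dist > 0:
        return (dist, speed), -1.0, False  # small time penalty per step
    if speed == 2 and rng.random() < P_HIDDEN_PEDESTRIAN:
        return None, -50.0, True           # too fast to stop for the pedestrian: collision
    return None, 10.0, True                # passed the corner safely


def q_learning(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(50):                # cap episode length
            values = q.setdefault(state, [0.0, 0.0, 0.0])
            if rng.random() < epsilon:     # explore
                action = rng.randrange(len(ACTIONS))
            else:                          # exploit
                action = values.index(max(values))
            nxt, reward, done = sim_step(state, action, rng)
            target = reward if done else reward + max(q.setdefault(nxt, [0.0, 0.0, 0.0]))
            values[action] += alpha * (target - values[action])  # undiscounted Q-learning update
            if done:
                break
            state = nxt
    return q


def greedy_arrival_speed(q, max_steps=50):
    """Follow the learned greedy policy and report the speed at which the corner is reached."""
    dist, speed = START
    for _ in range(max_steps):
        values = q.get((dist, speed), [0.0, 0.0, 0.0])
        speed = min(max(speed + ACTIONS[values.index(max(values))], 0), 2)
        dist -= speed
        if dist <= 0:
            return speed
    return None


q_table = q_learning()
print(greedy_arrival_speed(q_table))
```

The learned greedy policy should reach the corner at speed 1. Because the pedestrian is present only some of the time, this is the kind of rare-event lesson that simulated exploration can surface more reliably than logged driving alone.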
Control execution
This module converts the trajectories output by the cognitive decision-making module into precise control commands the vehicle can execute. This includes direct trajectory-based vehicle control: the end-to-end model's trajectories are smoothed and safety-verified, then sent straight to the by-wire system for execution, which keeps control coherent and smooth.
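The article does not say how a trajectory is turned into actuator commands. As one classic possibility, swapped in here for illustration only, pure pursuit picks a look-ahead point on the trajectory and steers along the circular arc that reaches it, while a clipped speed error gives the acceleration request. The wheelbase, acceleration limits, and function names below are assumptions.

```python
import math

WHEELBASE_M = 2.9  # assumed wheelbase


def pure_pursuit_steer(target_x: float, target_y: float, wheelbase: float = WHEELBASE_M) -> float:
    """Front-wheel angle (rad) of the circular arc from the rear axle through a look-ahead point.

    The target point is in the vehicle frame: x forward, y to the left, origin at the rear axle.
    """
    lookahead_sq = target_x ** 2 + target_y ** 2
    curvature = 2.0 * target_y / lookahead_sq  # kappa = 2 * sin(alpha) / lookahead
    return math.atan(wheelbase * curvature)    # kinematic bicycle model: tan(delta) = L * kappa


def accel_command(current_mps: float, target_mps: float, dt: float,
                  max_accel: float = 2.0, max_decel: float = 3.0) -> float:
    """Acceleration that would reach the trajectory's next speed in one step, clipped to limits."""
    accel = (target_mps - current_mps) / dt
    return max(-max_decel, min(max_accel, accel))


print(pure_pursuit_steer(10.0, 0.0))            # 0.0: target dead ahead
print(round(pure_pursuit_steer(10.0, 1.0), 4))  # 0.0574: gentle left
print(accel_command(10.0, 12.0, 0.1))           # 2.0: clipped to the comfort limit
```

Positive angles steer left under the stated frame convention; a production tracker would add look-ahead scheduling and actuator dynamics on top of this.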
In addition, because it learns from large amounts of human driving data, the model's control commands are highly human-like in their rhythm of acceleration, deceleration, and steering. This avoids unnecessary brake tapping, hard braking, and large steering inputs, improving ride comfort and the sense of security.
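Claims like "less brake tapping and hard braking" can be made measurable. The Python sketch below computes a few comfort metrics from a speed trace; the metric set, the 3 m/s² hard-braking threshold, and the sample traces are assumptions, not Horizon's evaluation criteria.

```python
from typing import Dict, List


def comfort_metrics(speeds: List[float], dt: float, hard_brake_mps2: float = 3.0) -> Dict[str, float]:
    """Comfort indicators from a speed trace sampled every dt seconds."""
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]
    # Each transition from non-negative to negative acceleration counts as one press of the brake.
    brake_onsets = sum(1 for i, a in enumerate(accels) if a < 0 and (i == 0 or accels[i - 1] >= 0))
    return {
        "max_decel": max(0.0, -min(accels)),
        "max_abs_jerk": max((abs(j) for j in jerks), default=0.0),
        "hard_brakes": sum(1 for a in accels if a <= -hard_brake_mps2),
        "brake_onsets": brake_onsets,
    }


smooth_stop = [10.0 - i for i in range(11)]                              # steady -2 m/s^2 for 5 s
stop_start = [10.0, 10.0, 8.0, 8.0, 6.0, 6.0, 4.0, 4.0, 2.0, 2.0, 0.0]  # brake, hold, brake, ...
print(comfort_metrics(smooth_stop, dt=0.5))
print(comfort_metrics(stop_start, dt=0.5))
```

Both traces stop in the same five seconds, but the stop-start one brakes hard five times and produces large jerk spikes, exactly the kind of output human-like control is meant to avoid.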
Data closed-loop and simulation
This is the "infrastructure" that supports the continuous evolution of the entire system.
With development now driven by large-scale data, performance gains no longer come from engineers writing rules but from continuously training the model on large volumes of high-quality real and simulated data.
A high-precision simulation platform provides a test set covering a large number of long-tail scenarios (corner cases). It can reproduce rare and dangerous scenarios efficiently and safely for model testing and training, greatly speeding up the resolution of long-tail problems;
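A scenario test set of this kind is often built by sweeping a scenario template over parameter values. The Python sketch below generates cut-in cases across gap, closing speed, and road friction, then scores a simple driver model with closed-form constant-deceleration kinematics in place of a real simulator. The template, the parameter values (friction 0.3 and 0.8 as rough snow and dry-asphalt figures), and the 0.2 s / 6 m/s² driver model are assumptions; the kinematics also assume the cutting-in vehicle holds a constant speed.

```python
import itertools
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class CutInScenario:
    gap_m: float              # gap to the cutting-in vehicle when it enters the lane
    closing_speed_mps: float  # ego speed minus cutter speed (positive = closing)
    friction: float           # tire-road friction coefficient


def build_test_set():
    """Sweep the template over every parameter combination, including the harsh corners."""
    gaps = (5.0, 10.0, 20.0)
    closing_speeds = (0.0, 5.0, 10.0)
    frictions = (0.3, 0.8)
    return [CutInScenario(*combo) for combo in itertools.product(gaps, closing_speeds, frictions)]


def min_gap(s: CutInScenario, reaction_s: float, max_decel: float) -> float:
    """Closed-form minimum gap if ego brakes at a constant rate after its reaction time."""
    if s.closing_speed_mps <= 0:
        return s.gap_m
    decel = min(max_decel, s.friction * G)  # braking is capped by available grip
    c = s.closing_speed_mps
    return s.gap_m - c * reaction_s - c * c / (2 * decel)


def evaluate(test_set, reaction_s=0.2, max_decel=6.0):
    failures = [s for s in test_set if min_gap(s, reaction_s, max_decel) <= 0]
    return 1 - len(failures) / len(test_set), failures


test_set = build_test_set()
pass_rate, failures = evaluate(test_set)
print(f"{len(test_set)} scenarios, pass rate {pass_rate:.0%}")
for f in failures:
    print("FAIL", f)
```

Of the 18 generated cases, the toy driver fails 5, each involving a short gap, a high closing speed, or low friction, which is the kind of long-tail combination such a test set exists to expose.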