Yann LeCun's Team's Latest Research: Empowering World Models to Learn "Adaptiveness" and Continuously Evolve via Action

Learning while planning

Latent world models let robots simulate the future before acting and then plan their movements accordingly. In 2022, the team led by Turing Award laureate Yann LeCun proposed the Joint Embedding Predictive Architecture (JEPA). By predicting how future states evolve in latent space, it helped make prediction in representation space an important paradigm in world-model research.

The problem is that the parameters of most world models are frozen after training, which makes it hard for them to adapt promptly to the changing visual conditions and physical properties of real scenes. Once the encoder or predictor becomes inaccurate, the error compounds over subsequent planning steps and ultimately causes the task to fail.

Humans are different. Sensorimotor adaptation is a core mechanism by which we cope with environmental change: we calibrate our action predictions against sensory feedback and keep revising our understanding of the environment as new experience arrives.

Inspired by this biological principle, Yann LeCun's team proposed AdaJEPA, an adaptive latent world model that keeps learning during deployment. It embeds adaptation into the closed loop of Model Predictive Control (MPC): after each action is executed, the model is corrected using the state transition that was actually observed, and the updated model is then used to re-plan.

Paper link: https://arxiv.org/abs/2606.32026

The results show that AdaJEPA consistently improves the planning success rate, both on in-distribution tasks and under a variety of distribution shifts. Even with only a single lightweight update before each re-planning step, it generally outperforms world models whose parameters are frozen after training.

This work opens up a promising direction for adaptive world models: a world model should keep calibrating its predictions and updating its representations from real feedback as it acts, so that it can better adapt to changing environments.

AdaJEPA: A World Model with "Adaptability"

AdaJEPA is an adaptive latent world model that keeps correcting its predictions while the robot carries out a task. After each action is completed, the model uses the new real observation to correct its deviation and then re-plans. The process requires no additional offline data, reward labels, or expert demonstrations, and can be summarized in four steps: planning, execution, adaptation (correction), and re-planning. Details are as follows:

Figure | AdaJEPA adapts at test time during closed-loop MPC execution.

Planning: The model internally simulates how the state will change over the next few steps, compares multiple candidate plans, and selects the action sequence most likely to approach the target.
Execution: After planning, the model executes only the first action or a short sequence of actions, and then observes the real feedback from the environment. The states before and after execution are recorded as a learning sample for subsequent adaptation.
Adaptation: After the action is executed, AdaJEPA writes the resulting state transition into an online buffer and uses it to check the accuracy of the model's prediction. If the predicted next state deviates from the real result, the model uses that error as an update signal and performs a lightweight correction in preparation for the next round of planning.
Re-planning: After adaptation, the model starts from the latest observation, uses the updated world model to re-predict the subsequent trajectory, and generates a new action sequence. Throughout the task, the "planning, execution, adaptation, re-planning" cycle repeats continuously, so that each round of planning is based on the latest observation and the latest model.
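The four-step loop above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "world model" is a single scalar gain in s' = s + gain * a, the environment uses a different gain to mimic a dynamics shift, the planner compares constant-action plans from a fixed grid, and names such as plan, adapt and run_episode are hypothetical.

```python
from collections import deque

def rollout(state, actions, gain):
    """Predict the final state by applying the toy model s' = s + gain * a step by step."""
    for a in actions:
        state = state + gain * a
    return state

def plan(state, goal, gain, horizon=3):
    """Compare candidate plans (constant actions from a grid) and return the best one."""
    candidates = [[i / 10.0] * horizon for i in range(-10, 11)]
    return min(candidates, key=lambda seq: abs(rollout(state, seq, gain) - goal))

def adapt(gain, buffer, lr=0.2):
    """One gradient step on the mean squared one-step prediction error over the buffer."""
    grad = sum(2.0 * ((s + gain * a) - s_next) * a for s, a, s_next in buffer) / len(buffer)
    return gain - lr * grad

def run_episode(true_gain=0.5, model_gain=1.0, goal=1.0, steps=30, buffer_size=5):
    state = 0.0
    buffer = deque(maxlen=buffer_size)            # keeps only the most recent transitions
    for _ in range(steps):
        actions = plan(state, goal, model_gain)   # 1. plan with the current model
        a = actions[0]                            # 2. execute only the first action
        next_state = state + true_gain * a        #    the real environment responds
        buffer.append((state, a, next_state))     # 3. store the observed transition
        model_gain = adapt(model_gain, buffer)    #    and correct the model with it
        state = next_state                        # 4. re-plan from the new observation
    return state, model_gain

final_state, final_gain = run_episode()
print("final state:", final_state, "adapted gain:", final_gain)
```

In this toy setting the model starts with the wrong gain, and each observed transition pulls it toward the gain the environment actually uses, so later plans are computed with a more accurate model. A frozen model would keep planning with the initial, mismatched gain.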

In addition, to avoid slowing down real-time planning, AdaJEPA performs only lightweight updates: it adjusts a small number of parameters, maintains a small online buffer, and reuses the objective function from the pre-training stage. Details are as follows:

Update only key layers: Instead of updating the entire world model, AdaJEPA adjusts only a small number of key layers in the encoder and predictor. This reduces computational overhead and also minimizes disturbance to the existing representations.
Maintain a small online buffer: By default, the buffer stores the 5 most recent real state transitions. The research team compared two retention strategies: recent-N keeps the latest transitions, and hard-N keeps the transitions with the largest prediction errors. The results show that the difference between the two is small, but recent-N is more stable.
Use the objective function from the pre-training stage: During adaptation, the model keeps the same prediction objective as in pre-training and uses the representation of the real observation as the supervision signal. To reduce disturbance to the existing representations, the target representation serves only as a reference and does not take part in gradient backpropagation.
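The two retention strategies can be sketched as small data structures. This is an illustration under assumptions rather than the paper's implementation: the class names RecentBuffer and HardBuffer are hypothetical, and the hard-N variant ranks transitions by the prediction error recorded when they were inserted, which is a simplification (the article does not say whether errors are recomputed as the model changes).

```python
from collections import deque
import heapq

class RecentBuffer:
    """recent-N: keep the N most recently observed transitions."""
    def __init__(self, capacity=5):
        self.items = deque(maxlen=capacity)  # the oldest entry is dropped automatically

    def add(self, transition, error):
        self.items.append(transition)        # the error is ignored by this strategy

    def sample(self):
        return list(self.items)

class HardBuffer:
    """hard-N: keep the N transitions with the largest recorded prediction error."""
    def __init__(self, capacity=5):
        self.capacity = capacity
        self.heap = []    # min-heap of (error, insertion index, transition)
        self.count = 0    # insertion index breaks ties between equal errors

    def add(self, transition, error):
        entry = (error, self.count, transition)
        self.count += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
        elif error > self.heap[0][0]:
            heapq.heapreplace(self.heap, entry)  # evict the smallest-error entry

    def sample(self):
        return [transition for _, _, transition in self.heap]

# Usage: transitions are represented by their step index for brevity.
errors = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05]
recent, hard = RecentBuffer(capacity=3), HardBuffer(capacity=3)
for step, err in enumerate(errors):
    recent.add(step, err)
    hard.add(step, err)
print(recent.sample())        # [4, 5, 6]
print(sorted(hard.sample()))  # [1, 3, 5]
```

The recent-N buffer always reflects the current conditions, while hard-N can hold on to old, high-error transitions. That difference is one plausible reading of why the article reports recent-N as the more stable choice, although the article itself does not give the reason.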

What's the Effect?

Overall, AdaJEPA consistently improves the planning success rate, both on in-distribution tasks and under a variety of distribution shifts. The research team evaluated the model on the object-pushing tasks PushT and PushObj and on the maze-navigation task PointMaze, covering changes in shape, visual appearance, dynamics, and layout. Even with only a single lightweight update before each re-planning step, AdaJEPA still generally outperforms world models whose parameters are frozen after training. The specific results are as follows:

1. In-distribution tasks

The results show that test-time adaptation does not sacrifice AdaJEPA's original capabilities and can further improve the task success rate. Whether the action sequence is optimized directly with gradient descent (GD) or found by sampling and filtering candidate actions with the cross-entropy method (CEM), AdaJEPA's success rate is higher than that of the baseline without test-time adaptation. The improvement is most significant on the object-pushing task, where the largest gain in success rate exceeds 20%. On the maze-navigation task, the original model already performs strongly, and AdaJEPA maintains a similar level without significant degradation.

Figure | Planning success rate on PointMaze under dynamics and layout changes.
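For readers unfamiliar with the two planner families named above, here is a minimal sketch of each on a toy problem. The setup is an assumption made for illustration: a one-dimensional state, a hand-written model s' = s + GAIN * a, and a squared distance-to-goal cost on the final predicted state. The function names, hyperparameters, and the choice to return the best sample seen in CEM are all hypothetical and not taken from the paper.

```python
import random

GAIN = 1.0  # toy stand-in for a learned latent dynamics model

def predict_final(state, actions):
    """Roll the toy model forward through an action sequence."""
    for a in actions:
        state = state + GAIN * a
    return state

def cost(state, actions, goal):
    return (predict_final(state, actions) - goal) ** 2

def clip(a, lo=-1.0, hi=1.0):
    return max(lo, min(hi, a))

def plan_gd(state, goal, horizon=3, iters=20, lr=0.1):
    """GD planner: optimize the action sequence directly through the model."""
    actions = [0.0] * horizon
    for _ in range(iters):
        err = predict_final(state, actions) - goal
        grad = 2.0 * GAIN * err  # in this toy model d(cost)/d(a_t) is the same for every t
        actions = [clip(a - lr * grad) for a in actions]
    return actions

def plan_cem(state, goal, horizon=3, samples=64, elites=8, iters=5, seed=0):
    """CEM planner: sample candidates, keep the best few, refit the sampling distribution."""
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [0.5] * horizon
    best, best_cost = list(mean), cost(state, mean, goal)
    for _ in range(iters):
        population = [[clip(rng.gauss(m, s)) for m, s in zip(mean, std)]
                      for _ in range(samples)]
        population.sort(key=lambda seq: cost(state, seq, goal))
        elite = population[:elites]
        elite_cost = cost(state, elite[0], goal)
        if elite_cost < best_cost:               # remember the best candidate seen so far
            best, best_cost = elite[0], elite_cost
        mean = [sum(seq[t] for seq in elite) / elites for t in range(horizon)]
        std = [max(1e-3, (sum((seq[t] - mean[t]) ** 2 for seq in elite) / elites) ** 0.5)
               for t in range(horizon)]
    return best

print("GD plan: ", plan_gd(0.0, 1.0))
print("CEM plan:", plan_cem(0.0, 1.0))
```

GD needs a differentiable model and cost, while CEM only needs to evaluate candidates, which is why the two are commonly treated as interchangeable back ends for the same world model. Test-time adaptation changes the model both planners query, not the planners themselves.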

2. Out-of-distribution tasks

In tasks where the environment changes more noticeably, AdaJEPA's advantages are more prominent. After each round of planning and execution, it updates the world model with new real feedback, which brings subsequent planning closer to the current environment and raises the task success rate. In contrast, models that are frozen after training cannot make use of these new observations, and their success rates often plateau quickly.

Figure | Planning success rate under shape and visual changes.

Specifically, in the multi-shape object-pushing task, AdaJEPA's improvement is most pronounced when object shapes not seen during training appear at test time, with the success rate nearly doubling. Among visual perturbations, the gains are larger for blur, noise, and dim lighting; when only the anchor points or object colors change, AdaJEPA's advantage is relatively limited. In the PointMaze maze-navigation task, AdaJEPA also adapts to dynamics changes and new maze layouts, and under new layouts it plans trajectories closer to the shortest path.

Figure | Planning trajectories in diverse mazes.

Figure | Planning trajectories on PointMaze-Medium under dynamics changes.

3. AdaJEPA Improves Across Multiple JEPA Implementations

To verify whether AdaJEPA depends on a specific model implementation, the research team tested it on the PushT object-pushing task while varying, one at a time, the representation form, the model architecture, the training objective, and the planner. The results show that AdaJEPA improves the planning success rate in all of these settings. Even when the baseline model is fully trained and the evaluation is still in-distribution, test-time adaptation brings consistent gains, at a cost of only about 0.01 to 0.03 seconds of additional latency per re-planning step.

Figure | Performance of AdaJEPA under different implementations.

4. AdaJEPA Corrects Existing Predictions Instead of Learning a New World from Scratch

The visualization results show that AdaJEPA's adaptation is closer to calibration than to re-learning. After decoding the predicted trajectories produced after adaptation, the research team found that, even under visual perturbations or unseen shapes, the decoded results still tend to retain the structural features of the training distribution. For example, red squares are decoded into the gray squares common in training, and unseen shapes are decoded into similar seen shapes.

Figure | Examples of AdaJEPA planning trajectories under visual and shape changes.

5. Ablation Experiments and Analysis

The ablation experiments show that AdaJEPA requires neither large-scale updates nor complex hyperparameter tuning: updating a small number of key layers, performing a single gradient step, and keeping a buffer of recent state transitions are already enough to bring consistent benefits.

First, when AdaJEPA updates only some of the layers in the encoder or predictor, or uses LoRA for lightweight updates, its overall performance is still better than that of the baseline without test-time adaptation, which indicates that the entire model does not need to be retrained.

Second, different distribution shifts call for updates in different places. Under shape changes, the differences between update schemes are small, and adjusting mainly the predictor is sufficient. Under visual and layout changes, updating only the predictor has limited effect, and the encoder also needs to be involved. Under layout changes, updating the first layer of the predictor works best, possibly because it is the first layer to fuse latent-state and action information, which makes it easier to correct the new local transition relationships.

In addition, the default hyperparameters are already stable enough. By default, AdaJEPA reuses the learning rate from the training stage, performs only a single gradient step before each re-planning step, and keeps recent state transitions as its buffer. A larger learning rate or more update steps may strengthen the adaptation effect, but they also increase instability and computational overhead. Overall, the default settings strike a good balance between effectiveness, stability, and latency.
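The default recipe described in this article (a single gradient step, only a subset of parameters updated, and a target representation that receives no gradient) can be made concrete with a hand-differentiated toy model. All specifics here are assumptions for illustration only: the encoder and predictor are scalar linear maps, the parameter names w_enc, w_pred and w_act are hypothetical, and the learning rate of 0.1 is a placeholder rather than the value used in training.

```python
def encode(params, x):
    """Toy encoder: maps an observation to a scalar latent."""
    return params["w_enc"] * x

def predict(params, z, a):
    """Toy predictor: next latent from the current latent and the action."""
    return params["w_pred"] * z + params["w_act"] * a

def loss_and_grads(params, batch):
    """Mean squared latent prediction error and its gradients over a batch."""
    grads = {name: 0.0 for name in params}
    total = 0.0
    for x, a, x_next in batch:
        z = encode(params, x)
        target = encode(params, x_next)  # treated as a constant: no gradient flows through it
        r = predict(params, z, a) - target
        total += r * r
        grads["w_pred"] += 2.0 * r * z
        grads["w_act"] += 2.0 * r * a
        grads["w_enc"] += 2.0 * r * params["w_pred"] * x  # online branch only
    n = len(batch)
    return total / n, {name: g / n for name, g in grads.items()}

def adapt_once(params, batch, trainable, lr):
    """Single gradient step applied only to the parameters named in `trainable`."""
    _, grads = loss_and_grads(params, batch)
    return {name: (value - lr * grads[name] if name in trainable else value)
            for name, value in params.items()}

# Usage: update only the predictor, keep the encoder frozen.
params = {"w_enc": 1.0, "w_pred": 1.0, "w_act": 1.0}
batch = [(1.0, 1.0, 1.5), (2.0, -1.0, 1.5)]  # (observation, action, next observation)
loss_before, _ = loss_and_grads(params, batch)
updated = adapt_once(params, batch, trainable={"w_pred", "w_act"}, lr=0.1)
loss_after, _ = loss_and_grads(updated, batch)
print(loss_before, loss_after)
```

Passing a different `trainable` set is the toy analogue of the ablation over update locations: adding "w_enc" lets the encoder take part, which the article reports matters under visual and layout changes.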

Figure | Influence of adaptation hyperparameters and replay buffer on planning success rate.

6. Influence of Training Data Scale and Shape Diversity on AdaJEPA

The experimental results show that AdaJEPA's effectiveness depends not only on the amount of training data but also on whether that data is diverse enough. For the PushObj multi-shape object-pushing task, shape diversity matters more than simply adding more trajectories of the same shape; at the same time, test-time adaptation can make up for part of the generalization gap when data is insufficient.

Specifically, for the same total number of trajectories, spreading the data across more object shapes generalizes better to unseen shapes than concentrating it on a single shape. For example, with 16k trajectories in total, AdaJEPA trained on four shapes reaches a success rate of 51.9% on unseen shapes, higher than the 45.8% obtained when training on a single shape.

In addition, AdaJEPA improves the success rate at every data scale, and the benefit is particularly clear in low-data scenarios. Even if training covers only a few shapes and trajectories, the model can use new observations to correct its predictions during deployment. For example, on seen shapes, AdaJEPA trained with 1 shape and 1k trajectories achieves a success rate of 60.8%, higher than a model trained with 4 shapes and a total of 64k trajectories but not updated at test time.

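The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.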
