
Breaking the plasticity bottleneck: a Tsinghua team tops continual learning leaderboards by letting transferable task relationships guide training

New Intelligence Yuan (新智元), 2025-12-02 08:54
A team from Tsinghua University uses H-embeddings to guide a hypernetwork, using task relationships to reduce the forgetting rate and leading on benchmarks such as ImageNet-R.

A research team from Tsinghua University has tackled the problem of AI forgetting what it has learned by using "task relationships". The proposed H-embedding guided hypernetwork first computes how closely a new task relates to the old ones, then has the hypernetwork generate task-specific model parameters according to those relationships. The low-dimensional embedding vectors are plug-and-play and cut the forgetting rate by a further 10% on benchmarks such as ImageNet-R.

Continual Learning (CL) is an important ability for artificial intelligence systems to achieve long-term intelligence. Its core goal is to enable the model to continuously absorb new knowledge in a sequence of tasks while maintaining or even improving performance on old tasks.

However, under the mainstream deep learning framework, models often significantly forget old knowledge when learning new tasks, which is known as "Catastrophic Forgetting". This is a key bottleneck limiting the large-scale practical application of continual learning.

Existing CL methods can be roughly divided into three categories: replay-based methods that revisit old data, regularization-based methods that constrain parameter updates, and dynamic-expansion methods that grow the model structure. Although they all alleviate forgetting to varying degrees, a fundamental problem has always been overlooked:

Most CL methods still start from a "model-centric" perspective and lack the modeling and utilization of the intrinsic relationships between tasks.

However, task relationships directly determine the direction and efficiency of knowledge transfer: which tasks have high synergy, which tasks have large conflicts, which old tasks are helpful for new tasks, and which new tasks may damage existing capabilities - this information is crucial for robust continual learning.

To address this long-standing gap, researchers from Tsinghua University proposed a new "task-relation-centric" CL solution: an H-embedding guided hypernetwork continual learning framework.

Paper link: https://arxiv.org/pdf/2502.11609

Its core idea is: before learning each new task, construct a transferability-aware task embedding H-embedding through information-theoretic metrics, and use the hypernetwork to generate task-specific parameters according to the embedding, thereby explicitly encoding task relationships in the CL process.

Motivation of the method: Task relationships should be explicit guiding information for CL

In a typical CL setting, the model can only conduct "post hoc analysis" based on parameter changes after training on a new task to judge the interference and transfer between tasks.

This mode naturally has three major problems:

1. Lack of task-level prior knowledge: the model cannot plan a transfer path before training starts

The model neither knows which old tasks are helpful for the current task nor which knowledge needs to be protected.

2. Forward and backward transfer are difficult to optimize simultaneously

Traditional methods often can only focus on one aspect: strong regularization reduces forgetting but weakens the ability to learn new tasks; strong learning of new tasks improves forward transfer but leads to significant forgetting.

3. As the number of tasks increases, interference accumulates, making the method difficult to scale

The longer the task sequence, the higher the cost of the model's "blind learning".

Therefore, a natural question arises:

"If continual learning can construct a learning path from task relationships rather than simply from model parameters, can it improve both forward and backward transfer capabilities?"

In this context, the research team introduced the "task-relation-centric" design idea, transforming task transferability into learnable prior information and directly driving parameter generation and knowledge protection strategies.

Core contributions

Proposed H-embedding: Task transferability embedding based on H-score

Figure: diagrammatic relationship between transferability and task embedding

The team used the information-theoretic H-score to characterize the transfer value from any old task to the current task. The H-score reflects how effective the source task's features are for the target task, and it can be computed efficiently in practical scenarios.
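As a rough illustration, the H-score of a feature representation for a labeled task can be computed from two covariance matrices: the overall feature covariance and the covariance of the class-conditional feature means. The sketch below is a minimal NumPy version under that standard definition; the function name and numerical tolerances are our own choices, not the paper's code.

```python
import numpy as np

def h_score(features, labels):
    """Transferability H-score: tr(Cov(f)^+ . Cov(E[f|y])).
    Higher values mean the features carry more label-relevant signal."""
    f = features - features.mean(axis=0)          # center the features
    cov_f = np.cov(f, rowvar=False)               # overall feature covariance
    g = np.zeros_like(f)                          # class-conditional means
    for y in np.unique(labels):
        idx = labels == y
        g[idx] = f[idx].mean(axis=0)
    cov_g = np.cov(g, rowvar=False)               # between-class covariance
    # pseudo-inverse guards against a rank-deficient feature covariance
    return np.trace(np.linalg.pinv(cov_f, rcond=1e-8) @ cov_g)
```

Features that separate the classes well score higher than the same features paired with shuffled labels, which is what makes the score usable as a cheap pre-training transferability probe.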

Subsequently, these transferability values were normalized through the Analytic Hierarchy Process (AHP) to be consistent with the distance metric in the embedding space, and then the low-dimensional H-embedding of the task was obtained through distance consistency optimization.
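The distance-consistency step can be pictured as a small stress-minimization (MDS-style) fit: find low-dimensional task vectors whose pairwise distances match a given task dissimilarity matrix. The sketch below is purely illustrative; the function name, learning rate, and dimensions are assumptions, not the paper's exact procedure.

```python
import numpy as np

def fit_embeddings(dissim, dim=8, lr=0.05, steps=2000, seed=0):
    """Fit low-dim task embeddings whose pairwise distances match a
    task dissimilarity matrix (gradient descent on MDS-style stress)."""
    n = dissim.shape[0]
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n, dim))      # random initial embeddings
    for _ in range(steps):
        diff = E[:, None, :] - E[None, :, :]      # pairwise differences
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        err = dist - dissim                       # distance-consistency residual
        grad = (err / dist)[:, :, None] * diff    # gradient of the stress
        E -= lr * grad.sum(axis=1) / n
    return E
```

After fitting, tasks whose normalized transferability marks them as close end up with nearby embeddings, so embedding distance becomes a usable proxy for task relationship.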

This representation has three important characteristics:

  • Prior availability: It can be obtained before task training starts
  • Low-dimensional and compact: Easy for long-term storage and quick retrieval
  • Aligned with transferability: The distance between embeddings reflects the relationship between tasks

This enables continual learning to have an "explicitly manageable task relationship structure".

Proposed an H-embedding-driven hypernetwork parameter generation framework

This framework uses a hypernetwork to generate exclusive parameters for each task according to the task embedding. More importantly, a lightweight decoder is introduced inside the model to force the hypernetwork to explicitly absorb task relationships by reconstructing the H-embedding.

The training process includes three types of key losses:

  • Task loss: Learning the current task
  • Continual learning regularization term: Reducing the overwrite of old knowledge
  • Embedding guidance loss: Ensuring that task relationships participate in parameter generation

This design enables the model to: automatically adjust the generated parameters according to task differences, perform forward transfer when tasks are related, and strengthen knowledge protection when tasks conflict, thus solving the core contradiction of CL at the structural level.
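The three-loss setup above can be sketched in a few lines of PyTorch. This is a minimal stand-in, not the paper's architecture: the layer sizes, coefficient names (`beta`, `gamma`), and the mean-squared form of each term are assumptions; the CL regularizer shown follows the common hypernetwork-CL recipe of keeping the weights generated for old tasks close to their stored values.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Minimal hypernetwork: maps a task embedding to target-net weights,
    with a lightweight decoder that reconstructs the embedding from them."""
    def __init__(self, emb_dim, n_params):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_params))
        self.decoder = nn.Linear(n_params, emb_dim)  # embedding-guidance head

    def forward(self, e):
        w = self.body(e)               # task-specific parameters
        return w, self.decoder(w)      # reconstructed H-embedding

def total_loss(task_loss, w_old_now, w_old_saved, e_rec, e_true,
               beta=0.01, gamma=0.1):
    """Task loss + CL regularizer (old tasks' generated weights stay put)
    + embedding-guidance loss (decoder must recover the H-embedding)."""
    reg = sum(((wn - ws.detach()) ** 2).mean()
              for wn, ws in zip(w_old_now, w_old_saved))
    guide = ((e_rec - e_true) ** 2).mean()
    return task_loss + beta * reg + gamma * guide
```

Because the decoder can only reconstruct the H-embedding if the generated weights actually encode it, the guidance term forces task relationships into the parameter-generation pathway rather than leaving them as a side signal.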

Highly practical: trainable end-to-end and compatible with various parameter-efficient fine-tuning techniques

This framework has strong engineering feasibility:

  • Only one embedding needs to be saved for each task (extremely low storage cost)
  • Supports mainstream architectures such as CNN and ViT
  • Can be combined with parameter-efficient fine-tuning techniques such as LoRA and deployed on various pre-trained models
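One way such a combination can look: the hypernetwork emits low-rank (LoRA-style) weight deltas for a frozen pre-trained layer, conditioned on the task embedding. The class below is an illustrative sketch; its name, sizes, and rank are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LoRAHyperHead(nn.Module):
    """Sketch: a hypernetwork head that generates LoRA-style low-rank
    deltas for a frozen linear layer, from a task embedding."""
    def __init__(self, emb_dim, d_in, d_out, rank=4):
        super().__init__()
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # one linear map produces both low-rank factors A and B
        self.gen = nn.Linear(emb_dim, rank * (d_in + d_out))

    def forward(self, e, base_weight):
        flat = self.gen(e)
        A = flat[: self.rank * self.d_in].view(self.rank, self.d_in)
        B = flat[self.rank * self.d_in:].view(self.d_out, self.rank)
        return base_weight + B @ A   # adapted weight; base stays frozen
```

Only the small generator is trained, so per-task storage stays at one embedding plus shared hypernetwork weights, which is what makes the approach cheap to bolt onto a large pre-trained backbone.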

Experimental results: Leading across multiple CL benchmarks

The research team conducted extensive evaluations on multiple mainstream continual learning benchmarks, including CIFAR-100, ImageNet-R, and DomainNet, covering different model architectures (such as ResNet, Vision Transformer) and learning settings (such as full-model training, parameter-efficient fine-tuning). The main results are as follows:


1. FAA comprehensively outperforms existing methods and achieves better final performance on all datasets.

2. Strong forward and backward transfer capabilities appear simultaneously. The difference between DAA and FAA is extremely small, indicating that learning new tasks has almost no interference with old tasks, and at the same time, it can effectively absorb knowledge from old tasks.

3. The algorithm is robust to a growing number of tasks. In scaling experiments from 5 to 10 to 20 tasks, the method's performance gain keeps increasing, showing good scalability; in later tasks, embedding guidance also brings a clear speed-up in convergence.

4. Ablation experiments verify the effectiveness of the components. Removing H-embedding guidance or AHP normalization will result in a significant performance decline.

Conclusions and prospects

The researchers proposed a "task-relation-centric" continual learning paradigm. By introducing the information-theory-driven task relationship embedding H-embedding before training, the model can:

  • Predict transferability rather than adapt passively
  • Consciously manage the knowledge interaction between tasks during the learning process
  • Significantly reduce forgetting and improve transfer efficiency

The H-embedding guided hypernetwork framework has achieved leading performance on multiple benchmarks, demonstrating the key role of task relationship modeling in continual learning.

In the future, task-structure-aware methods are expected to extend to more complex scenarios such as cross-modal incremental learning, long-term task adaptation of large models, task discovery, and automated curriculum planning, offering a new direction for building more scalable, continually growing general AI systems.

References:

https://arxiv.org/pdf/2502.11609

https://yangli-feasibility.com/home/group.html

This article is from the WeChat official account "New Intelligence Yuan". Author: LRST. Republished by 36Kr with permission.