LLMs have entered the "drag-and-drop era": with nothing but prompts, you can customize a large model in a few seconds, with efficiency gains of up to 12,000 times.
Recently, researchers from institutions such as NUS and UT Austin have innovatively proposed a "Drag-and-Drop Large Language Model" (DnD), which can quickly generate model parameters based on prompts and adapt to tasks without fine-tuning. Not only does it improve efficiency by up to 12,000 times, but it also has excellent zero-shot generalization ability.
Most current large models have zero-shot generalization ability, yet adapting them to specific real-world scenarios still requires hours of fine-tuning.
Even parameter-efficient methods like LoRA can only alleviate, not eliminate, the fine-tuning cost required for each task.
Now, researchers from the National University of Singapore, the University of Texas at Austin, and other institutions, including Professor Yang You, have proposed a brand-new "Drag-and-Drop Large Language Model": Drag-and-Drop LLMs (DnD)!
Paper link: https://arxiv.org/abs/2506.16406
DnD is a prompt-conditioned parameter generator that adapts LLMs to new tasks without any fine-tuning.
Through a combination of a lightweight text encoder and a cascaded hyper-convolutional decoder, DnD can generate LoRA weight matrices for a task within seconds based solely on unlabeled task prompts.
For scenarios that require rapid model specialization, DnD provides a more powerful, flexible, and efficient alternative to traditional fine-tuning methods.
In summary, the core advantages of DnD are as follows:
Extreme efficiency: Its computational overhead is 12,000 times lower than that of traditional full fine-tuning.
Excellent performance: On unseen common-sense reasoning, mathematics, coding, and multimodal benchmarks, it outperforms the strongest trained LoRA models by up to 30%.
Strong generalization: With only unlabeled prompts, it can demonstrate strong generalization ability across different domains.
Implementation Method of DnD
Through observation, researchers found that the LoRA adapter is simply a function of its training data: gradient descent "drags" the base weights to an optimal state for a specific task.
If one can directly learn the mapping from prompts to weights, then the gradient descent process can be completely bypassed.
DnD obtains this "drag" ability through two core steps: preparing the training data and training the parameter generator.
When preparing data, model parameters (weights) are explicitly paired with the conditions (prompts) of a specific dataset.
During training, the DnD model takes the conditions as input to generate parameters and uses the original LoRA parameters as the supervision signal for learning.
Based on these insights, the team proposed the "Drag-and-Drop Large Language Model", which can generate task-specific weights without fine-tuning.
The team first trained and saved corresponding LoRA adapters on multiple different datasets.
To endow the model with the "drag" ability, the team randomly paired the prompts of these datasets with the collected LoRA weights to form the training data for the DnD model — that is, "prompt-parameter" pairs.
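To make this pairing concrete, here is a minimal Python sketch of how such "prompt-parameter" pairs could be assembled. The container structures (`prompts_by_task`, `lora_by_task`), the flattening scheme, and `prompts_per_pair` are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: pairing prompts with saved LoRA checkpoints.
import random
import torch

def flatten_lora(state_dict):
    """Concatenate all LoRA matrices into one flat vector (assumed target format)."""
    return torch.cat([p.flatten() for _, p in sorted(state_dict.items())])

def build_pairs(prompts_by_task, lora_by_task, prompts_per_pair=128):
    """Randomly pair prompt batches from each task with that task's LoRA weights."""
    pairs = []
    for task, prompts in prompts_by_task.items():
        pairs.append({
            "prompts": random.sample(prompts, k=min(prompts_per_pair, len(prompts))),
            "weights": flatten_lora(lora_by_task[task]),  # LoRA trained on this task
        })
    return pairs
```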
The parameter generator is a decoder composed of cascaded hyper-convolutional blocks. Each block contains three hyper-convolutional modules that extract and fuse feature information along different dimensions.
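The PyTorch sketch below illustrates one plausible shape for such a block and the cascaded decoder around it. The channel counts, kernel layout, residual connection, and the 8x8 feature-map reshape are assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a hyper-convolutional block and the cascaded decoder.
import torch
import torch.nn as nn

class HyperConvBlock(nn.Module):
    """Three convolutional modules mixing features along different dimensions."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)  # channel-wise fusion
        self.norm = nn.GroupNorm(8, channels)  # channels must be divisible by 8
        self.act = nn.GELU()

    def forward(self, x):                      # x: (batch, channels, H, W)
        h = self.act(self.conv_h(x))           # mix along one spatial axis
        h = self.act(self.conv_w(h))           # mix along the other axis
        return x + self.norm(self.fuse(h))     # residual connection (assumed)

class ParameterGenerator(nn.Module):
    """Cascade of hyper-conv blocks ending in a head that emits flat LoRA weights."""
    def __init__(self, embed_dim, num_weights, channels=64, depth=4):
        super().__init__()
        self.channels = channels
        self.proj = nn.Linear(embed_dim, channels * 8 * 8)  # embedding -> 8x8 feature map
        self.blocks = nn.Sequential(*[HyperConvBlock(channels) for _ in range(depth)])
        self.head = nn.Linear(channels * 8 * 8, num_weights)

    def forward(self, cond):                    # cond: (batch, embed_dim) prompt embedding
        x = self.proj(cond).view(-1, self.channels, 8, 8)
        return self.head(self.blocks(x).flatten(1))  # (batch, num_weights)
```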
During training, the team uses an off-the-shelf text encoder to extract the embedding vectors of the prompts and inputs them into the generator.
The generator predicts the model weights, and the team optimizes it using the mean squared error (MSE) loss between the predicted weights and the real LoRA weights.
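A minimal training-loop sketch, reusing the `build_pairs` and `ParameterGenerator` sketches above and assuming sentence-transformers as the off-the-shelf text encoder (the paper's encoder choice and pooling strategy may differ):

```python
# Hypothetical training loop: prompt embeddings in, MSE against saved LoRA weights.
import random
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # frozen off-the-shelf text encoder
pairs = build_pairs(prompts_by_task, lora_by_task)  # from the earlier sketch
gen = ParameterGenerator(embed_dim=384, num_weights=pairs[0]["weights"].numel())
opt = torch.optim.AdamW(gen.parameters(), lr=1e-4)

for step in range(10_000):
    batch = random.choice(pairs)                    # one "prompt-parameter" pair
    with torch.no_grad():                           # the text encoder is not trained
        cond = encoder.encode(batch["prompts"], convert_to_tensor=True)
        cond = cond.mean(dim=0, keepdim=True)       # pool prompt embeddings (assumed)
    loss = F.mse_loss(gen(cond), batch["weights"].unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
```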
During the inference phase, the team simply feeds prompts from a brand-new dataset (unseen during training) into DnD, and with a single forward pass obtains parameters tailored to the task.
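Inference then reduces to one forward pass, as in this sketch (`unflatten_lora` is a hypothetical helper that inverts the `flatten_lora` step above):

```python
# Hypothetical inference: unlabeled prompts from an unseen task -> LoRA weights.
@torch.no_grad()
def generate_adapter(prompts):
    cond = encoder.encode(prompts, convert_to_tensor=True).mean(dim=0, keepdim=True)
    flat = gen(cond).squeeze(0)      # one forward pass, no gradient descent
    return unflatten_lora(flat)      # hypothetical inverse of flatten_lora

adapter = generate_adapter(["Solve: 12 * 7 = ?", "Compute 3 to the power 4."])
```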
Effect Evaluation
Zero-Shot Learning Effect
Generalization on new (test) datasets: on all unseen datasets, DnD significantly outperforms the trained LoRA baselines in accuracy.
DnD can generate parameters for more complex tasks such as mathematics, coding, and multi-modal Q&A.
It still demonstrates strong zero-shot learning ability on these tasks.
DnD outperforms the base LLM on multiple tasks, demonstrating a significant "drag" enhancement effect.
DnD also scales well to a larger 7B base model, maintaining strong performance on the more challenging LiveCodeBench benchmark.
By using fine-tuned LoRAs as training data, DnD successfully establishes a connection between input prompts and model parameters.
The team inputs prompts from datasets that were never seen during the training phase of DnD and lets it directly generate parameters for these new tasks to test its zero-shot learning ability.
The parameters generated by DnD lie close to the distribution of the original parameters in weight space and deliver strong task performance.
The experimental results show that, on zero-shot test sets, the method achieves a remarkable improvement over the average performance of the LoRAs used for training and generalizes well across various real-world tasks and LLMs of different sizes.
Comparison with Other Fine-Tuning Methods
To further demonstrate the powerful ability of DnD, the team compared it with full-shot tuning, few-shot learning, and in-context learning.
Surprisingly, DnD outperforms full LoRA fine-tuning while being 2,500 times faster.
Although full fine-tuning can eventually surpass DnD with more training iterations, it does so at the cost of up to 12,000 times longer latency.
In addition, when fewer than 256 samples are available, DnD consistently outperforms both few-shot learning and in-context learning.
Notably, both few-shot learning and in-context learning rely on labeled answers, while DnD only requires unlabeled prompts.
DnD can achieve performance comparable to or even better than that of full-shot tuning, while being 2,500-12,000 times faster.
Author Introduction
Zhiyuan Liang
Zhiyuan Liang is currently an intern at the High-Performance Computing AI Laboratory of the National University of Singapore, under the guidance of Professor Yang You, with additional guidance from Dr. Kai Wang and Wangbo Zhao.
Previously, he obtained a bachelor's degree in artificial intelligence from the University of Science and Technology of China (USTC), interned under Professor Huaxiu Yao at the University of North Carolina at Chapel Hill, and spent two years in USTC's Data Science Laboratory advised by Xiang Wang.
His research interests mainly focus on efficient machine learning and parameter generation. He hopes to explore effective paths to achieve higher-level intelligence from the perspective of weight space learning.
Zhangyang (Atlas) Wang
Zhangyang Wang is currently a tenured associate professor in the Chandra Family Department of Electrical and Computer Engineering at the University of Texas at Austin, where he holds the Temple Foundation Endowed Professorship #7.
He is also a core faculty member of the university's Department of Computer Science and of the Oden Institute's Computational Science, Engineering, and Mathematics program.
He obtained a Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2016, under the guidance of the computer vision pioneer Professor Thomas S. Huang, and a bachelor's degree in electronic engineering and information science from the University of Science and Technology of China in 2012.
His research interests mainly focus on laying a solid theoretical and algorithmic foundation for generative AI and neuro-symbolic AI.
The core goal is to create structured and modular model representations that 1) enable efficient and robust learning in over-parameterized model spaces, and 2) connect seamlessly with symbolic knowledge and reasoning.