
The best paper awards at NeurIPS 2025 have been announced, and a decade-old classic by Kaiming He, Jian Sun, and others has won the Test of Time Award.

新智元, 2025-11-27 15:21
Half of the winners are ethnic Chinese researchers.

Today, the best papers of NeurIPS 2025 were announced! The majority of the four best papers come from Chinese researchers, and the Faster R-CNN paper by Kaiming He, Jian Sun, and others has won a well-deserved Test of Time Award.

The results of the best papers of NeurIPS 2025 are out!

Today, the NeurIPS organizing committee announced this year's Best Paper awards: four papers in total.

In addition, three papers received Runner-Up awards. The seven award-winning papers span multiple fields:

Diffusion model theory, self-supervised RL, attention mechanisms, LLM reasoning ability, online learning theory, neural scaling, and benchmark methods for evaluating the diversity of language models.

More significantly, this year's Test of Time Award went to the Faster R-CNN paper, co-authored by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.

This year marks the 39th annual NeurIPS conference. Unlike previous years, NeurIPS 2025 is the first dual-city edition, held in two locations:

From December 2nd to 7th at the San Diego Convention Center

From November 30th to December 5th in Mexico City

The best papers were announced while the Mexico City session was underway.

Let's take a look at which big names won the awards.

Chinese researchers feature prominently among the best papers

Paper 1: Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

Authors: Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Nouha Dziri, Yulia Tsvetkov, Maarten Sap, Yejin Choi

Institutions: University of Washington, Carnegie Mellon University, Allen Institute for Artificial Intelligence, Lila Sciences, Stanford University

Paper link: https://openreview.net/forum?id=saDOrrnNTz

Large language models often struggle to generate diverse and human-like creative content, raising concerns that long-term exposure to homogeneous outputs may lead to the convergence of human thinking.

However, scalable methods for evaluating the diversity of LM outputs are still lacking, especially beyond narrow tasks such as random number generation or single-model repeated-sampling settings.

To fill this gap, researchers from the University of Washington and other institutions released the large-scale dataset Infinity-Chat.

Infinity-Chat contains 26,000 real-world open-ended user queries that allow multiple reasonable answers to coexist without a single standard solution.

Figure 1: Clustering of responses to the query "Write a metaphor about time" (visualized by reducing sentence embeddings to two dimensions via principal component analysis)
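As a hedged illustration of the kind of visualization in Figure 1, sentence embeddings can be projected onto their top two principal components via SVD. The random vectors below are a stand-in for real sentence embeddings; the paper's actual embedding model and plotting pipeline are not specified here.

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project row-vector embeddings onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data matrix gives the principal directions in vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # shape (n_samples, 2)

rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 384))  # stand-in for sentence embeddings
coords = pca_2d(fake_embeddings)
print(coords.shape)  # (100, 2)
```

The 2-D coordinates can then be fed to any scatter-plot or clustering routine to reveal how tightly responses bunch together.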

The work also proposes the first complete taxonomy of open-ended LM prompts, with six top-level categories (such as creative content generation, and brainstorming and ideation) and 17 subcategories beneath them.

Through Infinity-Chat, the researchers conducted a large-scale study of model collapse in LMs and found a pronounced "Artificial Hivemind" effect in open-ended generation, manifested as:

Intra-model repetition: a single model repeatedly generates similar responses;

Inter-model homogeneity: different models produce strikingly similar outputs.
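Both symptoms can be quantified with a simple similarity statistic. A minimal sketch, assuming responses are already embedded as vectors (the paper's actual diversity metrics may differ): the average pairwise cosine similarity of a response set, where higher values indicate more homogeneous outputs.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of row vectors."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the diagonal (each vector's self-similarity of 1.0).
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(0)
# Unrelated responses: random directions, near-zero mean similarity.
diverse = rng.normal(size=(50, 64))
# "Hivemind" responses: small perturbations of one shared answer.
hivemind = rng.normal(size=(1, 64)) + 0.05 * rng.normal(size=(50, 64))
print(mean_pairwise_cosine(diverse) < mean_pairwise_cosine(hivemind))  # True
```

The same statistic works within one model's samples (intra-model repetition) or across models' answers to one query (inter-model homogeneity).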

The dataset also contains 31,250 human annotations, including absolute ratings and pairwise preference comparisons; each example was independently judged by 25 annotators, enabling the study of collective and individual preferences on open-ended queries.

The study shows that even state-of-the-art LMs, reward models, and LM judges struggle to match human ratings when model generations of comparable overall quality appeal to different annotators' individual preferences.

Overall, Infinity-Chat is the first large-scale resource for systematically studying real-world open-ended LLM queries, offering key insights for mitigating the long-term AI safety risks posed by the artificial hivemind.

Paper 2: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Authors: Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin

Institutions: Alibaba Qwen Team, University of Edinburgh, Stanford University, MIT, Tsinghua University

Paper link: https://openreview.net/pdf?id=1b7whO4SfY

The gating mechanism has been widely used since the early days of LSTMs and highway networks, and still appears in recent state-space models, linear attention, and Softmax attention.

However, existing studies rarely analyze in depth what gating specifically contributes.

This study comprehensively explores gating-augmented Softmax attention variants through systematic experiments: a 15B-parameter mixture-of-experts model (30 variants) and a 1.7B-parameter dense model were trained on a 3.5-trillion-token dataset for comparison.

The core finding is that simply introducing a head-specific sigmoid gate after scaled dot-product attention (SDPA) consistently improves model performance. It also enhances training stability, tolerates larger learning rates, and improves scaling behavior.

By comparing different gating positions and computational variants, the researchers attributed its effectiveness to two key factors:

(1) Introducing a non-linear transformation into the low-rank mapping of Softmax attention;

(2) Using query-dependent sparse gating scores to modulate the SDPA output.

Notably, this sparse gating mechanism alleviates "activation explosion" and the "attention sink" phenomenon, and improves long-context extrapolation performance.
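A minimal numpy sketch of the mechanism described above: a query-dependent, head-specific sigmoid gate applied to the SDPA output. The shapes and the gate projection `w_gate` are illustrative assumptions, not the Qwen implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_sdpa(q, k, v, w_gate):
    """q, k, v: (heads, seq, d_head); w_gate: (heads, d_head, d_head)."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)         # (heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # softmax over keys
    out = weights @ v                                      # standard SDPA output
    # Query-dependent, per-head sigmoid gate modulating the SDPA output;
    # values in (0, 1) can sparsely suppress individual channels.
    gate = sigmoid(q @ w_gate)                             # (heads, seq, d_head)
    return gate * out

rng = np.random.default_rng(0)
h, s, d = 2, 4, 8
q, k, v = (rng.normal(size=(h, s, d)) for _ in range(3))
w = rng.normal(size=(h, d, d))
print(gated_sdpa(q, k, v, w).shape)  # (2, 4, 8)
```

Because the gate multiplies the output elementwise by values strictly between 0 and 1, it can only attenuate, never amplify, the SDPA output, which is one intuition for its stabilizing effect.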

To facilitate follow-up research, the code and models have been open-sourced. The most effective variant, SDPA output gating, has already been applied in the Qwen3-Next model series.

Qwen3-Next-80B-A3B-Thinking-FP8 architecture

Paper 3: 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Authors: Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, Benjamin Eysenbach

Institutions: Princeton University, Warsaw University of Technology

Paper link: https://openreview.net/pdf?id=s0JVsx3bx1

Progress in large-scale self-supervised learning has continuously driven breakthroughs in language and vision, but comparable breakthroughs have not yet arrived in reinforcement learning (RL).

This paper focuses on the core building blocks of self-supervised RL and achieves a qualitative leap in scalability by exploring the key role of network depth.

In sharp contrast to the shallow architectures (typically 2 to 5 layers) used in most recent RL studies, the experiments show that increasing network depth to 1024 layers brings significant performance gains.

Network architecture
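A hedged sketch of why extreme depth can remain trainable: residual connections let each of the ~1000 blocks make a small additive update to its input, so signal and gradients still flow through the skip path. The layer width, activation, and initialization below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def residual_mlp(x, weights):
    """Apply residual blocks: x <- x + relu(x @ W), for each W in weights."""
    for w in weights:
        # The identity skip connection keeps the forward signal (and, in a
        # trained network, gradients) from vanishing across many layers.
        x = x + np.maximum(x @ w, 0.0)
    return x

rng = np.random.default_rng(0)
d, depth = 16, 1024
# Small initialization so a 1024-layer stack stays numerically stable.
weights = [rng.normal(scale=0.001, size=(d, d)) for _ in range(depth)]
out = residual_mlp(rng.normal(size=(1, d)), weights)
print(out.shape, np.isfinite(out).all())  # (1, 16) True
```

Without the skip connection, composing 1024 such layers would quickly drive activations to zero or overflow; with it, each block is a small perturbation of the identity map.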

Under an unsupervised goal-conditioned setting, the researchers ran experiments in which, without any demonstrations or reward signals, the agent must explore the environment from scratch and learn on its own to maximize the probability of reaching a specified goal.

Evaluations on simulated locomotion and manipulation tasks show that the new method improves the performance of the self-supervised contrastive RL algorithm by 2 to 50 times, clearly outperforming other goal-conditioned baselines.

The increase in network depth not only improves the task success rate but also triggers a qualitative change in the learning behavior of the agent.

Paper 4: Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training

Authors: Tony Bonnaire, Raphaël Urfin, Giulio Biroli, Marc Mézard

Institutions: PSL University (Paris), Bocconi University (Milan)