
Opening the interdisciplinary interview series on the attention mechanism

Oasis Capital · 2025-09-05 11:42
In 2017, the paper "Attention Is All You Need" introduced the Transformer architecture, which has since become the structural foundation of generative AI. From language models to multimodal models, from BERT to GPT, and on to the rise of diffusion models, the attention mechanism has remained at the core of every technological leap. The widespread adoption of Stable Diffusion even broke with the original logic of image generation, bringing "denoising" to the forefront as a structural way of thinking: rather than trying to "construct" an image, it rests on a brand-new assumption: the image has always been there, merely covered by noise. This is also the underlying methodology that Oasis has always adhered to.
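To make the "denoising" framing concrete, here is a minimal, hypothetical sketch of a single reverse-diffusion step in Python. The `predict_noise` network and the `alphas_cumprod` noise schedule are placeholders for illustration, not the actual Stable Diffusion implementation; the point is only that the model never "constructs" an image, it estimates and removes the noise covering one.

```python
import numpy as np

def ddim_step(x_t, t, predict_noise, alphas_cumprod):
    """One illustrative deterministic (DDIM-style) reverse step: nudge the
    noisy image x_t one notch closer to the clean image it conceals.
    `predict_noise` and `alphas_cumprod` are illustrative placeholders."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0

    # The network's only job is to estimate the noise covering the image.
    eps = predict_noise(x_t, t)

    # Back out the clean image implied by that noise estimate ...
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)

    # ... then re-project it to the slightly lower noise level t - 1.
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
```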

Looking back on the main line of AI technology over the past seven years, Attention has been the common foundation beneath almost every key advance. It is not merely a model component but a paradigm of structure, focus, and information allocation, and, more importantly, a migration of technical methodology.

From that moment on, we entered a new era.

Standing at today's juncture, we will conduct a series of in-depth interviews centered on Attention, focusing on interdisciplinary research into the "attention mechanism."

This article is a self-Q&A published before Oasis kicks off the series, attempting to answer one question: if this is not a retrospective or a tribute to a classic topic, why revisit Attention today?

In fact, it is not only images that are covered by noise; so is the market.

From 2022 through the first half of 2023, mainstream market discussion was hesitant, debating whether AI was a huge bubble and whether this generation of AI was any different from the last. Amid that noise, Oasis built most of its core AI and embodied-intelligence portfolio in the first half of 2023, covering nearly twenty projects including MiniMax, Vast, Boson, Zhujidongli, Qianxun Intelligence, and Jike Technology.

We did so because we believe this is an innovation on a scale beyond the Industrial Revolution, unfolding over a shorter time span and with greater energy.

We then launched Oasis' first in-depth interview series, on the theme of "AI."

The motivation came from a realization we had while building these AI portfolios: this is not a revolution driven by changes in products or business models, but a scientific exploration centered on cutting-edge artificial intelligence technology. We therefore had to step back and return to the most fundamental question, "What exactly is AI?", and talk with front-line professors and scholars around the world about what AI is, what GPT is, and what technologies and understandings underlie the changes we are witnessing.

At the time, Oasis interviewed dozens of professors around the world. Through those in-depth conversations we came to a realization: the large models we see are essentially future infrastructure. In the digital world, intelligence will be standardized, managed, and distributed much like today's power grid, supplying model capability to every place that needs "electricity"; the terminals it supplies are the "appliances" of the artificial intelligence era.

That realization became the conclusion of Oasis' first in-depth interview series on AI, and also the beginning of a new question: if we now understand the shape of the "power grid," what will the future "appliances" be?

So Oasis launched its second in-depth interview series: Agent.

In July 2023, mainstream market views centered on two bets: one held that the future belonged to vertical, domain-specific large models; the other believed in the continued evolution of general-purpose large models themselves. Few people at the time stepped outside the model itself to focus on the system form that carries model capability: the Agent.

We kept writing and repeating one judgment, including in the interview "The People Who Invest Most Aggressively in AI": we do not believe the future belongs to vertical models; model generality will inevitably be the endgame. But generality alone is not enough. What deserves more attention is how model capability gets encapsulated into interfaces, and that is what Oasis means by Agent.

Agent is the future.

Today, Agent has become a prominent field of study within AI. But back in mid-2023, it was neither favored by the mainstream market nor backed by any unified theoretical understanding.

So Oasis launched its second in-depth interview series, with Agent as its theme, once again seeking out top researchers and professors around the world to think through one essential question: what exactly are we talking about when we talk about Agent?

That series ran for nearly a full year, until August 2024. Over the course of the interviews an answer gradually emerged: an Agent is not a fragmented thing or some kind of wrapper. At the micro level, an Agent is an activatable, adaptable behavioral unit, akin to a living being; at the macro level, an Agent is more like a river.

In essence, an Agent is a service that fuses needs with intelligence, driven by large models and surfaced through specific scenarios. At its core it is not a tool but a way of existing.

With that, the second in-depth interview series came to an end.

We thank the many researchers who held in-depth conversations with Oasis across the two previous series, on AI and on Agent; together they form the key path along which we have explored this transformation. Now we are launching the third in-depth interview series, on the theme of Attention.

Which brings us back to the question at the beginning of this article: what prompted this third choice of topic?

As the famous paper "Attention Is All You Need," mentioned at the start of this article, suggests, humans have long been trying to teach machines one thing:

What is attention?

But why are humans so persistent in teaching machines to understand what attention is?

Take a simple example. A human driver instinctively notices a changed road sign or a rabbit darting across the road; an AI may not. This is not because AI is not smart enough. On the contrary, it is because the computing power of the human brain is severely limited: its processing capacity falls far short of the total information received by the retina, which forced us to evolve what we call the attention mechanism. That mechanism lets humans rapidly lock onto the most important information of the moment and filter out less important noise.

AI, by itself, has no such mechanism. In AI's world, every pixel is equal; given enough computing power, it will try to process every input in full. That is why humans have long sought a methodology, a new paradigm, that gives AI both attention and good scalability (the Scaling Law), in the belief that this would let AI process information far better.
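To make the mechanism concrete, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformer; the toy data and self-attention setup are illustrative assumptions, not any production model. The softmax weights play exactly the filtering role described above: each query spends most of its limited "budget" on the few inputs that matter.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: every query looks at every key,
    but the softmax concentrates weight on the most relevant inputs."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: "where to look"
    return weights @ V, weights                        # weighted mix of the values

# Toy example: 4 input tokens with 8-dimensional representations (self-attention).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out, attn = scaled_dot_product_attention(X, X, X)
print(attn.round(2))   # each row sums to 1: a per-token budget of attention
```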

As technical exploration has advanced, we have been glad to see progress such as Lightning Attention, recently introduced by Oasis portfolio company MiniMax, which optimizes the attention module within the Transformer architecture, substantially improves compute efficiency in both training and inference, and marks a breakthrough at the algorithmic level of Attention itself. And the significance of the attention mechanism has long outgrown optimization of model structure. Over the past few years, Attention has not only driven breakthroughs in language models but has gradually spread into brain science, cognitive science, psychology, and other disciplines. We have begun to realize that the process of teaching AI attention is, in turn, helping us re-understand human perception and cognition.
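For readers curious about why such attention variants matter for efficiency, below is a generic linear-attention sketch in NumPy. It only illustrates the broad idea of replacing the quadratic cost of softmax attention with a linear-time computation; the feature map and shapes are illustrative assumptions, and this is not MiniMax's actual Lightning Attention implementation.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized (linear) attention: compute phi(Q) @ (phi(K)^T V),
    so cost grows linearly with sequence length N instead of N^2."""
    phi = lambda x: np.maximum(x, 0) + 1.0       # simple positive feature map (assumption)
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                                # (d x d) summary of keys and values
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T     # per-query normalizer
    return (Qf @ KV) / (Z + eps)

# Same interface as standard attention, but the N x N weight matrix never materializes.
rng = np.random.default_rng(1)
X = rng.standard_normal((1024, 64))
out = linear_attention(X, X, X)
print(out.shape)   # (1024, 64)
```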

So what's the conclusion?

The conclusion is that AI is evolving along a dual path: on one hand, scholars around the world are pushing ever-larger-scale training on the Transformer structure; on the other, they are innovating at the level of cognitive structure and algorithmic framework, to keep improving how AI learns the one thing we have always hoped it would learn: what is attention?

If we want to keep understanding the future of AI in depth, the next step of exploration should point to a more fundamental question:

In a system composed of humans and AI, what exactly does attention mean?

Going one step further, as we move from technological research to the self-examination of human society, and as Agents become society's main producers and come to understand humans better and better, the human attention mechanism will be challenged as never before. Twenty years ago we read books; ten years ago we watched films; five years ago we watched short videos; now we risk getting lost in the endless stream of fragmented information generated by artificial intelligence.

Any of our thoughts can spawn limitless information, and the world will fragment further.

A deeper question then emerges: while humans are helping AI learn and improve attention, how do we protect our own?

The answer may not be so optimistic.

Data suggest that the average person picks up their phone more than 500 times a day, and attention spans are being compressed to under 100 seconds. From feature-length films to short videos, from deep reading to sliced-up information, the attention window humans can sustain keeps shrinking. Meanwhile, AI is pushing the speed of information acquisition and response to unprecedented levels. If a super AI one day emerges that can precisely capture human preferences, anticipate needs, and generate everything we want to consume, the human attention mechanism will degrade further. Will attention itself be outsourced? Will humans ultimately hand the "power of attention" over entirely to machines?

The Buddhist notion of the "knowing mind" holds that wherever a person's mind and consciousness dwell, there the world appears. From the perspective of signal theory the idea is similar: attention determines the frequency of consciousness, and wherever our frequency is tuned, there lies our time domain. Put in plainer, more scientific terms, a person's ultimate form of self-management is really the management of attention. In an era where AI and humans are destined to coexist, understanding "attention" is necessary not only for making sense of AI's technological development but also as a path of human self-development.

While helping AI build attention, we must also help ourselves protect it.

This is the answer to the question at the beginning of this article and the starting point of our third in-depth interview series.

This series will run longer than the previous two, and Oasis will invest more time and resources to complete it. We believe that along the way we will meet like-minded friends, and we look forward to building new understanding with you.

The first installment of the series will be released in August, with subsequent installments updated monthly. We hope you enjoy it.

In support of vitality.

This article is from the WeChat official account "Oasis Capital Vitalbridge," written by "Exhibition of Vitality," and is published by 36Kr with permission.