Where Are the Eight "Fathers" of Transformer Now?

None of the eight authors stopped searching for the next answer

Editor | Panda

A few days ago, Google lost two key figures in quick succession.

On June 18th, Noam Shazeer, one of the co-authors of the Transformer paper, announced on X that he was leaving Google to join OpenAI. Two days later, John Jumper, winner of the 2024 Nobel Prize in Chemistry and leader of the AlphaFold team, announced that he too was leaving Google DeepMind, in his case for Anthropic.

The back-to-back announcements hit the capital markets hard: shares of Google's parent company, Alphabet, fell by more than 7% at one point, wiping out over $300 billion in market value. Many analysts attributed the sell-off to the "brain drain". Gil Luria, an analyst at D.A. Davidson, said bluntly that Shazeer's move to OpenAI and Jumper's move to Anthropic had the market worried that Google was falling behind in the AI talent war.

Shazeer's departure is particularly noteworthy: this is the second time he has left Google.

In 2021, he left Google to found Character.AI, dissatisfied with the company's reluctance to publicly release the chatbot whose development he had led. In August 2024, Google spent approximately $2.7 billion to license Character.AI's technology and brought him back to DeepMind as engineering vice-president of the Gemini project, which he co-led with Jeff Dean. Less than two years later, he left again, this time for OpenAI, Google's arch-rival.

With that, all eight co-authors of "Attention Is All You Need", the paper published nine years ago, have left Google.

Tyler Maran, a user on X, created a chart showing their current destinations, which has been widely shared on social networks.

However, the chart may soon be outdated. Over the past two days, rumors have circulated that NVIDIA is quietly recruiting the core team of Essential AI, including Ashish Vaswani, a co-author of the Transformer paper and the co-founder and CEO of Essential AI. As of publication, neither NVIDIA nor Essential AI has officially responded.

This is a good opportunity to review where the eight people known as the "Fathers of Transformer" have been over the past nine years, and where they actually are now.

Note that the author order of "Attention Is All You Need" is random. A footnote in the paper states explicitly that all authors contributed equally and that the listing order is random. There is therefore no so-called "first author" or "corresponding author". This article introduces the eight in the order their names appear in the original paper.

"The Origin of Everything": Eight Unconventional Google Employees

To understand where they are now, we need to go back to 2017. At that time, the mainstream approach in machine translation was the recurrent neural network (RNN). An RNN processes a sentence one word at a time, in sequence, like traffic queuing on a one-way street, so the computation cannot be parallelized. Training was slow and expensive.

Eight people from Google Brain decided to try a rather bold idea: discard the recurrent structure entirely and keep only the "attention mechanism", letting the model look at the entire sentence at once and decide which words deserve more attention. The paper's title, "Attention Is All You Need", is a play on the Beatles song "All You Need Is Love", and its form has since been imitated by many other paper titles.
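To make the idea concrete, here is a minimal sketch of the scaled dot-product attention named in the paper, written in plain Python. It is an illustration under simplifying assumptions, not the authors' implementation: the three toy token vectors are made up, and the same vectors serve as queries, keys, and values, whereas the real model first passes them through learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors.

    The weights are softmax(q . k / sqrt(d_k)) over all keys, so every
    position looks at the whole sequence at once instead of word by word.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 3-token "sentence"; each token is a 2-dimensional vector.
# Assumption: tokens are used directly as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence. Because no row depends on the result of another, all rows can be computed in parallel, which is exactly what the word-by-word RNN could not do.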

The description of the authors' contributions in the paper briefly records what each person did:

Jakob Uszkoreit first proposed using self-attention to replace the recurrent structure and led the early verification of this idea;

Ashish Vaswani and Illia Polosukhin designed and implemented the initial Transformer model together, and were involved in almost every aspect of the project;

Noam Shazeer proposed scaled dot-product attention, multi-head attention, and the parameter-free position representation, and was another person involved in almost every detail;

Niki Parmar designed, implemented, and debugged countless model variants in the initial codebase and the later tensor2tensor framework;

Llion Jones also tried a large number of new model variants and was responsible for the initial codebase, inference efficiency optimization, and visualization work;

Łukasz Kaiser and Aidan N. Gomez spent countless days and nights building the various modules of the tensor2tensor framework, replacing the early codebase and significantly improving the experimental results and research efficiency.

This description also indirectly reveals a detail: although the order of the authors' names was random, Uszkoreit, Vaswani, Polosukhin, and Shazeer clearly played more central roles at the architectural level, while Parmar, Jones, Kaiser, and Gomez took the lead in engineering implementation and system building. These differences in personality and expertise were an early hint of the different paths the eight would later choose.

The name "Transformer" also has an interesting story. Uszkoreit liked the sound of the word, so the team called itself "Team Transformer" internally, and the cover of an early design document featured six characters from the Transformers animated series.

Since its publication, the paper has been cited more than 260,000 times, making it one of the most cited papers of the 21st century.

Ashish Vaswani

Vaswani was born in 1986 in India. He obtained a bachelor's degree in computer science from the Birla Institute of Technology (BIT Mesra) in 2002. Then he went to the United States to pursue a doctorate at the University of Southern California under the guidance of David Chiang, with a research focus on statistical machine translation and neural network language modeling. After completing his doctorate, he worked as a computer scientist at the Information Sciences Institute of the University of Southern California for two years. In 2016, he officially joined Google Brain as a research scientist and worked there until 2021.

According to the description of the authors' contributions in the paper, Vaswani and Illia Polosukhin designed and implemented the initial Transformer model together. He was one of the core figures "involved in almost every aspect of the project".

After leaving Google in 2021, Vaswani co-founded Adept AI with Niki Parmar, David Luan (formerly engineering vice-president at OpenAI), and others, serving as chief scientist. The goal was to build a "behavioral model" that could autonomously perform operations in any software.

Adept raised more than $400 million and was valued at approximately $1 billion, but its product never launched, and disagreements emerged within the team. Vaswani and Parmar left early: his tenure as Adept's chief scientist ended in November 2022.

At the beginning of 2023, Vaswani and Parmar teamed up again to co-found Essential AI, with Vaswani as CEO. The company has received strategic investment from Google, NVIDIA, and AMD: its $8.3 million seed round was led by Thrive Capital, and its $56.5 million Series A at the end of 2023 was led by March Capital, with participation from Google, NVIDIA, AMD, KB Investment, Franklin Templeton, and other institutions. In early 2026, the company closed a $175 million Series B led by Lightspeed Venture Partners, with Thrive Capital participating. The round valued the company at $1 billion, officially making it a unicorn.

At the end of 2025, the company released its first open-source model series, Rnj-1 (named after the Indian mathematician Ramanujan).

In the past two days, however, the situation has changed abruptly. NVIDIA is reportedly recruiting the core team of Essential AI, Vaswani included, and he is expected to work on NVIDIA's open-source model, Nemotron.

People familiar with the matter say the reasons are quite practical: Essential AI has hit a financing bottleneck, and for NVIDIA, pulling Vaswani and his team away from its competitor AMD is a cost-effective deal (AMD was one of Essential AI's early strategic investors, and the company has long relied on AMD GPUs). Several Essential AI researchers, including Alok Tripathy and Saurabh Srivastava, have updated their LinkedIn profiles to indicate that they have joined NVIDIA. As of now, however, neither NVIDIA nor Essential AI has officially confirmed the news.

Noam Shazeer

Shazeer was born in Philadelphia in 1976 and is an Orthodox Jew. His father, Dov Shazeer, is an engineer with a background in teaching mathematics, and his sister received rabbinic ordination from Hebrew College. He showed extraordinary talent at a young age. In 1994, he competed in the International Mathematical Olympiad as a member of the US team and won a gold medal with a perfect score. He then entered Duke University to study mathematics and computer science, where he held the Angier B. Duke Memorial Scholarship and won awards in the Putnam Mathematical Competition.

In 2000, Shazeer joined Google. His best-known early achievement was fixing the spelling correction feature of Google Search.

According to the description of the authors' contributions in the Transformer paper, he proposed the scaled dot-product attention, multi-head attention mechanism, and parameter-free position representation method. He was, apart from Vaswani and Polosukhin, the person "involved in almost every detail".

After co-authoring the Transformer paper in 2017, he and his colleague Daniel De Freitas developed the chatbot Meena, but Google, out of caution, declined to release it publicly. The two left Google in 2021 to found Character.AI, which raised more than $150 million from investors such as a16z and built a popular role-playing chat application.

In August 2024, the story took a turn: Google reached a licensing agreement with Character.AI reportedly worth up to $2.7 billion. Shazeer and De Freitas returned to Google DeepMind with a small group of colleagues. Shazeer was appointed engineering vice-president and co-led the Gemini project with Jeff Dean and Oriol Vinyals. Because he personally held about 30% to 40% of Character.AI's shares, the deal is estimated to have let him cash out between $750 million and $1 billion. In 2026, he was elected a member of the US National Academy of Engineering, and his career seemed to be at its peak.

But just a few months later, he chose to leave again, this time for OpenAI. He will reportedly be responsible for an area called "architecture research", arriving just as OpenAI recruits talent ahead of its IPO (the company submitted a confidential S-1 filing to the US Securities and Exchange Commission on June 8th, at a rumored valuation of as much as $852 billion).

Sam Altman, the CEO of OpenAI, made a rare public statement: "Since the first day of OpenAI's establishment, he has been one of the people I most want to work with." He added that the hire "had been in the works for a full decade".
