Fresh out of MIT, a standout PhD has been snapped up by the startup of OpenAI's former CTO, with an annual salary that may start at 3 million yuan.
Genius MIT PhD and Tsinghua Standout Guangxuan Xiao Officially Announces He Will Join Thinking Machines to Work on Large-Model Pre-training
A standout MIT PhD is joining the startup founded by OpenAI's former CTO, straight out of graduation!
Recently, Guangxuan Xiao announced on social media that he has just completed his doctorate at MIT.
Next, he will join Thinking Machines to focus on large-model pre-training.
In the comments, congratulations poured in from NVIDIA scientists, xAI researchers, UCSD academics, and other big names.
A Tsinghua Double-Degree Standout and an Accomplished MIT PhD
Open his personal homepage and a remarkably full résumé comes into view.
Guangxuan Xiao graduated from Tsinghua University with a double degree: a major in Computer Science and a second degree in Finance.
During that time he won numerous awards, including Tsinghua's Comprehensive Excellence Scholarship (2019), First Prize in the China Undergraduate Mathematical Contest in Modeling (CUMCM, 2020), the National Scholarship (2020), and Tsinghua's "Future Scholar" Scholarship (2021).
From 2020 to 2021 he was a visiting research student in Stanford University's Department of Computer Science.
In 2022, Guangxuan Xiao joined MIT to pursue a PhD under the supervision of Professor Song Han.
His research focuses on efficient algorithms and systems for deep learning, especially large-scale foundation models.
He served as a full-time research assistant in MIT's EECS department from September 2022 to January 2026.
During his doctoral studies, Xiao interned at several top technology companies on cutting-edge research, gaining extensive front-line industrial R&D experience.
In 2023, he interned at Meta, working on efficient attention for streaming language models; the results were published on arXiv as "Efficient Streaming Language Models with Attention Sinks".
Paper link: https://arxiv.org/pdf/2309.17453
From February to May 2024, he interned at NVIDIA, focusing on accelerating long-context inference for large language models.
There, he and his team proposed DuoAttention, which combines retrieval heads and streaming attention heads to achieve efficient inference.
Paper link: https://research.nvidia.com/labs/eai/publication/duoattention/
He has since taken part in several core research projects, including:
XAttention: block-sparse attention based on antidiagonal scoring
StreamingVLM: real-time understanding of infinite video streams
FlashMoBA: efficient kernels for Mixture of Block Attention
Beyond research, Xiao has a wide range of hobbies: football, table tennis, Go, and the piano.
He was captain and forward of his department's football team, and his favorite composer is Beethoven.
A Doctoral Thesis That Tackles Three Major LLM Problems
Dazzling as his résumé is, Xiao's doctoral thesis is even more worth dissecting.
Admittedly, today's large models can do almost anything, but they are still far too expensive to run.
Exploding GPU memory, sluggish inference, and outright OOM (out-of-memory) crashes on long contexts are daily realities for nearly every LLM engineering team.
His thesis, "Efficient Algorithms and Systems for Large Language Models", offers a rare, complete answer, spanning engineering to theory and algorithms to architectures.
In it, he proposes SmoothQuant, which tackles a long-standing industry headache: activation outliers.
SmoothQuant shifts the quantization difficulty from activations to weights through a clever, mathematically equivalent transformation.
The result was the first lossless W8A8 quantization of billion-parameter models without retraining, with a smaller GPU memory footprint and faster inference.
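The core trick can be illustrated in a few lines of NumPy. This is a minimal sketch, not SmoothQuant's implementation: the per-channel scale follows the paper's formula s_j = max|X_j|^α / max|W_j|^(1−α) with α = 0.5, while the shapes and data here are invented for illustration, and the actual INT8 rounding step is omitted.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    # Per-channel smoothing scales:
    #   s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha)
    s = (np.abs(X).max(axis=0) ** alpha) / (np.abs(W).max(axis=1) ** (1 - alpha))
    X_hat = X / s            # activations: outlier channels are flattened
    W_hat = W * s[:, None]   # weights absorb the quantization difficulty
    return X_hat, W_hat

rng = np.random.default_rng(0)
# One activation channel with large outliers, as seen in real LLMs.
X = rng.normal(size=(4, 8)) * np.array([1, 1, 1, 1, 1, 1, 1, 50.0])
W = rng.normal(size=(8, 3))
X_hat, W_hat = smooth(X, W)
# The matrix product is mathematically unchanged: X @ W == X_hat @ W_hat
assert np.allclose(X @ W, X_hat @ W_hat)
```

Because X·W = (X·diag(s)⁻¹)·(diag(s)·W) exactly, the smoothing itself is lossless; only the subsequent 8-bit rounding (not shown) introduces error, and it is far milder once the outlier channels have been flattened.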
On ultra-long sequences, the thesis uncovers the "attention sink" phenomenon in StreamingLLM:
even when they carry no semantics, the initial tokens are continually attended to by every later token. Their role is not "understanding" but numerical stability.
This insight enables constant-memory streaming inference, extending usable context from thousands of tokens to the million level.
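The eviction policy behind constant-memory streaming can be sketched in a few lines. This is a simplified illustration rather than the official implementation; the sink count and window size below are hypothetical placeholders.

```python
def streaming_kv_indices(seq_len, n_sink=4, window=1020):
    """Which cached positions a StreamingLLM-style policy keeps:
    the first n_sink 'attention sink' tokens plus a recent window."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))          # everything still fits
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

kept = streaming_kv_indices(1_000_000, n_sink=4, window=1020)
assert len(kept) == 1024              # constant memory, regardless of stream length
assert kept[:4] == [0, 1, 2, 3]       # sink tokens are always retained
```

The key point is that evicting the sink tokens collapses generation quality, while keeping just a handful of them plus a sliding window preserves it, which is what makes the KV cache size independent of how long the stream runs.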
The idea even extends to the multimodal setting: StreamingVLM can process videos several hours long while maintaining temporal consistency.
For ultra-long-context scenarios, the thesis offers complementary solutions, each targeting a different performance bottleneck.
When the KV cache is too large, use DuoAttention
Attention heads divide the labor: a few handle "global retrieval", while most attend only to the "recent context".
DuoAttention exploits this with a hybrid strategy that sharply reduces GPU memory at almost no cost in quality.
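A back-of-the-envelope sketch shows why the head split saves memory. All numbers here are hypothetical: real DuoAttention identifies which heads are retrieval heads via optimization rather than taking a fixed fraction.

```python
def duo_kv_budget(seq_len, n_heads, retrieval_frac=0.25, n_sink=4, window=1020):
    """KV-cache entries under a DuoAttention-style split:
    retrieval heads keep the full cache, streaming heads keep sink + window."""
    n_retrieval = int(n_heads * retrieval_frac)
    n_streaming = n_heads - n_retrieval
    full = n_heads * seq_len                                  # dense baseline
    duo = (n_retrieval * seq_len
           + n_streaming * min(seq_len, n_sink + window))     # hybrid cache
    return full, duo

full, duo = duo_kv_budget(seq_len=100_000, n_heads=32)
assert duo < 0.3 * full   # large savings when most heads are streaming heads
```

With a quarter of 32 heads kept dense at 100K tokens, the hybrid cache holds about 825K entries versus 3.2M for the dense baseline, roughly a 4x reduction, and the gap widens as context grows.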
When prefill is too slow, use XAttention
Its antidiagonal scoring mechanism identifies and computes only the attention blocks that matter, delivering substantial speedups.
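The antidiagonal scoring idea can be illustrated on a toy attention map. This is a simplified sketch: real XAttention sums strided antidiagonal entries of the pre-softmax scores and thresholds them, whereas here each tile is scored by a single antidiagonal sum on a tiny made-up matrix.

```python
import numpy as np

def antidiagonal_block_scores(attn, block=4):
    """Score each (block x block) tile of an attention map by summing
    its antidiagonal; high-scoring tiles are the ones worth computing."""
    n = attn.shape[0] // block
    scores = np.zeros((n, n))
    for bi in range(n):
        for bj in range(n):
            tile = attn[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            # antidiagonal entries: tile[k, block-1-k]
            scores[bi, bj] = sum(tile[k, block - 1 - k] for k in range(block))
    return scores

A = np.zeros((8, 8))
A[0:4, 4:8] = 1.0                    # exactly one important tile
s = antidiagonal_block_scores(A)
assert s[0, 1] == 4.0 and s.sum() == 4.0   # that tile, and only it, scores high
```

The appeal of the antidiagonal is that it crosses every row and every column of a tile once, so a tile with any strong row or column of attention cannot score zero, yet the score costs only O(block) per tile instead of O(block²).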
The thesis does not stop at optimizing existing models. Through a signal-to-noise analysis of MoBA (Mixture of Block Attention), the author proves that:
in theory, the smaller the block, the better.
In practice, GPUs disagree: tiny blocks are hardware-unfriendly. Hence FlashMoBA, a custom CUDA kernel that makes the small-block architecture practical, delivering speedups of up to 9x.
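MoBA's block routing itself is simple to sketch. This is a toy version: mean-pool the keys in each block, score the pooled keys against the query, and attend only to the top-k blocks. The shapes and data are illustrative, and FlashMoBA's contribution is making exactly this kind of routing fast for small blocks on real GPUs.

```python
import numpy as np

def moba_select_blocks(q, K, block=4, topk=2):
    """MoBA-style routing sketch: each query attends only to the topk
    key blocks whose mean-pooled representation best matches it."""
    n_blocks = K.shape[0] // block
    pooled = K.reshape(n_blocks, block, -1).mean(axis=1)  # one vector per block
    scores = pooled @ q                                    # relevance per block
    return np.argsort(scores)[::-1][:topk]                 # indices of top blocks

K = np.zeros((16, 4))
K[0:4, 0] = 5.0    # block 0: strongly aligned with the query
K[8:12, 0] = 3.0   # block 2: weakly aligned
q = np.array([1.0, 0.0, 0.0, 0.0])
picked = moba_select_blocks(q, K, block=4, topk=2)
```

Shrinking `block` makes the pooled summaries sharper (the signal-to-noise argument in the thesis) but multiplies the number of scattered memory reads, which is the gap the FlashMoBA kernel closes.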
The thesis's value lies in constructing a complete framework for efficient large models, one that answers today's practical challenges while laying the groundwork for a computationally efficient, accessible next generation of AGI.
An Annual Salary of Up to 3.5 Million Yuan, Surpassing OpenAI
Finally, the topic everyone cares about: salary.
Last year's talent war in Silicon Valley was fierce. An exclusive Business Insider (BI) report revealed the salaries Thinking Machines (TML) offers its employees:
base annual salaries run as high as $500,000 (about 3.5 million yuan).
According to recruitment filings obtained by BI, TML paid two technical employees a base annual salary of $450,000, while a third earned as much as $500,000.
A fourth, listed as a "co-founder/machine learning expert", was also on $450,000 a year.
These salary figures date from the first quarter of 2025, before Murati closed a $2 billion seed round at a $10 billion valuation.
Overall, the average annual salary provided by TML to these four technical employees reached $462,500.
By contrast, TML's pay runs well above that of more established LLM companies:
OpenAI's relevant filings list 29 technical employees at an average annual salary of $292,115, ranging from $200,000 up to $530,000.
Anthropic pays 14 technical employees an average of $387,500, within a range of $300,000 to $690,000.
Although still far short of Meta's eye-watering offers, reportedly topping $100 million, this puts TML's pay near the very top of Silicon Valley.
Sure enough, the most expensive thing in the 21st century is still talent.
References:
https://x.com/Guangxuan_Xiao/status/2008779396497502337
https://guangxuanx.com/
https://scholar.google.com/citations?user=sRGO-EcAAAAJ
https://www.eecs.mit.edu/eecs-events/doctoral-thesis-efficient-algorithms-and-systems-for-large-language-models/
https://www.businessinsider.com/muratis-new-ai-startup-salary-technical-talent-2025-6
This article is from the WeChat official account "Xinzhiyuan". Author: Xinzhiyuan. Republished by 36Kr with authorization.