Yann LeCun teams up with Xie Saining again, with NVIDIA participating in the investment. The new company is betting on "what comes after the LLM."
On March 10th, APPSO exclusively learned that the world-model startup AMI has closed a $1.03 billion financing round at a pre-money valuation of $3.5 billion. The company was founded by Yann LeCun, Turing Award winner and former chief AI scientist at Meta.
AMI stands for Advanced Machine Intelligence. Its main research and development direction is world models: systems that learn abstract representations from the real world.
It's worth mentioning that Xie Saining, a leading figure in fundamental AI research and a longtime friend and NYU colleague of Yann LeCun, has officially joined AMI as Chief Science Officer.
Xie Saining is a leading authority on visual representation learning and a co-author of the diffusion transformer (DiT). The DiT architecture lets visual models benefit from scaling laws just as large language models do: by replacing the U-Net backbone, which had dominated for a decade, with a Transformer, the work of Xie and his collaborators made it possible to simulate complex, high-fidelity images and videos, laying the groundwork for top-tier visual generation models and tools such as Sora and SeeDance.
According to a financing memo obtained by APPSO, the proceeds of AMI's current round will go toward long-term scientific research, global recruiting, and reliable products in the field of world models.
AMI's official website
Yann LeCun has expressed the hope of establishing Europe as the global "third pole" of artificial intelligence, alongside China and the United States. AMI is headquartered in Paris and will set up offices in New York, Montreal, and Singapore.
Four of AMI's six core founders come directly from Meta's FAIR (Fundamental AI Research) team, and the other two also have deep Meta ties. Yann LeCun serves as the company's chairman; the CEO role is held by someone else.
AMI means "friend" in French. Yann LeCun himself has indicated that "it should be pronounced in French."
At the beginning of this year, shortly after leaving Meta, Yann LeCun was interviewed by MIT Technology Review in his Paris apartment. Asked about his views on Meta's AI strategy, he replied, "I may not agree with all of his (Mark Zuckerberg's) decisions. But people make decisions for reasons, and there's no need to be angry."
He said at the time, "Meta might become our first customer."
The "opponent" of LLMs gets $1 billion
APPSO learned that AMI's current round of financing was supported by several extremely important investors.
In this round, Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions led the investment; strategic investors include NVIDIA, Toyota Ventures, Temasek, SoftBank, Mark Cuban, the Mulliez family, and others; follow-on investors include Eric Schmidt, Publicis Groupe, Samsung, Tim Berners-Lee, and others.
Cathay Innovation was founded by Cai Mingpo, a well - known figure in the Sino - French economic circle. He has invested in many companies such as Pinduoduo, Yuanqi Forest, and JD Logistics.
Bezos Expeditions is the family office of Jeff Bezos, the founder of Amazon.
Most of the leading investors are top - tier funds based in Europe.
Mark Cuban is a well - known investor in the industry and a former owner of an NBA team.
The Mulliez family is a top-tier business family in France that controls brands such as Decathlon and Auchan.
Eric Schmidt is the former CEO and chairman of Google/Alphabet.
Tim Berners-Lee is the inventor of the World Wide Web (WWW).
In 2023, after ChatGPT's explosive rise, large language models (LLMs) became almost synonymous with "AI." Yann LeCun has been one of the few top-tier researchers to publicly and consistently push back ever since.
His criticism rests on a strategic judgment: an LLM is, at its core, a statistical model of text. It manipulates language well, but it does not understand the physical world and cannot (at least then) truly reason and plan.
In the MIT Technology Review interview, Yann LeCun asked, "Why don't we have a household robot as agile as a house cat?" Behind the question lies Moravec's paradox: perception, motor coordination, and physical intuition, abilities humans exercise without thinking, are precisely the hardest parts for AI, and LLMs bypass them entirely.
A simple analogy is a baby learning about gravity: no one teaches the baby the equation for gravity, yet the baby knows that things fall when let go. That is a rule distilled from observation, not an exhaustive catalog of physical detail. JEPA aims to let AI do the same.
In the materials obtained by APPSO, Yann LeCun said:
AI has made significant progress in the past decade. Prediction and generation systems have changed how we analyze, extract knowledge, and create content worldwide. Now, as AI moves beyond the screen, intelligence cannot stop at simply generating outputs. It must understand context, retain it, anticipate outcomes, and behave reliably over time.
To achieve this, AMI will build a new generation of AI systems that understand the world, have long-term memory, are capable of true reasoning and planning, and are safe and controllable end-to-end.
Yann LeCun, Geoffrey Hinton, and Yoshua Bengio jointly won the Turing Award
Yann LeCun's solution is the JEPA architecture: Joint Embedding Predictive Architecture, a learning framework proposed during his time at Meta.
The core idea is to let the model learn the "abstract representation" of the world and make predictions in that abstract space, rather than trying to restore all the details.
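The contrast between predicting in an abstract space and reconstructing raw detail can be illustrated with a toy sketch. This is not the actual JEPA implementation, only a minimal stand-in: the "encoder" here is an arbitrary hand-written summary function, and the "predictor" is trivially simple, both hypothetical choices made purely for illustration.

```python
# Toy sketch of the JEPA idea (illustrative only, not the real architecture):
# encode two views of a scene into an abstract space and compare there,
# instead of reconstructing and comparing raw pixel values.

def encode(observation):
    # Stand-in "encoder": collapse raw detail into a coarse summary
    # (mean and spread), mimicking an abstract representation.
    mean = sum(observation) / len(observation)
    spread = max(observation) - min(observation)
    return (mean, spread)

def predict(context_embedding):
    # Stand-in "predictor": guesses the target embedding from the context
    # embedding. A trivial identity map suffices for this sketch.
    return context_embedding

def embedding_loss(pred, target):
    # The loss lives in the abstract space: no pixel-level reconstruction.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Two "views" of the same scene, differing only in irrelevant pixel detail.
context = [0.9, 1.1, 1.0, 1.0]
target = [1.0, 1.0, 0.9, 1.1]

abstract_loss = embedding_loss(predict(encode(context)), encode(target))
pixel_loss = sum((c - t) ** 2 for c, t in zip(context, target))
# abstract_loss is near zero (the summaries match), while pixel_loss is not:
# the model is rewarded for capturing the rule, not the noise.
print(abstract_loss, pixel_loss)
```

The point of the sketch is the last two lines: a pixel-level objective penalizes differences the abstraction deliberately discards, which is exactly the detail-restoration burden JEPA avoids.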
The V-JEPA series is the most mature engineering realization of this idea so far; its lead, world-model expert Michael Rabbat, is now AMI's Vice President of World Models.
Xie Saining has also been working in a related direction at New York University. "Solaris," recently published by his team, is a multiplayer video world model built on Minecraft, used to test AI's prediction and planning abilities in a dynamic environment.
"We will have AI systems that reach human-level intelligence," Yann LeCun said. "But they won't be built on LLMs. This won't happen next year or the year after. It takes time and significant conceptual breakthroughs. And this is exactly what I've been researching, and it's also the direction of AMI."
He also left a message for the academic community:
"Don't work on LLMs. It's meaningless. You can't catch up with the industry. Invent new technologies and solve problems beyond the current systems. Breakthroughs won't come from making LLMs even bigger."
From convolutional neural networks (CNNs), to JEPA, to FAIR, founded in the Facebook era, Yann LeCun has always pursued hard problems that take a long time to pay off. This time he has $1 billion, a team of former colleagues and old friends, and, more importantly, autonomy.
Xie Saining
Xie Saining earned a Ph.D. in computer science from the University of California, San Diego. He then worked at Meta FAIR (at its Silicon Valley headquarters) for four years, and later served as a research scientist on the GenAI/nano team at Google DeepMind while also being an assistant professor at NYU's Courant Institute of Mathematical Sciences. His Google Scholar citation count exceeds 96,000.
Xie Saining's best-known work, as mentioned earlier, is "Scalable Diffusion Models with Transformers," published in 2022 with his student William Peebles: DiT.
This paper swapped the diffusion model's backbone network from the U-Net to the Transformer. Before it, diffusion models for image generation almost universally used the U-Net, a visual segmentation architecture that had served for nearly a decade. After DiT's release, both quality and scalability improved, making it the standard reference architecture for generative models. The underlying frameworks of later models, including Sora, newer versions of Stable Diffusion, and the currently popular SeeDance visual model, all extend from it.
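The key mechanical difference the swap introduces can be sketched in a few lines. A U-Net mixes information locally through convolutions, while a Transformer backbone lets every image patch attend to every other patch. The following is a toy single-head self-attention over "patch tokens," with identity query/key/value projections, a deliberate simplification; real DiT adds learned projections, many layers, and timestep conditioning.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    # Toy single-head attention with identity Q/K/V projections:
    # every patch token attends to every other patch, the global mixing
    # a Transformer backbone offers in place of U-Net's local convolutions.
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = softmax([
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in tokens
        ])
        # Each output is a weighted (convex) combination of all tokens.
        out.append([sum(w * v[d] for w, v in zip(scores, tokens))
                    for d in range(dim)])
    return out

# Four "image patches" represented as 2-dimensional tokens.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mixed = self_attention(patches)  # each patch now reflects all patches
```

Because attention is content-dependent and global from the first layer, scaling up the model and data improves it in the predictable way the article describes, which is what "benefiting from scaling laws" means in practice.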
It's worth noting that Peebles, co-author of DiT, is now one of the core leads of OpenAI's Sora team. Another of Xie's students, Demi Guo, founded the well-known AI video startup Pika.
Xie Saining's notable students also include Eric Mintun (OpenAI Sora), Zihan Zheng (OpenAI technical staff), Zhuang Liu (professor at Princeton University), and Jiaxuan You (professor at UIUC).
Xie Saining himself has increasingly focused on world models. In the list of AMI co-founders obtained by APPSO, his introduction is a single sentence:
Training world models over word models.
Xie Saining's earlier representative works include:
ConvNeXt: published in 2022 with colleagues at Meta FAIR, it brought convolutional networks back into direct competition with Vision Transformers, proving that architectural refinement can revive a route considered "outdated."
MAE: Masked Autoencoders, with Kaiming He, a CVPR 2022 oral presentation. It carried BERT-style self-supervised learning into vision, influencing many subsequent visual pre-training methods.
MoCo: Momentum Contrast, also with Kaiming He, one of the foundational works of self-supervised visual representation learning.
Before news of his joining AMI broke, his personal website carried a single line: "I will be on sabbatical in the spring and summer semesters of 2026."
When Yann LeCun was asked in the MIT Technology Review interview whether Xiesaining would join, he didn't confirm directly. He said, "I've hired him twice. I hired him at FAIR, and later convinced my colleagues at NYU to recruit him. I think very highly of him."
The two have many jointly authored papers in the public record, covering topics from the visual limitations of multimodal LLMs to spatial reasoning.
Their collaboration long predates the founding of AMI.
FAIR veterans gather
Meta's FAIR was among the most respected industrial AI research institutions in the world during the peak of the previous AI cycle. Its research style was more academic: publishing papers, doing open-source work, and encouraging long-horizon research, fundamentally different from the product- and commercialization-centered models of OpenAI and Anthropic. PyTorch, still the most important AI training framework, and Llama, among the earliest open-weight large models, both came out of FAIR.
Yann LeCun mentioned in the interview that the robot research team at FAIR was later laid off, and he thought this was a strategic mistake. This might be one of the reasons for his departure, although he didn't say it directly.
In a sense, the AMI team he leads is a "best-of" recombination of the FAIR teams in Montreal and Paris. Four of the six core founders come directly from FAIR, and the other two also have deep Facebook/Meta ties.
Xie Saining, the Chief Science Officer, has been introduced above.
Michael Rabbat leads world-model research and will run AMI's Montreal office. Rabbat was a founding member of the former FAIR Montreal lab. After more than a decade teaching at McGill University, he joined Meta full-time, where he led the development of three world-model series: I-JEPA, V-JEPA, and V-JEPA 2.
V-JEPA 2 is the most influential: trained with self-supervision on video plus fewer than 62 hours of robot operation data, it can zero-shot control a robotic arm to complete grasping tasks in an entirely unfamiliar laboratory environment. The core logic of this method connects directly to AMI's company-wide technical roadmap.
The company's CRIO (Chief Research and Innovation Officer) is the well-known Chinese computer scientist Pascale Fung, a fellow of top-tier academic institutions such as AAAI, IEEE, and ACL, and a chair professor at the Hong Kong University of Science and Technology.
She was born in Shanghai and studied at Kyoto University, the French National Center for Scientific Research (CNRS), and Columbia University. Her later work at Meta FAIR focused on embodied AI and vision-language world models, with smart glasses as the main application scenario. Pascale Fung remains a professor in the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology.