Top Chinese scientists from Google Gemini and Apple have left their jobs to start their own businesses, targeting AGI.
In the AI startup boom in Silicon Valley, the most expensive bets are always placed on the most experienced "brains."
Andrew Dai, a senior researcher who worked at Google DeepMind for 14 years, is in the process of founding an AI startup called Elorian.
This little-known company aims to raise up to $50 million in its seed round of financing.
Joining Andrew Dai is Yinfei Yang, an Apple research scientist who left the company in December last year.
These two veterans of Google and Apple are trying to solve the next core problem in large models: visual reasoning.
It is highly likely that the lead investor in this round of financing will be Striker Venture Partners, founded by Max Gazor, a former general partner at CRV.
If the deal is finalized, it will be one of the most high-profile early-stage financings in Silicon Valley in recent months, and it will once again confirm the capital market's frenzied pursuit of "Google graduates."
14 years, from the early days of BERT to behind the scenes at Gemini
In AI research circles, the name Andrew Dai stands for a kind of "long-termism."
Unlike the entrepreneurs who rushed into the field after the Transformer wave took off, Andrew Dai joined Google back in 2012.
This means he has witnessed the entire cycle in which deep learning evolved from a fringe discipline into the center of the technology world.
The most eye-catching entry on his LinkedIn profile is his role as co-lead of pre-training data work for the Gemini model.
In the current large model war, data quality and pre-training strategy are considered the key factors that determine the upper limit of a model's intelligence.
That he was put in charge of this core work is ample evidence of his influence within Google.
Andrew Dai's academic contributions are not limited to this.
He has co-authored several papers with Google's chief scientist Jeff Dean and Quoc V. Le (a legendary figure at Google Brain).
As early as 2015, he published a paper on semi-supervised sequence learning that is considered to have had a profound influence on OpenAI's GPT series of models.
https://proceedings.neurips.cc/paper/2015/file/7137debd45ae4d0ab9aa953017286b20-Paper.pdf
A person familiar with Andrew Dai commented, "He is one of the pioneers of language models and has been focusing on pre-training-related research for the past two decades. What he is best at is how to extract high-quality 'knowledge' from massive and noisy data sources."
If Andrew Dai represents Google's brute-force aesthetics in big data processing, then co-founder Yinfei Yang brings the refinement and multi-modal perspective of the Apple ecosystem.
Yinfei Yang previously served as a Principal Research Scientist on Apple's machine learning team, mainly participating in the development of Apple's self-developed AI models.
Before joining Apple, he also worked at Google Research for four years, focusing on multi-modal representation learning.
His expertise in image-text co-embedding fills precisely the perceptual gap of pure language models.
Visual reasoning: not just "seeing," but also "understanding"
What exactly does Elorian want to do?
According to Andrew Dai, Elorian is not trying to recreate a ChatGPT, but to build a native multi-modal model that can "simultaneously understand and process text, images, videos, and audio."
Most current AI models are trained on text first and then have visual capabilities bolted on as "patches."
Elorian's vision is to build a natural-born "synesthete."
Rather than converting pictures into text labels, such a model would perceive the logic of the physical world directly through vision, as humans do.
"Visual reasoning" is considered a necessary step towards AGI.
Andrew Dai mentioned that robots will be a potential application scenario for Elorian's technology, but he emphasized that the company's vision goes far beyond that.
To Silicon Valley investors, this usually means that Elorian is targeting the broad market for AI agents: a super assistant that can look at a computer screen as a human does, understand the graphical user interface (GUI), handle product returns, review legal documents, and operate other software.
Such an assistant would not need you to feed it data through an API. Instead, it would directly "look" at Excel spreadsheets, "listen" to phone recordings, and "read" emails on the screen as you do, while making decisions in real time.
This is the future that Elorian is trying to build.
The logic of capital: paying for "bloodline"
A $50 million seed round would have sounded like a fantasy a few years ago, but in today's AI bubble it seems to have become the "entry fee" for top teams.
Striker Venture Partners, which is in talks with Elorian to lead the investment, is itself a very topical new fund.
Its founder, Max Gazor, was a partner at the veteran venture capital firm CRV and is known for his sharp investment vision.
He founded his own firm in October last year, and Elorian is likely to be one of the first iconic bets of this fund.
Investors like Max Gazor are betting not only on the technical path but also on the scarce genetic combination of "Google DeepMind + Apple."
Google provides experience in large-scale training infrastructure, while Apple has a practical culture of implementing AI in specific products.
The emergence of Elorian also reflects a shift in the large model battlefield.
The first stage of the war was about "text generation," and OpenAI took the lead with ChatGPT.
The second stage is about "multi-modal understanding" and "interaction with the physical world."
On this new battlefield, both Gemini and GPT are frantically catching up on visual capabilities.
As a startup, Elorian has only two bargaining chips for surviving among the giants: opening up a technological gap, or excelling in vertical scenarios (such as complex visual agents).
In Silicon Valley, every top researcher who leaves a tech giant harbors a "rebellious" dream: to use a smaller team and more focused resources to subvert the large and slow bureaucratic system of their former employers.
Andrew Dai left Google after 14 years of service, and Yinfei Yang left Apple, the company that launched Apple Intelligence.
They have chosen the most difficult path - trying to teach machines not only to "see" the world but also to "understand" it.
This calls to mind an old saying in the field of computer vision: "The camera is just the eye, and the algorithm is the soul."
In the torrent of AI, what is truly scarce is never computing power, but the eyes that can see through the fog of data and identify the future direction.
Reference materials:
https://www.theinformation.com/articles/former-google-apple-researchers-raising-50-million-new-visual-ai-startup
This article is from the WeChat official account "New Intelligence Yuan", author: New Intelligence Yuan, editor: Allen. Republished by 36Kr with permission.