
The father of AlphaGo, with 300,000 citations, has raised nearly 10 billion yuan just four months after founding his startup, and firmly believes that RL can reach ASI.

New Intelligence Yuan (新智元), 2026-04-28 18:14
The seed round is larger than Yann LeCun's and sets a European record.

[Introduction] Ineffable Intelligence, founded by David Silver, the father of AlphaGo, has raised $1.1 billion in seed funding, setting a European financing record at a valuation of $5.1 billion. The company is betting on reinforcement learning and learning from self-generated experience, challenging the mainstream large-model approach that relies on the Scaling Law.

On April 27, Ineffable Intelligence, a London-based AI laboratory founded by David Silver, a former top researcher at Google DeepMind and a professor at UCL, announced that it had closed a $1.1 billion seed round at a post-money valuation of $5.1 billion.

https://www.cnbc.com/2026/04/27/deepmind-ineffable-intelligence-record-seed-funding-nvidia-google.html

This is the largest seed round in Europe to date.

Sequoia Capital and Lightspeed Venture Partners co-led the investment, with participation from Nvidia, Google, Index, DST Global, the UK Sovereign AI Fund, and others.

Ineffable's goal is to create a "superlearner" that discovers knowledge from its own experience, and to keep pushing reinforcement learning toward ASI.

What makes this round unusual is its stage.

Ineffable was founded only a few months ago, and it has disclosed little about products, revenue, or a roadmap, yet it starts out with a valuation of $5.1 billion.

AI investment has entered a new phase. The personal credibility of top researchers is replacing traditional business validation as the scarcest collateral in early-stage financing.

A huge bet on reinforcement learning

In the past three years, the mainstream in the AI industry has been large language models.

Larger corpora, larger clusters, and stronger reasoning have almost become the common script for all leading companies.

Silver has chosen another path: reinforcement learning.

The core idea of reinforcement learning is to let a model act in an environment and adjust its policy based on the feedback it receives.
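The act-then-correct loop can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the method used by Ineffable or DeepMind: it uses a textbook epsilon-greedy agent on a multi-armed bandit, and the function name `run_bandit`, the win rates, and all parameter values are hypothetical choices made for illustration.

```python
import random


def run_bandit(true_win_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only).

    true_win_rates: hidden probability that each action yields reward 1.
    Returns the agent's learned value estimates and per-action pull counts.
    """
    rng = random.Random(seed)
    n_actions = len(true_win_rates)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action

    for _ in range(steps):
        # Act: explore at random with probability epsilon, otherwise
        # exploit the action that currently looks best.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Feedback: the environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < true_win_rates[action] else 0.0

        # Correct: move the estimate for this action toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
```

The agent is never told which action is best. It only sees rewards, and its choices drift toward the most rewarding action as its estimates improve. Systems such as AlphaGo replace the lookup table with neural networks and search, but the loop of acting, receiving feedback, and updating is the same in spirit.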

Closed systems such as Go, chess, and StarCraft are where it first made a name for itself.

[Image: the game "StarCraft II"]

Silver's new company wants to scale up this method, enabling the system to learn from basic motor skills all the way to breakthroughs in science, mathematics, and technology.

In the company's public statement, Ineffable's mission is to "make the first contact with superintelligence."

This is also where Silver disagrees with the large model approach.

Large language models learn mainly from text and code written by humans, so the limits of their abilities are largely set by human data.

When interviewed by Wired, Silver compared human data to fossil fuels and self-learning to renewable energy.

The metaphor also explains why investors are willing to write huge checks to a laboratory that does not yet have a mature business model.

Is reinforcement learning the way out after the Scaling Law hits a wall?

The traditional Scaling Law, which relies on massive amounts of human data, has not failed, but its marginal returns are diminishing.

Adding more parameters, more corpora, and more training compute still brings improvements, but high-quality human text is becoming a bottleneck.

Epoch AI estimates that the effective stock of publicly available, high-quality human text is about 300 trillion tokens. On current trends, it could be fully used up as early as this year and no later than 2032.

https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data

In other words, the old paradigm still works, but it is getting more expensive and slower.

Pure reinforcement learning does provide a path closer to AGI/ASI because it shifts the model from "imitating human texts" to "gaining experience through action and feedback."

AlphaGo Zero has proven that in an environment with clear rules and explicit feedback, the system can reach superhuman levels through self-play without relying on human Go game records.

OpenAI o1 also shows that large-scale reinforcement learning, combined with more thinking time at inference, can significantly improve complex reasoning.

However, pure reinforcement learning is unlikely to carry the path to AGI on its own in the short term.

Tasks such as Go, mathematics, and code have clear verifiers, and reinforcement learning is very effective in these areas.

Real-world problems, by contrast, lack stable reward functions, exploration is costly, and safety and alignment are harder.
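To make the idea of a "clear verifier" concrete, here is a minimal sketch of how a checkable task can be turned into a reward signal. It is an assumption-laden illustration rather than any lab's actual pipeline: the function `verifier_reward`, the test cases, and the two candidate programs are hypothetical.

```python
def verifier_reward(candidate, test_cases):
    """Return 1.0 if the candidate passes every test case, else 0.0.

    candidate: a callable proposed by the learner (hypothetical).
    test_cases: list of (args, expected_output) pairs.
    """
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed attempt, not as a verifier error.
            return 0.0
    return 1.0


# Hypothetical task: "return the list sorted in ascending order".
TESTS = [(([3, 1, 2],), [1, 2, 3]), (([],), []), (([5, 5, 1],), [1, 5, 5])]


def good_attempt(xs):
    return sorted(xs)


def bad_attempt(xs):
    return xs  # returns the input unchanged


if __name__ == "__main__":
    print(verifier_reward(good_attempt, TESTS))
    print(verifier_reward(bad_attempt, TESTS))
```

For code or formal mathematics, the reward can be computed mechanically in this way. For an open-ended goal such as "make a useful scientific discovery", no comparably cheap and unambiguous check exists, which is the gap the passage above describes.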

Google DeepMind's AlphaProof is closer to a template for where things are actually heading. It combines a pre-trained language model, Lean formal verification, and AlphaZero-style reinforcement learning, and reached silver-medal level at the IMO.

A more reliable judgment, then, is that the future is not a choice between large-model pre-training and reinforcement learning, but a hybrid of the two.

Pre-training provides the knowledge and language foundation, reinforcement learning provides action feedback and goal pressure, and search, verifiers, tool calls, and simulation environments provide sustainable new experiences.

The key to ASI is a system that can keep making mistakes, verifying, discovering, and turning that experience back into capability.

Big-lab researchers are becoming the new startup founders

Ineffable has caught a window of opportunity.

Companies such as OpenAI, DeepMind, Anthropic, and xAI gathered the scarcest talent in the previous round of AI competition, and that talent is now spilling over into the startup market.

Large-model companies continue to compete on massive compute and product distribution. Those who leave bring new technical routes, new organizations, and higher upside potential, and take a seat at a different table.

Similar cases are multiplying.

TechCrunch mentioned that Recursive Superintelligence, founded by former DeepMind researcher Tim Rocktäschel, was reportedly looking to raise as much as $1 billion.

After Yann LeCun stepped down as head of Meta AI, AMI Labs, in which he is involved, announced a $1.03 billion financing in March.

Ineffable is not an isolated case. It is one of the most eye-catching financings in a wave of top researchers starting their own companies.

This also explains why the UK government has entered the game.

The UK Sovereign AI Fund and the British Business Bank participated in this round of financing. The latter confirmed an investment of $20 million and said that it has made 9 AI investments in the past 12 months, including companies such as Wayve and PolyAI.

Since Google acquired DeepMind, London has long had a high density of top AI talent, but the UK has lacked frontier laboratories that can stay in the country and keep expanding.

Ineffable provides an opportunity to place a new bet.

The biggest problem is moving from games to the real world

Ineffable's technical narrative is clear, but there are also visible risks.

Go, chess, and StarCraft have rules, boundaries, and computable feedback.

Real-world scientific discoveries, technological inventions, and social systems do not have such stable reward functions.

How to transfer the policies an agent learns in a simulated environment to the open world is an unavoidable problem if reinforcement learning is to move toward general intelligence.

Silver's answer is still simulation.

According to Wired, he hopes to put agents into a simulated environment, let them learn to achieve goals, cooperate with each other, and observe how they treat other agents.

The advantage of this approach is that the system's behavior can be observed in a more controllable space.

The difficulty is that the simulated world must be rich enough to train abilities that are useful in the real world.

Safety issues will also be magnified.

A system that learns from experience and continuously searches for better strategies may discover paths that humans never anticipated.

This is both the appeal and the risk of reinforcement learning.

What investors are betting on is whether Silver can take the "learning from experience" approach of the AlphaGo era out of the game room and into a larger world.

David Silver's second start

David Silver's resume is the most important pillar of this valuation.

According to UCL's official website, he headed the reinforcement learning research group at DeepMind, led AlphaGo, and worked on AlphaZero, which reached superhuman levels in Go, chess, and shogi through self-play.

He also met DeepMind CEO Demis Hassabis at a chess tournament, and the two became lifelong friends.

Even after Silver's departure from DeepMind, the two remain close. David Silver said, "I left just because I wanted to open up a brand new path."

https://www.wired.com/story/david-silver-ai-ineffable-intelligence-reinforcement-learning/

In 2020, ACM awarded him the 2019 ACM Prize in Computing for his breakthrough contributions to computer game-playing.

According to the Royal Society, he has taken part in many key projects, from Atari and AlphaGo to AlphaZero and AlphaStar.

His Google Scholar profile and other public information show that Silver's academic citation count has reached 300,000 and his h-index is 103. He is one of the few people in reinforcement learning with both academic influence and industrial achievements.

https://scholar.google.com/citations?user=-8DNE4UAAAAJ&hl=zh-CN&oi=ao

Ineffable's $1.1 billion seed round looks like just another AI financing record, but at its core it is a vote for a technical route.

Large models are still racing on the main track, while Silver is trying to prove that ASI can also grow out of action, feedback, and self-generated experience.

AlphaGo brought reinforcement learning to public attention for the first time.

Now, Ineffable wants to take it from the game board to an entirely new kind of intelligent system.

Reference materials:

https://www.cnbc.com/2026/04/27/deepmind-ineffable-intelligence-record-seed-funding-nvidia-google.html 

https://www.wired.com/story/david-silver-ai-ineffable-intelligence-reinforcement-learning/ 

https://davidstarsilver.wordpress.com/ 

This article is from the WeChat official account "New Intelligence Yuan". Editor: Allen. Republished by 36Kr with permission.