Brain cells turned into a chip are playing Doom: 200,000 living neurons learn on their own to hunt down enemies, with sample efficiency that crushes deep reinforcement learning.
200,000 human brain cells have formed a "brain CPU" and learned to play the classic game Doom.
These living neurons have learned to find enemies, shoot, turn and move, and even manage ammunition through reinforcement learning.
The same technology let brain cells in a petri dish learn to play the table-tennis game Pong five years ago.
The logic of Pong is very simple. When the ball goes up, the paddle goes up. It's a direct input-output relationship.
Doom is completely different. It's 3D, has enemies, requires exploring the environment, and is quite challenging.
Back then, it took the Cortical Labs team 18 months to teach neurons to play Pong.
This time, it was independent developer Sean Cole. Through the cloud platform API opened by Cortical Labs, he completed the adaptation of Doom in less than a week, and the code has been open-sourced.
Granted, it is still far from e-sports level; the head of Cortical Labs admits that the current demonstration is very basic.
Right now, these cells play like a newbie who has never seen a computer before. But to be fair, they really haven't.
Its real significance lies elsewhere: this is a material that processes information in a very particular way, one that cannot be replicated on silicon chips.
Learning Efficiency Surpasses Three Major Reinforcement Learning Algorithms
To enable the brain chip to play games, the key lies in how to translate the digital game world into a language that neurons can understand: electrical signals.
The game screen is converted into an electrical stimulation pattern. When a monster appears on the left side of the screen, the electrodes on the left side of the neural culture area on the chip are activated.
The neurons respond to the stimulation. Researchers monitor the "spike signals" of these responses and then interpret them as game commands. A specific discharge pattern makes the game character shoot, another pattern makes the character move to the right, and so on.
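That encode/decode loop can be sketched in a few lines of Python. Everything below is illustrative: the electrode layout, spike-count threshold, and action names are assumptions for the sketch, not Cortical Labs' actual protocol.

```python
# Hypothetical sketch of the closed loop: encode screen position as an
# electrode stimulation pattern, then decode spike counts into a command.

def encode_frame(enemy_x: float, screen_width: float, n_electrodes: int = 8) -> list[int]:
    """Map an enemy's horizontal position to a one-hot stimulation pattern:
    enemies on the left activate left-side electrodes, and so on."""
    idx = min(int(enemy_x / screen_width * n_electrodes), n_electrodes - 1)
    return [1 if i == idx else 0 for i in range(n_electrodes)]

def decode_spikes(spike_counts: list[int], threshold: int = 5) -> str:
    """Interpret which recording region fired most as a game command."""
    left, right = sum(spike_counts[:4]), sum(spike_counts[4:])
    if max(left, right) < threshold:
        return "shoot"  # low overall activity -> default action (assumed)
    return "move_left" if left > right else "move_right"

print(encode_frame(100, 800))                      # enemy on the left
print(decode_spikes([9, 7, 8, 6, 1, 0, 2, 1]))     # left region dominates
```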
This time, Cortical Labs not only presented a demonstration video but also had a series of academic studies behind it.
One of the studies explains how the DishBrain system couples an in vitro cultured neural network with a high-density multi-electrode array (HD-MEA), enabling a direct comparison between living neurons and three mainstream deep reinforcement learning algorithms, DQN, A2C, and PPO, in a simplified Pong environment.
The core constraint of the experimental design is "sample efficiency".
Each biological culture's gameplay was recorded for 20 minutes, during which an average of about 69 to 70 rallies were completed. For a fair comparison, the three deep reinforcement learning algorithms were limited to the same training budget of 70 episodes. Each algorithm was trained with 150 different random seeds, i.e., 150 independent neural networks, matching the 150 different biological cultures.
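The matched-budget design can be expressed as a simple evaluation harness: a stand-in agent gets exactly 70 episodes per seed, repeated across 150 seeds. Only the 70-episode and 150-seed numbers come from the paper; the agent and its scoring below are placeholders.

```python
import random

def evaluate_matched_budget(run_episode, n_seeds=150, episodes_per_seed=70):
    """Run a hypothetical agent under the same budget as one 20-minute
    biological session (~70 rallies), across 150 independent seeds."""
    scores = []
    for seed in range(n_seeds):
        rng = random.Random(seed)  # one "network" per seed
        per_seed = sum(run_episode(rng) for _ in range(episodes_per_seed))
        scores.append(per_seed / episodes_per_seed)
    return sum(scores) / n_seeds

# A stand-in "agent" whose per-episode hit count is random:
mean_hits = evaluate_matched_budget(lambda rng: rng.randint(0, 3))
print(round(mean_hits, 2))
```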
The study used cortical cells from two sources:
Human cortical cells (HCC) differentiated from human induced pluripotent stem cells (hiPSC) and mouse cortical cells (MCC) extracted from mouse embryos. Approximately 1 million cells were placed on the HD-MEA chip.
To consider the impact of input information density on the results, the researchers designed three different input methods for the reinforcement learning algorithms: a 40×40 grayscale image input, a four-dimensional vector input containing the coordinates of the paddle and the ball, and a ball position input that simulates the information structure of DishBrain as much as possible.
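The three input representations can be sketched as follows. The shapes (a 40×40 frame, a 4-dimensional state vector, a ball-position-only input) come from the paper; the construction details, such as what fills the fourth vector slot, are illustrative assumptions.

```python
# Sketch of the three RL input designs described above (illustrative).

def make_inputs(ball_x, ball_y, paddle_y, size=40):
    # 1) 40x40 grayscale frame: the ball's pixel set to 1.0
    frame = [[0.0] * size for _ in range(size)]
    frame[min(int(ball_y), size - 1)][min(int(ball_x), size - 1)] = 1.0
    # 2) four-dimensional state vector with paddle and ball coordinates
    #    (the fourth component here is a placeholder, e.g. ball velocity)
    vector = [paddle_y, ball_x, ball_y, 0.0]
    # 3) DishBrain-like input: ball position only
    ball_only = [ball_x, ball_y]
    return frame, vector, ball_only

frame, vector, ball_only = make_inputs(10, 20, 15)
print(sum(map(sum, frame)), len(vector), len(ball_only))  # 1.0 4 2
```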
The results are very clear:
Under all three input designs, the biological cultures outperformed every reinforcement learning algorithm on all three core metrics: average hits per rally, rate of aces (rallies lost without a single return), and proportion of long rallies.
More importantly, there are differences in the learning dynamics:
When dividing each 20-minute experiment into the first 5 minutes and the last 15 minutes for comparison, only the HCC and MCC groups showed a statistically significant increase in the average round length, while DQN, A2C, and PPO did not show significant in-group improvement under any input design.
The HCC group was significantly better than all reinforcement learning methods in terms of the relative improvement amplitude, and the MCC group also surpassed PPO and DQN in many comparisons.
The biological cultures received extremely sparse input: only 8 stimulation electrodes, rate-coded at 4 to 40Hz, while in the image input design the reinforcement learning algorithms received a 40×40 image, 1600 pixels in total.
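A rate code in that band could be as simple as a linear mapping from position to stimulation frequency. The paper only specifies the 4-40Hz range and the 8-electrode sparsity, so the mapping itself is a hypothetical sketch.

```python
def rate_code(position: float, f_min: float = 4.0, f_max: float = 40.0) -> float:
    """Linearly map a normalized position in [0, 1] to a stimulation
    frequency in the 4-40 Hz band (hypothetical mapping)."""
    position = max(0.0, min(1.0, position))  # clamp into the unit interval
    return f_min + position * (f_max - f_min)

print(rate_code(0.0), rate_code(0.5), rate_code(1.0))  # 4.0 22.0 40.0
```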
The researchers specifically designed a control group with low-dimensional input to exclude the interference of the "curse of dimensionality" and found that even with sparser information input, the performance of the reinforcement learning algorithms was worse, not better.
When the number of training rounds was extended to tens of thousands, all three algorithms could eventually surpass the level of the biological cultures, which confirms that:
In the real-time scale, the sample efficiency of biological systems is far beyond that of current reinforcement learning algorithms.
What Happens to Neurons in the Game?
The research team conducted an in-depth exploration of this question.
They recorded the neural spike activities of 1024 channels on the HD-MEA, covering 285 game sessions and 147 resting sessions, with a sampling frequency of 20kHz. Through two dimensionality reduction algorithms, t-SNE and Isomap, the team embedded the high-dimensional neural activities into a low-dimensional space for analysis.
In the game state, both dimensionality reduction algorithms clearly separated the activity patterns of the early and late stages of a session, showing obvious changes in network dynamics;
while in the resting state, the activity patterns of the first and second halves were almost indistinguishable in the low-dimensional space.
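Before t-SNE or Isomap can embed anything, per-channel spike trains must first become fixed-length feature vectors. A stdlib-only sketch of that binning step follows; it is illustrative preprocessing, not the paper's exact pipeline.

```python
def bin_spikes(spike_times, n_channels, duration_s, bin_s=1.0):
    """Convert per-channel spike timestamps into an (n_bins x n_channels)
    firing-rate matrix - the kind of high-dimensional feature vectors that
    t-SNE / Isomap would then embed into a low-dimensional space."""
    n_bins = int(duration_s / bin_s)
    rates = [[0.0] * n_channels for _ in range(n_bins)]
    for ch, times in enumerate(spike_times):
        for t in times:
            b = int(t / bin_s)
            if 0 <= b < n_bins:
                rates[b][ch] += 1.0 / bin_s  # spikes per second
    return rates

# Two channels, two seconds: channel 0 fires at 0.1s, 0.4s, 1.2s; channel 1 at 0.9s.
rates = bin_spikes([[0.1, 0.4, 1.2], [0.9]], n_channels=2, duration_s=2.0)
print(rates)  # [[2.0, 1.0], [1.0, 0.0]]
```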
The team further developed a method to select 30 of the most representative channels from the 1024 channels to construct a functional connectivity network.
Comparing the first 2 minutes and the last 2 minutes of each recording, in the game state, the network showed statistically significant changes in multiple indicators such as the number of edges, density, average weight, and modularity index, while there were no significant differences in these indicators in the resting state.
The network in the game state showed more positively enhanced functional connections, and the modularity index decreased significantly, which means that the originally independent neuron communities began to establish more cross-community connections, and the network was reorganizing itself to complete the task.
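The edge-count, density, and mean-weight indicators can be computed from a thresholded connectivity matrix as below. The threshold and the example matrix are made up for illustration, and the paper's modularity computation is more involved, so it is omitted here.

```python
def network_metrics(weights, threshold=0.3):
    """Given a symmetric functional-connectivity matrix (e.g. pairwise
    correlations between the 30 selected channels), count edges whose
    weight clears a threshold and report density and mean edge weight."""
    n = len(weights)
    edges, total_w = 0, 0.0
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each pair once
            if weights[i][j] >= threshold:
                edges += 1
                total_w += weights[i][j]
    possible = n * (n - 1) // 2
    density = edges / possible if possible else 0.0
    mean_w = total_w / edges if edges else 0.0
    return edges, density, mean_w

w = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.5],
     [0.1, 0.5, 1.0]]
print(network_metrics(w))  # 2 edges out of 3 possible, mean weight 0.65
```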
The research team named this type of system "Synthetic Biological Intelligence (SBI)" in the paper and pointed out that this is the first formal performance comparison between SBI and reinforcement learning systems.
The discussion section of the paper notes that, compared with backpropagation, forward-only learning processes are more biologically plausible.
Biological systems may rely on more efficient forward learning processes such as predictive coding, active inference, and Hopfield networks.
The team also tested a bio-inspired algorithm based on active inference and counterfactual learning and indeed observed a faster learning rate than standard reinforcement learning. However, this algorithm is still highly dependent on hyperparameter selection, and its power consumption is much higher than that of biological systems.
CL1: The First Programmable Biocomputer
This demonstration ran on the CL1, released by Cortical Labs last year and billed by the company as "the world's first biocomputer capable of deploying code". At the core of the device is a multi-electrode array chip on which about 200,000 living human neurons are growing.
The research team developed a supporting API interface that allows any user to interact with the living cells on the chip through simple Python commands.
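The article says the API is driven by simple Python commands, but the actual client, endpoints, and method names are not given here. The stimulate-and-read loop below is therefore a purely hypothetical sketch, backed by a mock client so it is self-contained.

```python
class MockCL1Client:
    """Stand-in for a hypothetical Cortical Labs cloud client; the real
    API's names and behavior may differ entirely."""
    def stimulate(self, pattern):
        # A real device would deliver the pattern to the electrodes and
        # return recorded spikes; the mock echoes a plausible response.
        return [p * 3 for p in pattern]

def closed_loop_step(client, pattern):
    """One step of a hypothetical closed loop: stimulate, read spikes,
    decide whether the activity is strong enough to act on."""
    spikes = client.stimulate(pattern)
    return "act" if sum(spikes) > 2 else "wait"

client = MockCL1Client()
print(closed_loop_step(client, [1, 0, 0, 0]))  # act
print(closed_loop_step(client, [0, 0, 0, 0]))  # wait
```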
To verify that it was indeed the neurons learning rather than the algorithm doing the work, the team designed a control experiment: when real neuron discharges were replaced with random signals or zero signals, the learning effect completely disappeared.
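The logic of that control can be illustrated with a toy alignment metric: an informative signal scores high, while random or zero signals collapse to chance. All names and numbers here are illustrative, not the team's actual analysis.

```python
import random

def reward_alignment(signals, targets):
    """Toy stand-in for 'learning effect': the fraction of steps where a
    thresholded signal matches the target action."""
    hits = sum(1 for s, t in zip(signals, targets) if (s > 0) == t)
    return hits / len(targets)

rng = random.Random(0)
targets = [rng.random() < 0.5 for _ in range(1000)]
real = [1 if t else 0 for t in targets]      # signal correlated with targets
rand = [rng.randint(0, 1) for _ in targets]  # random-signal control
zero = [0] * len(targets)                    # zero-signal control

print(reward_alignment(real, targets))            # 1.0
print(round(reward_alignment(rand, targets), 2))  # near chance
print(round(reward_alignment(zero, targets), 2))  # near chance
```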
Kagan said that the team has solved the interface problem and achieved real-time interaction, training, and behavior shaping with brain cells.
The next goal is to make the neurons not only able to play Doom but also play it well, and then take on more complex tasks, such as controlling a robotic arm.
The team has issued an invitation to developers and researchers: the API is open, the cloud platform is live, and the neurons are ready.
The only question is, what do you want to teach them?
Reference links:
[1]https://www.youtube.com/watch?v=yRV8fSw6HaE
[2]https://github.com/SeanCole02/doom-neuron
[3]https://pmc.ncbi.nlm.nih.gov/articles/PMC12320521/
This article is from the WeChat official account "QbitAI", author: Meng Chen. It is published by 36Kr with permission.