AI guru Andrej Karpathy reveals: playing with the "lobster" (OpenClaw) triggered "AI psychosis," and he gets anxious if his tokens aren't used up.
On March 23, Intelligence Express reported that in a podcast released last Saturday, Andrej Karpathy, co-founder of OpenAI and a noted AI expert, systematically laid out his first-hand experience and methodology around AI programming and the OpenClaw wave. He joked that the field is moving so fast he feels as if he is in a state of "AI psychosis," constantly rushing between new things. He has also found that the bottleneck for current AI programming agents is no longer just model capability: "If an agent doesn't perform well, it's mostly a skill issue."
"I hardly write a single line of code myself these days." In Karpathy's view, the workflow of software engineering has been completely rewritten by agents in just a few months. Now it is not humans writing code, but humans using natural language to dispatch a group of agents to complete system-level tasks. In the past he wrote 80% of the code himself; now 80% or even more is done by agents.
Besides using agents for programming, the explosion of OpenClaw has also changed Karpathy's life. He created an OpenClaw named “Dobby,” which directly “took over” his home. It automatically scans and connects to devices such as speakers, lights, and security systems, independently searches for APIs, creates control panels, and can send warnings when strangers approach.
This experience led Karpathy to conclude that many apps should be APIs that agents can call, and agents are the glue. What makes OpenClaw special is not that it has the strongest single function, but that it is closer to the AI form in people's minds.
It's worth mentioning that under the tweet announcing this podcast, Noam Brown, a former colleague of Karpathy at OpenAI and one of the authors of the OpenAI o1 model, posted a rather “fiery” tweet, questioning why Karpathy wasn't doing research in an AI frontier laboratory at this critical moment.
In the podcast, Karpathy responded to this directly. If he were deeply tied to a frontier AI laboratory, it would be hard to maintain a completely independent stance; leaving the laboratory actually aligns him more closely with the overall interests of humanity. The "conflict of interest" between financial incentives and social responsibility has been a problem since OpenAI's founding and remains unsolved.
Karpathy believes that working in a frontier AI lab for a period, doing some high-quality work, and then leaving is a good approach: it lets one keep up with real progress without being completely captured by a single entity, while still contributing to the ecosystem. This statement carries a hint of "Prometheus stealing fire," which may explain his abrupt departure from OpenAI.
In this podcast, Karpathy also shared his thoughts on automated research, the “jagged” distribution of large - model capabilities, the competition pattern between open - source and closed - source models, and the reconstruction of employment and software forms by AI. Here are the core contents of this podcast:
1. AI Programming: Since last December, the paradigm of AI programming has completely changed. Now, humans are not really programming but expressing their ideas to agents.
2. Productivity Anxiety: The current anxiety in the industry is no longer about fully utilizing GPUs, but about using up tokens. “I get anxious if I don't use up my subscription, which means my token throughput isn't maximized.”
3. Automated Research: AI can highly automate complex research tasks. What humans need to do is to remove themselves from all processes, automate as much as possible, and pursue extremely high token throughput.
4. Model Capabilities Are Jaggedly Distributed: A model's capabilities across different fields are still uneven. Talking to AI now feels like talking to a genius programmer and a 10-year-old child at the same time.
5. Generalization Problem: Intelligence has not fully overflowed. The improvement of verifiable capabilities does not drive the improvement of the model's soft capabilities. For example, although the model has become better at coding, it still tells the same old jokes as five years ago.
6. Career Choice: Working in a frontier laboratory is not free. There are too many interest entanglements and stance constraints. Outside these institutions, one is closer to the “overall stance of humanity.”
7. Open-Source vs. Closed-Source: Completely closed-source intelligence still carries systemic risks. If an open-source model is not the strongest, it should ideally be only slightly behind, serving as a "common workspace" for the industry and keeping the power structure balanced.
8. Single Large Model vs. Specialized Small Models: More "speciation" will occur among large models, but supporting technologies such as continual learning, fine-tuning, and weight modification are still immature.
9. Robotics: Manipulating atoms (the physical world) is 1 million times more difficult than manipulating bits (the digital world), but the total potential market (TAM) of the physical world may be larger than that of the pure digital world.
10. AI and Education: The era of humans teaching each other knowledge is coming to an end. In the future, the education model may be to let agents understand first and then let them teach humans.
Here is the full translation of the podcast content:
01. Poor Programming Performance of AI “Lobsters”? Mostly a Skill Issue!
Host: I remember walking into your office one time and seeing you very focused. I asked what you were doing, and you said, “I have to ‘program’ for 16 hours every day.” Programming isn't even the right verb anymore. You're actually expressing your ideas to agents. Tell me about your experience.
Karpathy: I feel like I've been in a state of “AI psychosis”, and I still often am. As an individual, you can achieve more things now. In the past, you were limited by factors like typing speed, but now with these agents, the situation is completely different.
Since last December, there has been a real turning point in my work style. Originally, I wrote 80% of the code by hand and left the remaining 20% to agents. Now, it's the other way around, with 20% written by me and 80% or even more done by agents. Since then, I've hardly written a single line of code myself.
If you take any software engineer and look at what they're doing, you'll find that the default workflow for building software has completely changed since last December.
This is an extremely significant change. I've also talked to my parents about this. In fact, ordinary people don't realize that this change is happening or how dramatic it is.
So I'm in this state of "AI psychosis," trying to figure out what's possible and pushing those possibilities to the limit. I keep wondering: how can I get beyond single-session Claude Code or Codex? How can I have more? How can I make better use of these capabilities? What scenarios can these OpenClaws be used in? There are so many new things.
I feel that I have to stay at the frontier. I see many people on Twitter making all kinds of attempts, and it all sounds plausible. If I'm not at the frontier, I feel very anxious. This "psychosis" is essentially because we're still exploring what's possible, and this territory is fundamentally unknown.
Host: If you're anxious, the rest of us are even more so. There's a team we work with whose engineers don't write code by hand at all. Everyone wears a microphone and whispers to agents all day. It's the strangest work scene ever. I used to think they were crazy, but now I completely accept it: "Oh, this is the right way." You're just one step ahead. What do you think limits your ability to explore or work on projects now?
Karpathy: If an agent doesn't perform well, it's mostly because the person hasn't mastered the skills. It's not that the agent is incapable; it's that you haven't figured out how to combine the pieces you already have. For example, the instructions in the agents.md file aren't well-written, or you haven't equipped it with a good memory tool. Ultimately, it's a skill issue.
The best way is to let agents work in parallel, like Peter Steinberger (the author of OpenClaw) does. Peter has a really funny photo of himself sitting in front of a monitor with the screen filled with a bunch of Codex agents. If the prompt is written correctly and you turn on the high reasoning-effort mode, each task takes about 20 minutes to run. He has about 10 repositories to look after, so he switches back and forth assigning tasks to agents.
In this way, you can operate at a larger scale: not just small changes like "change a line of code here, add a new function there," but "assign this new feature to Agent 1 and that non-conflicting feature to Agent 2," then review their output with a level of scrutiny matching how much you care about the code.
These are the "macro-actions" for operating on code repositories. One agent is doing research, one is writing code, another is planning a new implementation. Everything advances through these macro-actions. You have to master this way of working and build muscle memory. It's very rewarding, because it's genuinely useful and you're constantly learning new things. That's also where the "psychosis" comes from.
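The parallel "macro-action" workflow described above can be sketched in a few lines. This is only an illustration: `run_agent` is a hypothetical stand-in for whatever actually launches an agent (a CLI invocation, an API call), and the task strings are invented examples of non-conflicting work items.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str) -> str:
    """Stand-in for a real agent call (e.g. a coding-agent CLI or API request)."""
    return f"done: {task}"

# Non-conflicting "macro-actions" dispatched in parallel, one per agent.
tasks = [
    "research: survey existing auth libraries",
    "code: implement feature A in repo-1",
    "plan: draft the migration for repo-2",
]

with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(run_agent, tasks))

for result in results:
    print(result)
```

The human's job in this loop is the part `pool.map` hides: writing the task list so the agents don't step on each other, and reviewing each result when it comes back.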
02. “I Get Anxious If I Don't Use Up My Subscription, Which Means My Token Throughput Isn't Maximized”
Host: My intuition is that every time I wait for an agent to finish its work, I feel like I should do more. Right? If there are still tokens left, I should assign more tasks in parallel. It's quite stressful because if you don't think token consumption is a bottleneck, then the real bottleneck in the system is yourself.
Karpathy: At least it means you haven't used up your subscription quota. Ideally, when Codex is fully utilized, you should switch to Claude or something else. I've been trying this mode recently. I get anxious if I don't use up my subscription, which means my token throughput isn't maximized.
I actually experienced something similar when I was doing my Ph.D. I'd get anxious if the GPU wasn't running at full capacity—even though I had GPU computing power, I wasn't squeezing out all the FLOPS. But now, it's not about FLOPS; it's about tokens. What's your token throughput? How many tokens are you directing to run?
Host: I find it quite interesting. For at least the past decade, people in most engineering roles didn't feel limited by computing power. Now the entire industry has suddenly become resource-constrained. With capabilities jumping like this, you discover: "oh, it's not that I can't get compute; the bottleneck is me."
Karpathy: It's a skill issue. Research can make you better. I find it quite addictive because as you get better, you'll unlock new things.
Host: Where do you think it will go? For example, if Karpathy iterates for 16 hours a day and others are also getting better with programming agents, what level of proficiency will you reach in a year?
Karpathy: What does proficiency look like? By the end of the year, or in two, three, five, or ten years? I think everyone wants to move up the technology stack. It's not about single-session conversations with agents, but about how multiple agents collaborate and how teams cooperate. Everyone is exploring what that will look like.
Then I think OpenClaw is also an interesting direction, because the OpenClaw I'm talking about takes persistence to a new level. It runs in a continuous loop rather than being something you only interact with on demand. It has its own little sandbox, does its own thing, and has a more complex memory system, which these coding agents haven't implemented yet.
The memory system of OpenClaw is much more complex than that of the default large model, which only compresses memory when the context is full.
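The "compress memory only when the context is full" default mentioned here can be illustrated with a toy buffer. This is a deliberately simplified sketch, not how any real model manages context: it counts words as tokens, and it folds evicted messages into a bare count rather than asking a model for a summary.

```python
class ContextBuffer:
    """Toy version of the default strategy: compress only when the window fills up."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.summary_count = 0   # how many old messages have been folded away
        self.messages = []

    def tokens(self) -> int:
        # Words stand in for tokens; the summary line costs a flat 3 "tokens".
        live = sum(len(m.split()) for m in self.messages)
        return live + (3 if self.summary_count else 0)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Only act when over budget: evict oldest messages into the summary.
        while self.tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)
            self.summary_count += 1

    def window(self):
        header = (["[%d earlier messages summarized]" % self.summary_count]
                  if self.summary_count else [])
        return header + self.messages

buf = ContextBuffer(max_tokens=12)
for msg in ["turn on the lights", "play some jazz on the sonos",
            "lock the front door", "remind me to water the plants tomorrow"]:
    buf.add(msg)
print(buf.window())
```

A more layered memory system, as attributed to OpenClaw here, would instead write durable notes outside the live window so that folded-away detail can be retrieved later rather than lost to a one-line summary.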
Host: Do you think OpenClaw impresses users because of this or because of more extensive tool access?
Karpathy: I think OpenClaw has many great ideas, and Peter did an excellent job. I met him recently and talked with him. He's quite humble, but I think he's innovating along five different dimensions simultaneously and then integrating them. For example, that "soul.md": he really carefully crafted an attractive, interesting personality. I think many agents haven't gotten this right.
Claude has a good personality. It feels like a teammate, getting excited along with you. Interestingly, Codex inside ChatGPT is lively and energetic, but the programming agent Codex is very dull. It completes the task, but it doesn't seem to care about, or even understand, what we're building.
Host: Indeed.
Karpathy: Another thing, like Claude, I think they've adjusted the “personality” of the model quite well. When Claude praises me, I really feel like I deserve it. Because when the ideas I give it are not well - developed, its reaction isn't strong. But when it's a really good idea by my own standards, it seems to give a little more reward. I kind of want to win its praise. It's really strange.
Personality is very important, and many other tools may not pay as much attention to it. Peter really cares about this, so he got it right. Then there's the memory system, and the design of accessing everything through a single WhatsApp entry point is also very good.
03. “Everything Should Be an API Endpoint, and Agents Are the Glue”
Host: Have you used OpenClaw to do anything interesting outside of programming?
Karpathy: In January, I went through a period of "OpenClaw psychosis." I built an OpenClaw to take care of my home and named it "Dobby the House-Elf." I used an agent to find all the smart-home subsystems on my home's local network, and I was quite surprised that it worked out of the box.
I just told it, "There's a Sonos in the house. Can you find it?" It did an IP scan, found the Sonos speaker, and since it wasn't password-protected, logged in directly. Then it started reverse-engineering how these systems worked, did some web searches, found the API endpoint, asked if I wanted to try it, and then the music started playing.
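The "IP scan" step the agent performed is conceptually just probing hosts for an open control port (Sonos devices typically expose a local HTTP interface on port 1400, though that detail isn't stated in the podcast). A minimal sketch, demoed against a throwaway localhost listener rather than a real LAN sweep:

```python
import socket

def find_open(hosts, port, timeout=0.3):
    """Return the hosts that accept a TCP connection on `port`."""
    found = []
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(host)
        except OSError:
            continue  # closed port, unreachable host, or timeout
    return found

# Demo: stand up a temporary local listener so the sketch is self-contained.
server = socket.socket()
server.bind(("127.0.0.1", 0))        # OS picks a free port
server.listen(1)
port = server.getsockname()[1]

hits = find_open(["127.0.0.1"], port)
server.close()
print(hits)  # ['127.0.0.1']
```

A real agent would sweep the subnet's address range with this kind of probe, then inspect whatever HTTP endpoints respond to work out what device it found.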
The same goes for the lights. It basically hacked into the system, figured out the whole thing, created an API, and built a dashboard: a command center where I can see all the lights in the house and turn them on and off. I can tell it, "Dobby, it's time to go to bed," and it turns off all the lights. It also controls my air conditioners, curtains, pool, spa, and security system.
I have a camera pointing outside the house. Every time someone approaches, it first runs change detection, then runs Qwen on the frames where change was detected, and finally sends me a WhatsApp message with an image of what's outside. For example: "Hey, the FedEx truck just arrived. You might want to check it out. You've received mail." That's a message Dobby just sent me. It's really incredible.
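The pipeline described (cheap change detection gating an expensive vision-model call, then a notification) can be sketched as below. Everything here is hypothetical: frames are flat lists of grayscale values, and `describe`/`notify` are stubs standing in for the Qwen vision call and the WhatsApp API.

```python
def frame_changed(prev, curr, pixel_delta=30, min_changed=0.05):
    """Cheap change detection: fraction of pixels whose grayscale value moved
    by more than `pixel_delta`. Frames are equal-length lists of 0-255 ints."""
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr) >= min_changed

def describe(frame):
    return "a delivery truck at the gate"   # stand-in for a Qwen vision call

def notify(text):
    print("WhatsApp:", text)                # stand-in for a messaging API

prev = [10] * 100
curr = [10] * 80 + [200] * 20               # 20% of pixels changed

# Only pay for the vision model when the cheap check fires.
if frame_changed(prev, curr):
    notify("Heads up: " + describe(curr))
```

The design point is the gating: the per-frame diff costs almost nothing, so the expensive model only runs on the handful of frames where something actually moved.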
Dobby takes care of the house, and I send it messages via WhatsApp. These macro-actions are really interesting. I haven't really pushed it further; I think some people are doing even crazier things. But even just for home automation, I used to need six completely different apps, and now I don't. Dobby controls everything through natural language. It's amazing. I don't think I've pushed this paradigm to its limit, but it's already very helpful and inspiring.
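The "apps should be APIs, agents are the glue" idea can be reduced to a toy router: devices expose plain callable endpoints, and something maps a natural-language command onto one of them. In this sketch the router is a naive keyword matcher; in the real setup a model would choose the endpoint. All names here are invented for illustration.

```python
# Each device exposed as a plain callable "endpoint"; the agent is the glue
# that routes a natural-language command to the right one.
endpoints = {
    "lights": lambda on: f"lights {'on' if on else 'off'}",
    "music":  lambda on: f"music {'playing' if on else 'stopped'}",
}

def dispatch(command: str) -> str:
    """Naive keyword router; a real agent would let a model pick the endpoint."""
    on = "off" not in command and "stop" not in command
    for name, call in endpoints.items():
        if name in command or (name == "music" and "sonos" in command):
            return call(on)
    return "no matching endpoint"

print(dispatch("turn off the lights"))          # lights off
print(dispatch("play something on the sonos"))  # music playing
```

The six separate apps collapse into the `endpoints` table: adding a device means registering one more callable, not learning one more UI.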
Host: Do you think this shows what people want from a user experience perspective? Because learning new software and new UIs requires human effort, which has been overlooked in the past.
Karpathy: To some extent, yes. What OpenClaw achieves is essentially derived from asking "what do people think AI should be like?" The AI in people's minds is not, strictly speaking, the raw large model, which is just a token generator. The AI people imagine is something with a personality and an identity: you can share things with it and it will remember, like an entity on the other end of WhatsApp, which is easier to relate to.