AI guru Karpathy's latest speech: for AGI to go from fantasy to reality, three realities must be faced first.
On June 19th, it was reported that at a closed-door session of YC's 2025 AI Startup School, Andrej Karpathy, AI researcher and co-founder of OpenAI, said that Software 3.0 is pushing traditional programming into a corner: programmers either learn to "program" with prompts or switch to selling prompt generators.
Karpathy pointed out that Software 3.0 is revolutionizing traditional programming with the paradigm of "prompts are programs." It is not a simple combination of hand-written code and machine learning; it fuses prompts, system design, and model tuning into a new kind of productivity built on the many roles a large language model can play.
The problem is that current large models have two major flaws: "jagged intelligence" (a gulf between their performance on hard tasks and on common-sense judgment) and "anterograde amnesia" (no durable memory across conversations). Their capability boundaries have to be mapped out through methods such as system prompt learning.
He emphasized that the key to human-machine collaboration is a "partial autonomy" framework. Just as the Iron Man suit balances the AI's autonomous decision-making against human trust through an autonomy regulator, the development ecosystem needs to be rebuilt around the same idea. As the "bilingual translators" connecting humans and computer programs, agents are shifting the development paradigm from "humans adapting to machines" to "machines adapting to humans."
01 Software 3.0: Reconstructing the Ecosystem Where Prompts Are Programs
The "Software Generation Evolution Map" presented by Karpathy in his speech is quite interesting. He divided the development of software into three stages: "Software 1.0" with manual coding, "Software 2.0" relying on machine - learning - trained models, and "Software 3.0" driven by prompts. However, this is not a situation of peaceful coexistence. Just as smartphones replaced keypad phones, Software 3.0 is squeezing the survival space of the previous two generations at an astonishing pace. He called this trend an "irreversible technological iteration."
On the core of the change, Karpathy rejected the simple additive logic of "1 + 2 = 3." Software 3.0 is not a mechanical combination of the two earlier generations but a brand-new species. Today's AI engineers outperform pure prompt engineers precisely because they combine prompt design, system architecture, and model tuning. While most people still approach prompts with "single-player thinking," developers with that composite skill set have already entered "god mode."
Even more subversive is the "jack-of-all-trades" nature of large language models. These systems now switch roles like "technological Transformers": supplying basic computing power as "digital plumbers," directly outputting programs as "code production factories," supporting all kinds of tools as "application developers," and serving many users interactively as "online butlers." This combination of attributes has upended the commercialization logic of traditional technology. New technologies used to start expensive and get cheaper over time; large language models instead let you use them for free, until one day they announce, "I can actually help you rebuild the entire industry."
02 LLM Psychology: The Dual Challenges of Jagged Intelligence and Memory Defects
Karpathy proposed a framework he calls "LLM psychology" to expose the cognitive shortcomings of current large models, built around two concepts that point straight at their main problems: "jagged intelligence" and "anterograde amnesia."
The "Jagged Intelligence" theory compares AI to a "split - personality academic genius." It can solve partial differential equations that even physics Ph.D. students find headache - inducing but may stumble on a simple math problem like "Which is greater, 9.11 or 9.9?" that an elementary school student can solve.
Karpathy used a set of contrasting examples to expose the gap. An AI that can write a well-referenced paper may also suggest "putting a peeled egg in the microwave"; it can derive complex formulas but fails at common-sense logic. This is nothing like the linear growth curve of human intelligence. Humans accumulate experience like leveling up in a game, while AI resembles a skill tree struck by lightning: level-20 proficiency in natural language understanding right next to level-3 common-sense judgment.
He joked that current AI is like Sheldon in "The Big Bang Theory": sky-high IQ, but unable to look after itself, so developers must learn to hit the brakes before it "does something stupid." Interestingly, the solution is not to throw money at computing power. Karpathy's example amounts to giving the model "cognitive therapy": before answering, it asks itself "Did I calculate correctly?", like a student double-checking during an exam. In production, using an LLM is still like minding a naughty child: let it do the tasks it is good at, such as writing code, while keeping an eye out so it doesn't trip over simple problems. Call it manual supervision for safety.
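A minimal sketch of that self-check pattern, assuming only a generic llm_complete text-completion callable; the two-pass prompts are illustrative, not Karpathy's exact wording.

```python
def answer_with_self_check(llm_complete, question: str) -> str:
    """Two-pass answering: draft first, then make the model re-check its own work."""
    draft = llm_complete(f"Question: {question}\nAnswer step by step.")
    review = llm_complete(
        "Below is a question and your draft answer. Did you calculate correctly? "
        "Re-check every step, then output either the single word CONFIRMED "
        "or a corrected answer.\n\n"
        f"Question: {question}\nDraft answer: {draft}"
    )
    # Keep the draft only if the second pass vouches for it.
    return draft if review.strip().startswith("CONFIRMED") else review
```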
If jagged intelligence is an IQ problem, then anterograde amnesia is a memory disaster.
Karpathy made a vivid analogy: an LLM is like the protagonist of "Memento." Every conversation is a fresh start. Once training ends, its memory is effectively wiped; apart from the limited context directly in front of it, all past experience fades away. Imagine a colleague who was taught the reimbursement process yesterday but looks blank when asked today. ChatGPT's "memory" feature is like handing that colleague a sticky note, and a flimsy one that coffee can soak through.
Humans learn by "taking notes," but AI lacks this ability. Pre-training stuffs knowledge into the brain and fine-tuning drills behavioral habits, yet both require adjusting parameters. What is really needed is an AI that writes its own "learning diary": after wrestling with a hard problem, it records something like "next time I hit a similar situation, check the historical data first." This is what Karpathy calls "system prompt learning."
To extend the analogy: pre-training is going to college, fine-tuning is an internship, and system prompt learning is writing work summaries after starting the job.
He offered another image for the memory defect: current AI is like a delivery rider without a notepad, unable to remember customer preferences and forever getting lost at "common-sense crossroads." The ideal fix is a "digital diary" that lets the model distill its own problem-solving strategies, rather than having engineers spoon-feed it prompts like a nanny.
The hard part is teaching an amnesiac to keep a diary. The AI must first understand what is worth remembering, then work out how to turn diary entries into muscle memory. The technical hurdles between identifying key information and internalizing past experience into autonomous decision-making will keep engineers busy for a long time. A sketch of the idea follows.
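One way to picture system prompt learning is an agent that appends distilled lessons to its own system prompt between sessions; in this sketch the lessons.md diary file and both helpers are hypothetical.

```python
from pathlib import Path

DIARY = Path("lessons.md")  # hypothetical "learning diary" that survives context resets

def load_system_prompt(base_prompt: str) -> str:
    """Prepend accumulated lessons so past experience outlives any one conversation."""
    lessons = DIARY.read_text() if DIARY.exists() else ""
    return f"{base_prompt}\n\nLessons from past sessions:\n{lessons}"

def record_lesson(llm_complete, transcript: str) -> None:
    """After a hard task, have the model distill one reusable rule and save it."""
    lesson = llm_complete(
        "In one sentence starting with 'Next time', state what should be done "
        f"differently based on this transcript:\n{transcript}"
    )
    with DIARY.open("a") as f:
        f.write(f"- {lesson.strip()}\n")
```

The open problems Karpathy points at live exactly in these two functions: deciding what is worth writing down, and making the loaded lessons actually change behavior.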
03 Partial Autonomy: When AI Wears the Iron Man Suit
Karpathy proposed putting AI in an "Iron Man suit." The suit has two parts: "augmentation," which gives the user enhanced strength, tool integration, environmental perception, and richer information interaction; and "autonomy," which lets the AI act on its own in most scenarios and execute tasks without human instructions.
But how does such a flashy concept land in real AI products? Karpathy proposed three approaches.
The first is to install a "maturity knob." Karpathy's "autonomy regulator" works like the volume knob on an old radio, flexibly throttling how much the AI decides on its own. Take Cursor: from Tab completion timidly asking "Do you want to type 'hello'?" to the Cmd-I agent mode boldly declaring "I'll handle this," it is an intern growing into a project lead. Perplexity's search has likewise been upgraded from the basic "here's a link, go check it" to a "researcher mode" that outputs a small paper with references. Even Tesla's Autopilot has gradually evolved from L1 ("you hold the wheel, I'll help watch the road") toward L4 ("you play on your phone, I'll drive"). Behind all of this is the dynamic calibration of human trust in AI.
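The knob can be pictured as a literal setting that gates what an agent may do before asking; the levels below are an illustrative mapping onto Cursor-style modes, not any product's actual implementation.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 1  # completion mode: propose each edit, human accepts or rejects
    EDIT = 2     # apply a scoped change, human reviews the diff afterward
    AGENT = 3    # plan and execute multi-step changes, human audits a log

def run_agent_step(level: Autonomy, proposed_edit: str,
                   apply_edit, ask_user, audit_log: list) -> None:
    """Gate a single proposed edit by the current autonomy setting."""
    if level == Autonomy.SUGGEST:
        if ask_user(f"Apply this edit?\n{proposed_edit}"):
            apply_edit(proposed_edit)
    elif level == Autonomy.EDIT:
        apply_edit(proposed_edit)
        ask_user("Edit applied; please review the diff.")
    else:  # Autonomy.AGENT: act first, leave a trail for later review
        apply_edit(proposed_edit)
        audit_log.append(proposed_edit)
```

Turning the knob up is then exactly the trust calibration described above: the human hands over one step of the loop at a time.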
The second is to hit the "fast-forward button" on human-machine collaboration. In the "AI proposes, human corrects" loop, Karpathy stressed that semi-autonomy only breaks the deadlock if verification is lightning-fast: require the AI to first output a 100-word minimalist plan that a human can approve or reject within 10 seconds. The generation end, meanwhile, needs "restraints" that define boundaries, such as requiring the code to contain designated functions, so the AI doesn't run wild and write "mystical code" that cannot run.
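A sketch of that loop, with verification kept fast (a plan short enough to judge in seconds) and generation pinned to agreed boundaries, plus a mechanical check that the boundaries held; every name here is illustrative.

```python
def plan_then_generate(llm_complete, task: str, ask_user,
                       required_functions: list[str]) -> str | None:
    # Fast verification end: demand a plan the human can judge in seconds.
    plan = llm_complete(f"In at most 100 words, outline a plan for: {task}")
    if not ask_user(f"Approve this plan?\n{plan}"):
        return None  # rejected: nothing gets generated

    # Restrained generation end: pin the output to the agreed boundaries.
    code = llm_complete(
        f"Implement the approved plan:\n{plan}\n"
        "The code MUST define exactly these functions: "
        + ", ".join(required_functions)
    )

    # Mechanical boundary check before the code goes anywhere.
    missing = [name for name in required_functions if f"def {name}" not in code]
    return code if not missing else None
```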
The third is to cross the "Mariana Trench" between demo and product. Karpathy used Waymo to sound the alarm: in 2014, a zero-intervention ride in its fully autonomous prototype convinced everyone that "driverless cars would be on the road tomorrow," yet in reality drivers still had to watch the steering wheel like they were guarding against a thief. This confirms a cruel truth: partial autonomy is the bridge across the gap between technology and deployment, and developers must strike a delicate balance between feature richness and reliability.
04 Vibe Coding and the Agent Development Ecosystem
A tweet about vibe coding that Karpathy dashed off has since spawned thousands of startups and even earned its own Wikipedia entry, a craze comparable to the release of the Bitcoin white paper. The reality, though, is harsh. Like queuing two hours at a trendy restaurant for a five-minute meal, a tool like MenuGen runs beautifully as a local demo, but the moment it has to become a real product, AI's "light-speed programming" effect suddenly disappears.
The Fragmented Status Quo of Web Development in 2025
Today's development toolchain is a "hodgepodge of old and new." Older tools like Clerk have documentation so dense that, to an AI, reading it is like deciphering ancient texts; newer tools like Vercel keep their documentation concise and machine-friendly, which earned Karpathy's praise.
The gap between old and new tools is like asking a programmer to use an abacus and a computer at the same time, with the efficiency difference to match. Hence "knowledge-organizing tools" like DeepWiki have become a necessity: they help AI automatically connect knowledge such as payment interfaces and logistics APIs instead of looking everything up like a novice, as the sketch below illustrates.
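In the same spirit, one can imagine a small "context pack" step that gathers the relevant API documentation before the agent starts work, instead of letting it look things up piecemeal; the doc URLs below are hypothetical placeholders.

```python
import urllib.request

# Hypothetical agent-readable documentation sources (plain markdown, not HTML).
DOC_SOURCES = {
    "payments": "https://docs.example.com/payments.md",
    "logistics": "https://docs.example.com/logistics-api.md",
}

def build_context_pack(topics: list[str]) -> str:
    """Fetch every doc the agent will need up front and join them into one block."""
    sections = []
    for topic in topics:
        with urllib.request.urlopen(DOC_SOURCES[topic]) as resp:
            sections.append(f"## {topic}\n{resp.read().decode('utf-8')}")
    return "\n\n".join(sections)

# The agent then starts from organized knowledge rather than cold lookups:
# system_prompt = "You are a checkout agent.\n" + build_context_pack(["payments", "logistics"])
```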
A New Paradigm for Digital Information Consumption
Karpathy pointed out that tool developers need to redefine three types of users: human users who operate through graphical interfaces (like writing notes with a pen), computer programs that interact through APIs, and agents, a new category of consumer that is a computer program yet behaves like a human user.
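At the protocol level, the split can be pictured as one endpoint serving rendered HTML to a human browser and plain markdown to an agent; a minimal Flask sketch under that assumption, with an illustrative content-negotiation rule.

```python
from flask import Flask, request

app = Flask(__name__)

DOC_MARKDOWN = "# Payments API\n\nPOST /charge with {amount, currency} ..."

@app.route("/docs/payments")
def payments_docs():
    # Agent consumer: raw markdown, nothing to "decipher".
    if "text/markdown" in request.headers.get("Accept", ""):
        return DOC_MARKDOWN, 200, {"Content-Type": "text/markdown"}
    # Human consumer: the usual rendered page (real rendering elided).
    return f"<html><body><pre>{DOC_MARKDOWN}</pre></body></html>"
```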