AI Completes a 10-Week Workload in Just 4 Days! Full Transcript of Anthropic's Press Conference: The Complex Projects You're Proud of Are Toys to the Model

After human engineers leave work, let AI fix bugs, run CI, and merge PRs on its own.

If you missed "Code w/ Claude," the developer conference Anthropic held a few days ago, you might be missing out on the biggest paradigm shift in the history of software engineering.

The core message of the conference is simple: AI model capabilities are growing exponentially, while most enterprises' development practices remain stuck in a "linear" stage. To help developers bridge this gap, Anthropic unveiled three key tools: a more powerful underlying model, new agent orchestration capabilities on the Claude Platform, and a Claude Code desktop app that overhauls day-to-day development.

Almost all of the keynote speakers were senior figures from Anthropic's product and engineering teams. Chief Product Officer Ami Vora set out the background. Dianne Penn, who leads the research product team, explained why the model layer will keep improving explosively. The Claude Platform team ran a science-fiction-style demo of lunar drone operations, unveiling new capabilities such as multi-agent orchestration, Outcomes, and Dreaming in one go. In the Claude Code session, Cat Wu and Boris Cherny, the head of Claude Code, made another point clear: synchronous programming is taking a back seat, and asynchronous workflows that keep tasks running, fix PRs automatically, and handle CI failures automatically are becoming the new default.

Now, let's follow this main thread: how does Anthropic plan to define the next-generation development workflow, and how far have the Claude Platform and Claude Code each come?

Ami Vora: Good morning, everyone! I'm very glad to see you all and thank you for coming.

As I think about why I'm standing here today, my mind goes back to the moment when I first successfully ran a piece of code. I didn't start programming at a young age. I grew up at the foot of the Appalachian Mountains. I never assembled a computer myself and hardly played video games. My first attempt at building a complex project was in a college computer science class. That was a long time ago. Back then, we had to line up to log in to the server, because it was the only machine with enough computing power to run our ray-tracing programs.

Some of you must know that scene: the hum of the server, the mingled smell of overnight pizza and coffee, and the distinctive smell of a windowless basement computer room. But I still remember the feeling of waiting for the result after pressing the "compile" button: the pure joy, the excitement of discovering something new, the relief, and the shock of realizing that I had created something the world had never seen. That feeling hooked me completely, and that's why I'm standing here today.

Times have changed. What I could only get by queuing in a college computer room is now available to anyone, anywhere, at any time. There's no queue, no strange smell, and no barrier to entry, but the excitement, joy, and relief remain the same. I know many of you here feel the same way. People often tell me, "Claude makes me feel like I have superpowers." This is my favorite comment.

We're witnessing how people are using these superpowers. For example, Scott MacVicar, the head of Stripe's development infrastructure, had to rewrite 50,000 lines of Scala code into Java to upgrade the JDK. His team initially estimated that it would take engineers a full 10 weeks. But with Claude, they completed the task in just 4 days.

Sometimes speed isn't just about efficiency; it's about what it makes possible. Felicia Curcuru is the co-founder and CEO of Binti, whose software helps social workers find homes for foster children. The work involves handling documents, home visits, and qualification reviews. This year, her team used the Claude API to free social workers from tedious paperwork, shortening the foster-family approval process by 20 days. Twenty days is not just a cold efficiency metric; it means a child can have a home 20 days earlier.

The excitement, the joy, the relief, and the shock of discovering something new: I hear all of that echoed back from you. But I suspect each of you experiences it differently. Some of you are always at the frontier of the technology; some are trying to bring the people around you along; and some of you are here because, like me, you feel the tectonic plates of technology shifting under your feet and want to see what the future looks like. Believe me, I often feel all of these emotions in a single morning. I often come to work with a plan, and by lunchtime I've torn it up because there's been a new breakthrough. Does this sound familiar?

When we step back and look at how fast these models are evolving, it all makes sense. At Anthropic, we often talk about "exponential leaps," and I think that is exactly what we are all feeling right now. Remember? Just two years ago, the bar for a model was writing a decent email, and we were grateful for it. A year ago, when we stood on this stage, the headline was Opus 4, and "letting an agent run continuously for an hour without human intervention" seemed like a moonshot.

Six months ago, however, agents could run end-to-end tasks all night, and we could check the results when we woke up in the morning. Just last month, Mythos read through the entire OpenBSD source tree and found a vulnerability that had been lurking for 27 years, evading human review, fuzz testing, and static analysis for nearly three decades. The technological leaps are getting bigger, and the intervals between them are getting shorter.

Although model capabilities are growing exponentially, most organizations are still adopting AI step by step. This has created a gap between "what AI can do" and "what AI is actually helping people solve." Bridging this gap, and turning the model's capabilities into tools ordinary people can use to solve problems, is the mission of developers. This is what you're doing, and you're achieving remarkable results. On the Claude Platform, API call volume has grown nearly 17-fold year on year. On Claude Code, developers spend an average of 20 hours per week running Claude.

Like you, we've been shipping at a frantic pace lately. We hope that when you leave today, you'll have a clear picture of the future so that you can plan ahead and ride the exponential curve with us. To be clear, we don't have a new model to release today. Today's theme is how we can make our products serve you better so that you can bridge this gap for the whole world. This morning, we'll show you the whole picture.

First, Dianne will talk about our cornerstone, the model layer. She'll share more about the progress of our frontier models and our plans for them. On the Claude Platform, we're making significant updates to Claude Managed Agents, including Outcomes, Dreaming, and multi-agent orchestration. Angela and Katelyn will demonstrate how the platform takes care of the underlying infrastructure for you, sparing you the busywork. For Claude Code, Cat and Boris will show you how to use new primitives such as Routines, which let Claude Code prompt itself and keep working even when you're away from your computer.

But ultimately, it all comes back to you and the products you're about to create. Most people will never call the Claude API in their lives, let alone open a terminal and type "Claude." They'll only experience AI through the products you build on the Claude Platform, whether that's a designer using Canva to explore new ideas, a lawyer using Legora to complete legal documents quickly, or any developer using the world's top coding agent. Thank you for shaping how the world perceives AI. We can never build all the tools to solve everyone's problems alone; it depends on you.

To show our gratitude, we have good news. Starting today, we're raising the rate limits for Claude Code and Claude Platform developers to help you keep bridging the gap for the world. Specifically, we've doubled the 5-hour rate limit for Claude Code on the Pro, Max, Team, and seat-based Enterprise plans, and we've significantly increased the API limits for Claude Opus.

We can do this because we've expanded our compute partnerships: we're working with SpaceX and making full use of the computing power of their Colossus 1 data center. We'll put these resources directly into the hands of independent developers and small teams. Over time, we'll keep exploring ways to help you unlock Claude's full potential, whether through existing compute partnerships or bolder bets on the future.

Thank you all for coming today. Thank you for joining us in defining what AI looks like in the real world. Thank you for putting superpowers in the hands of the public. Now, let's welcome Dianne, the head of our research product team. Thank you!

Dianne Penn: Thank you, Ami. I'm Dianne, and I joined Anthropic in 2023. Since Claude 2, I've witnessed the birth of every model. If anyone's counting, we've brought 18 Claude versions, spanning Haiku, Sonnet, Opus, and now Mythos, to users and developers like you.

We racked our brains to make Opus 3 follow JSON formatting perfectly and excel at writing long-form code. With Sonnet 3.5 New (which you've finally gotten used to calling Sonnet 3.6), we taught Claude how to operate a computer safely. Sonnet 3.7 was sometimes a bit "impatient," so we worked out the right way to make it available to users and developers so that you could understand its quirks. At this time last year, with Claude 4, we found the right balance in tuning the thinking dials and test-time compute. We've never slowed down. In the past 12 months, we've delivered 8 frontier models to developers and users. Each generation has stood on the shoulders of the previous one, letting you write more elegant code and take the products you build further than ever before.

The model layer is the foundation of every innovation you'll hear about today. That is the core consensus. As the model's intelligence leaps forward, your starting line moves forward too, and what you can achieve expands well beyond what seemed possible. At Anthropic, we often talk about "exponential leaps," as Ami just mentioned. To me, it means that when the model becomes smarter, the use cases you can build for users also grow exponentially. For example, "agentic coding" with autonomous planning is far more disruptive than simple "code completion." By extension, new products and experiences will open up new markets and grow the market as a whole.

To the research team, "exponential leaps" aren't just about raising SWE-bench scores. They're about creating and tracking brand-new capabilities that wouldn't exist without deliberate design: tool calling; operating a computer; thinking depth that adapts to problem difficulty; agent loops that stay on track through hundreds of steps; and an ultra-long context window that lets Claude absorb new knowledge. These capabilities aren't limited to writing code. Today, Claude can generate and iterate on visual designs, analyze and create complex business deliverables, and handle the uncertainty of the business world with ease. All of this is possible because the underlying model has become smart and robust enough to support it.

When you build on Claude, you're building on the model line that first created these capabilities and has spent the longest time hardening them. Let me give you a real-world example with the newly released Opus 4.7. Amp, a coding agent company, migrated all of their "intelligent modes" to Opus 4.7, not only because it outperformed every other model in benchmarks but, more importantly, because they found they could cut out a lot of redundant scaffolding and tools: the model was smart enough not to need them. Rakuten ran our model on their benchmarks and solved three times as many production engineering tasks as before. Intuit found that Opus 4.7 could even detect its own logical flaws during planning, correct itself, backtrack, and ultimately deliver cleaner, faster code.

The day after releasing Opus 4.7, we launched "Claude Design" in Anthropic Labs, one of my favorite projects this year. People have started combining Claude Design and Claude Code to generate production-grade user interfaces. This works because Opus 4.7 has excellent visual taste, knows how to strike the right balance, and can deliver fine details while following your design principles. In day-to-day conversations, we often hear that people like using Claude because it not only understands the tasks you assign but also notices what's wrong and even questions your assumptions.

Of course, as developers, we know that today's models are still works in progress. They sometimes stumble on extremely simple problems and "blank out" when given a large amount of context. But that's what makes this exciting. Thank you for joining us on this journey.

Let me share some of the goals we're working toward. First, higher-order judgment and better code taste, so that in the future Claude can handle complex, fully autonomous engineering projects with ease. Second, a nearly unlimited context window combined with a high-quality memory store, so the model can take on long-running tasks and get better over time. Finally, multi-agent collaboration, in which a team of multiple Claude instances works together on ambitious goals that no single instance could handle alone.

A core dimension I use to evaluate progress in model intelligence is "task horizon": the length of time a model can work autonomously and keep improving its results without human intervention. This time last year, models could work autonomously for only a few minutes. Now, as many of you have seen, the agents I use can often run continuously for several hours. Tomorrow, we'll have agents that take the initiative, stay online at all times, and never "lose themselves."

As developers, how should we think about all this? The exponential leap won't stop, so when you build products, anchor to the capabilities that will emerge in the future rather than limiting yourself to what the current version of Claude can do, because the next generation of models will be much more powerful than what we have now. In the past, we had to pile up scaffolding to "patch" older versions of Claude; now, scaffolding is used to "amplify" the model's intelligence. In the past, you had to carefully design complex iteration loops, supply all kinds of tools, and work out retry mechanisms; now, these can be internalized into the model's own thinking and execution. You can already glimpse the future: Mythos, now in preview, is the next big step on this exponential curve, and the leap is extremely significant.

Therefore, the way we all interact with models needs to be reshaped.

At Anthropic, we've distilled this into a few points. First, design for the next version of Claude, not just the current one. Experience has repeatedly taught us that the ultimate winners are the developers who optimize their architectures and are ready for the next wave of intelligence, not those who fixate on marginal accuracy gains in the current version. This requires you to establish more rigorous evaluation systems and build prototypes that seem almost impossible today. Only then, when the exponential curve leaps forward, will you be the first to notice that something that didn't work yesterday suddenly works today. That is often a sign that you've found something that will amaze users.

The teams that get the most out of Claude understand one thing: model upgrades mean business opportunities. They've already built automated evaluation systems, lean scaffolding, and ambitious prototypes that others haven't thought of, minimizing the cost of each upgrade.

We firmly believe that as models become smarter, you developers here will have a significant first-mover advantage in exploring new scenarios, creating amazing new products, and ultimately defining new markets and growing the market as a whole. All the tools that Katelyn and Angela will show you
