
Confirmed: a former Codex core developer has switched sides. He's raving about Claude Code, claiming it makes his programming five times faster, and says OpenAI's Achilles' heel is context.

AI前线 · 2026-02-09 19:14
Has a core developer of OpenAI Codex really become a loyal Claude Code user?

Calvin French-Owen is a co-founder of Segment, a former OpenAI engineer, and an early developer on the Codex project. In a recent podcast, he offered pointed commentary on today's hottest coding agents: Codex, Claude Code, and Cursor.

His conclusion was unexpected: the tool he uses most, and likes best, is Claude Code. He said it's even better paired with the Opus model.

Calvin used a very vivid metaphor to describe the experience of using Claude Code:

It's like getting a bionic knee after an injury. My coding speed instantly went up five-fold.

In his view, Claude Code's real trump card is its remarkably effective context splitting.

Facing a complex task, Claude Code automatically spawns multiple exploratory sub-agents. Each one independently scans the code repository and retrieves context, then summarizes the key information and reports back. This design significantly reduces context noise, which explains why it can reliably produce high-quality results.

That said, he also praised his old product, saying Codex has a lot of "personality", like AlphaGo. When it comes to debugging complex problems, Codex is superhuman: many problems the Opus model can't solve, Codex can handle.

"Context management" is the keyword that Calvin French-Owen repeatedly emphasized throughout the podcast.

He believes the information density of code context is extremely high. With the right retrieval method, the model often understands a system's structure more easily than a human does. At the same time, though, the context window itself has become the biggest bottleneck holding coding agents back.

On the issue of context pollution, the host noted that the LLM "gets dumber". Calvin took the opportunity to share a very practical rule: once context token usage exceeds 50%, he actively clears it.

He even shared a "canary detection" method common among founders: embed a small piece of irrelevant but verifiable information in the context. Once the model starts to forget it, the context has been polluted.
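The canary idea can be sketched in a few lines. This is a toy illustration, not Calvin's actual setup: the "model" is stubbed out as something that only sees the most recent slice of the context, whereas a real check would ask the live model to repeat the canary back.

```python
# Toy illustration of "canary detection" for context pollution.
# All names here are hypothetical; a real setup would query the live
# model for the canary and check its answer.

CANARY = "The project codename is BLUE-HERON-42."

def build_context(canary: str, messages: list[str]) -> str:
    """Prepend an irrelevant but verifiable canary to the context."""
    return "\n".join([canary] + messages)

def model_recalls_canary(context: str, window: int) -> bool:
    """Stand-in for querying the model: a model with an overflowing
    context effectively only 'sees' the last `window` characters."""
    visible = context[-window:]
    return CANARY in visible

ctx = build_context(CANARY, ["msg %d" % i for i in range(100)])
print(model_recalls_canary(ctx, window=len(ctx)))  # canary still visible
print(model_recalls_canary(ctx, window=200))       # context "polluted"
```

Once the check starts failing, that is the signal to clear or compact the context rather than keep pushing the agent forward.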

In terms of product concept, Calvin believes that the differences between Claude Code and Codex are already written in the genes of the two companies:

Anthropic focuses more on "creating AI suitable for human use".

OpenAI focuses more on "creating the most powerful AI".

His judgment: in the long run, OpenAI's approach may be the inevitable direction, but for today's user experience he prefers Anthropic.

When talking about the future, Calvin gave a clear judgment:

Companies will become smaller, but the number will increase.

Everyone will have their own agent team.

And the first to be amplified will be senior engineers with a "manager's mindset": those better at decomposing problems, making judgment calls, and issuing instructions to agents at the right moments.

In this context, the distribution method of products has become more important than ever.

The bottom-up distribution model is spreading faster than ever. Engineers won't wait for approval or procurement; they simply vote with their feet.

Compared with large companies' high emphasis on security, compliance, and control, what developers care most about is still the simplest evaluation:

"This thing is really useful."

Below are highlights from the podcast, full of insights on AI coding:

I'm obsessed with Claude Code. It's so useful

Host: Calvin French-Owen was one of the first developers of OpenAI's Codex coding model. Before that, he founded Segment, which reached a multibillion-dollar valuation and was eventually acquired at a high price.

Calvin French-Owen: To be honest, it's a period full of uncertainties for all of us now. I've recently become completely obsessed with Claude Code. To use a metaphor, ten years ago, I was a marathon enthusiast and really loved running. But then I got a serious knee injury. After that, I entered the so-called "managerial mode" and never wrote code again. It's really a pity when I think about it.

But over the past nine days, it's been like a whole new world opening up. I've regained the entire feeling of writing code. It's like getting a brand-new, bionic knee: it makes me write code five times faster.

Host: What do you think of this tool? After all, you've always been at the forefront of this field. Many concepts pioneered by Codex are still widely used today, and this model is still being continuously iterated.

Calvin French-Owen: When I worked at OpenAI, I was in charge of the Codex web project. At the time, Cursor had just come out; they'd built an adaptation layer on GPT-3.5 that could be used in the IDE. Claude Code had also just been released, running in the CLI. Our idea back then was that the future of programming should feel more like working with a colleague: you pose a problem, they handle it, and they come back with a PR for review. Our web project grew out of that idea, and that was our R&D direction at the time.

Looking back, the general direction was actually right. But clearly, everyone has since switched to CLI programming. Whether it's Claude Code or Codex, usage of these tools has grown a lot. For me at least, the takeaway is that in some sense we were right: maybe everyone will become a "manager" in the future. That's my personal view. But getting there has to happen step by step; you have to truly trust the model and understand how it works.

Host: You've been using Claude Code recently. After incorporating it into your core technology stack, what changes have you experienced in the usage experience?

Calvin French-Owen: Claude Code is indeed my main tool for daily programming now. To be honest, my main tool changes every few months. There was a period when I really preferred Cursor. Its new model was very fast, and it was really good to use. Then I gradually switched to Claude Code, especially when used with the Opus model. The experience was even better.

Claude Code is a very interesting product. I think people have underestimated how well its product design and its model work together. Study it closely and you'll find that Claude Code's most powerful feature is its context splitting.

For example, when you ask Claude Code to perform a task, it usually spawns one or more exploratory sub-agents. These sub-agents scan the entire file system with the ripgrep tool and retrieve relevant content, and each one has its own independent context window.

I think Anthropic has done an excellent job here: given a task, the model can accurately judge whether it fits in a single context window or needs to be split up first. Its performance on this is remarkable, and it's the key to its high-quality output.
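The split-or-not decision described above can be caricatured with a simple heuristic. This is a hypothetical sketch, not how Claude Code actually decides: token counts are estimated at roughly four characters per token, and the window size and thresholds are made up.

```python
# Hypothetical sketch: estimate a task's context cost and shard it
# across "sub-agent" groups when it won't fit one window comfortably.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token.
    return len(text) // 4

def plan(files: dict[str, str], window: int = 8000) -> list[list[str]]:
    """Return groups of file names; one group == one (sub-)agent context."""
    budget = window // 2  # leave headroom for instructions and output
    total = sum(estimate_tokens(src) for src in files.values())
    if total <= budget:                 # fits comfortably: one context
        return [list(files)]
    groups, current, used = [], [], 0
    for name, src in files.items():     # otherwise shard across sub-agents
        cost = estimate_tokens(src)
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    groups.append(current)
    return groups

small = {"a.py": "x" * 1000, "b.py": "y" * 1000}
big = {f"f{i}.py": "z" * 20000 for i in range(4)}
print(plan(small))       # one shared context
print(len(plan(big)))    # several sub-agent groups
```

Each group would then be explored in its own context, with only the summaries flowing back to the parent agent.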

Even more interesting: by operating in the terminal, Claude Code has become the purest form of composable, atomic integration. If you're used to starting from the IDE, say with Cursor or the early Codex, you'll find this more flexible style of context retrieval doesn't come naturally.

Host: That really is unique. I'm personally quite surprised. I don't know if you feel this too, but there's a kind of retro-futurism to it: CLI technology from twenty years ago has beaten all the once-hyped IDEs.

Calvin French-Owen: I completely agree. And the fact that Claude Code is not an IDE is actually crucial, because it lets you keep a certain distance from the code you're writing. The core of an IDE is browsing files, right? You have to keep all the code's state in your head and reason through the logic. The CLI is completely different, which gives it far more freedom in designing the experience.

I don't know if you have this feeling. When I use Claude Code, it feels like I'm "racing" in the code, and all operations are extremely smooth. There will be a small progress indicator on the interface, giving me status feedback at any time, while the code being written itself is not the visual core.

The development environment is messy. I really like the conceptual simplicity of the sandbox, but in practice I ran into many thorny problems. I couldn't even get simple tests working: the sandbox needed to reach a PostgreSQL database, but the connection kept failing; the codex.md file I wrote was only twenty lines, yet it still wouldn't run.

But in the CLI, the tool can access the development database directly. I'm not sure that's compliant, but I did let it access the production database to perform some operations, and it pulled it off. For example, I once hit a concurrency problem I wanted to track down, and the tool managed to debug a five-layer nested delayed task, find the issue, and automatically write test cases. The problem never came back. That's genuinely incredible.

Host: Exactly. And I think product distribution and adoption have been seriously underestimated. Think about Cursor, Claude Code, and the command-line version of Codex: you just download them and start using them, without asking your company for any permissions. The difference that makes to the experience is huge.

Doing a good job in context management is the key to using top models well

Host: You have a lot of practice in the field of code agents. What suggestions do you have for those who want to build such tools? What practical experiences can you share?

Calvin French-Owen: I think the most important thing is to do a good job in context management.

Back then, we took a checkpoint of a reasoning model and did a lot of fine-tuning on it with reinforcement learning (RL): we assigned the model all kinds of programming tasks, such as solving coding problems, fixing test cases, and implementing new features, then used RL to train it to handle those tasks more accurately. Of course, most people can't do that step today, but everyone can think harder about what context to give the agent so it produces the best results.

For example, watch Claude Code at work: it spawns multiple exploratory sub-agents that search the file system for code-related content, then bring the context back and summarize it for me, so I know exactly how to proceed.

It's interesting to compare how different agents build context. Cursor, for instance, uses semantic search: it converts everything into vectors and matches the content most relevant to the query. Codex and Claude Code, by contrast, use the ripgrep code-search tool. This works because the information density of code context is very high: a line of code is usually under 80 characters, and a code repository rarely contains big data blobs or JSON files, and even then only a handful.
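The grep-style path can be mimicked in a few lines of Python. This is a minimal, hypothetical stand-in for what ripgrep does (exact or regex line matches reported as file:line hits); the repository contents here are invented for illustration.

```python
# Minimal stand-in for ripgrep-style retrieval: regex line matches
# over an in-memory "repository", returning (file, line_no, line) hits.
import re

REPO = {
    "billing.py": "def charge_card(user, amount):\n    return gateway.charge(user, amount)\n",
    "notes.md":   "Charge the batteries before the demo.\n",
}

def grep(pattern: str, repo: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return every (file, line_no, line) whose line matches `pattern`."""
    hits = []
    for path, text in repo.items():
        for no, line in enumerate(text.splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((path, no, line.strip()))
    return hits

# Code identifiers are dense and distinctive, so a literal search is
# usually enough to pin down the definition:
print(grep(r"charge_card", REPO))
```

Because identifiers like `charge_card` rarely appear outside the code that defines and uses them, this kind of exact match often beats fuzzier vector retrieval on source code.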

You can lean on the ignore rules of Git (the code version-management tool) to first filter out irrelevant or bundled files, then find code context through Git and ripgrep. That gives you a good picture of what the code actually does. These tools can also scan the structure of an entire folder automatically, and the LLM (large language model) is particularly good at generating complex Git commands that would be torture for a human to write by hand. This whole pipeline is really reinforcement learning (RL) applied in a real-world setting.
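The filter-then-search step can be sketched as follows. This is a hedged illustration: real agents shell out to `git ls-files` and `rg`, which handle full gitignore semantics, while this toy version only applies simple glob patterns with Python's `fnmatch`.

```python
# Hypothetical sketch of the filter-then-search pipeline: apply
# .gitignore-style glob patterns first, then only search the survivors.
# A real agent would use `git ls-files` and `rg` instead.
from fnmatch import fnmatch

IGNORE = ["*.min.js", "dist/*", "node_modules/*"]

def tracked(paths: list[str], ignore: list[str]) -> list[str]:
    """Keep only paths that match none of the ignore patterns."""
    return [p for p in paths if not any(fnmatch(p, pat) for pat in ignore)]

paths = ["src/app.py", "dist/bundle.js", "vendor.min.js", "README.md"]
print(tracked(paths, IGNORE))
```

Cutting bundled and generated files out before retrieval is what keeps the agent's context dense: everything that survives the filter is hand-written source worth reading.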

I'm also building an agent integration system outside of programming, and I've learned a lot from how coding agents were developed: convert your data into a format close to code, so the model can quickly retrieve the surrounding information and get structured, usable data.
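One way to read "a format close to code" is to flatten nested records into short, assignment-like lines that a grep-style agent can retrieve the same way it retrieves code. This is an illustrative sketch under that assumption; the record and field names are invented.

```python
# Hedged sketch: flatten nested records into one "dotted.path = value"
# fact per line, so grep-style retrieval works on data the way it
# works on code. Names here are purely illustrative.

def flatten(record: dict, prefix: str = "") -> list[str]:
    """Turn nested dicts into one fact per line, like assignments in code."""
    lines = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(flatten(value, prefix=path + "."))
        else:
            lines.append(f"{path} = {value!r}")
    return lines

order = {"order": {"id": 42, "customer": {"name": "Ada", "tier": "pro"}}}
for line in flatten(order):
    print(line)
```

Each output line is short and self-describing, so a search for `customer.tier` lands on exactly one dense line instead of a sprawling JSON blob.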

Host: The core ability of excellent code agents is context engineering. Then, what are the skills to become one of the top 1% users of such tools? What's your technology stack like? How do you use these tools to significantly improve efficiency?

Calvin French-Owen: The first skill is to minimize the writing of underlying code and infrastructure.

I usually deploy my stack on platforms like Vercel, Next.js, or Cloudflare Workers. They encapsulate a lot of boilerplate, so I don't have to stand up services myself or deal with service discovery, central endpoint registration, or database configuration. Most features can be implemented in one or two hundred lines of code. I also prefer a microservice architecture, or well-structured standalone packages.

Secondly, you need to understand the core advantages of the LLM.

Actually, Andrej Karpathy recently described the character of coding agents on Twitter too: they have an extremely strong drive to execute. Whatever problem they hit, they keep trying to solve it and tend to build more on top of what's already there. So if you want to guide one to complete a task, you must give clear instructions.

Take OpenAI as an example. They have a huge monorepo (monolithic code repository) that's been in use for years, with thousands of engineers committing to it. Among them are seasoned senior developers fluent in production code, and also fresh PhD graduates with relatively little programming experience. Since the makeup varies so much, the LLM will pick up different code styles depending on how you steer it. I think there's still a lot of room to explore in coding agents, such as finding the optimal code-generation paradigm. Clearly, giving the model a way to verify itself significantly improves performance, for example by running test cases wherever possible in code review, CI, and similar stages.

I also use code-review bots heavily myself. The one from Greptile, a YC-incubated company, is very handy; Cursor's vulnerability-detection bot is also good; and I often use Codex for code review, where it's especially strong at verifying correctness.

These are all areas where coding agents particularly excel. Their ability to explore a code repository is also outstanding.

Of course, agents have weaknesses too: they're good at expanding, but when your requirement isn't to add features, they often rewrite code that already exists, wasting time on functionality that's already implemented. That's when you feel like it "completely doesn't understand my needs".

Another problem is context pollution. An agent can get stuck in a loop: because its drive to execute is so strong, it keeps pushing in the wrong direction, while the context it's drawing on doesn't actually help solve the problem. So one method I use often is actively clearing the context, for example whenever context token occupancy passes 50%.
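The "clear at 50%" rule can be captured as a small guard. This is a toy version under stated assumptions: the window size is hypothetical, token counts use a crude chars/4 heuristic rather than the model's real tokenizer, and the "summary" is just a placeholder string.

```python
# Toy version of the clear-at-50% rule. WINDOW and the chars/4 token
# estimate are assumptions; a real client would use the model's
# tokenizer and generate an actual summary of the dropped turns.

WINDOW = 200_000  # hypothetical context window, in tokens

def estimate_tokens(messages: list[str]) -> int:
    return sum(len(m) for m in messages) // 4

def maybe_compact(messages: list[str], keep_last: int = 2) -> list[str]:
    """Once usage passes 50% of the window, drop everything except a
    summary placeholder and the most recent messages."""
    if estimate_tokens(messages) <= WINDOW // 2:
        return messages
    return ["<summary of earlier conversation>"] + messages[-keep_last:]

short = ["hello", "world"]
long = ["x" * 100_000 for _ in range(10)]  # ~250k estimated tokens
print(len(maybe_compact(short)))  # unchanged
print(maybe_compact(long)[0])
```

Compacting early, before quality visibly degrades, is the point: by the time the agent is looping, the polluted context has already steered it wrong.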

Host: Wow, that ratio really is crucial. I don't know if you've noticed, but Dex Horthy, founder of HumanLayer from the fall 2024 batch of YC (Y Combinator, the world's top startup incubator), talks about this constantly and coined the concept of the "LLM dumb zone": once the context token count passes a certain threshold, the model's output quality starts to decline.

Calvin French-Owen: I completely agree with this view. From the perspective of the working logic of reinforcement learning (RL), this point is even more obvious.

Imagine you're a college student taking an exam. In the first five minutes, you feel you have plenty of time, so you think each question through carefully. But with five minutes left and half the paper unfinished, you panic and just want to get something written down. The LLM's context window works on the same principle.

Entrepreneurs have a small trick that I think is very practical: Add a "canary detection" message at the beginning of the context, that is,