A human beats AI to win the programming championship, and Altman gives a thumbs up. Real-world tests with 16 top programmers reveal: AI programming may actually be an "efficiency illusion".
Humanity has prevailed (for now)!
Yesterday, at the AtCoder 2025 World Tour Finals held in Tokyo, the human contestant Psyho defeated OpenAIAHC, the automated entry submitted by OpenAI, by a clear margin to take the top spot.
The AtCoder World Tour Finals is an annual event hosted by AtCoder to determine the world champion of competitive programming. First place went to Psyho from Poland; OpenAIAHC finished second.
As soon as the news broke, even OpenAI CEO Altman personally retweeted it with the comment "Well done, Psyho!"
This victory is worth celebrating, but it is only temporary: OpenAIAHC is chasing closely in second place. AI keeps getting stronger in programming competitions, and the programs it develops from scratch are approaching the level of top human contestants.
Just as when AlphaGo faced Lee Sedol, the advantages of AI in programming are gradually emerging, and it is taking the lead step by step.
Today's developers are surrounded by tools such as Claude Code, Gemini CLI, and Cursor. It's no longer a question of "whether to use" but "how to use."
Recently, the release of Kimi K2 has made Claude Code popular again. Besides how fast K2 runs and how large the model is, many people noticed for the first time that its API is connected to Claude Code.
Start Claude Code, write a prompt, press Enter, and a long stretch of well-structured functions is written for you. The same goes for Gemini CLI and Cursor.
Programming has changed from a painful solo fight against bugs into a creative game of building blocks with AI. It even has a nice name: Vibe Coding ("ambient programming": collaborating with AI through prompts).
Many people share their Vibe Coding experiences on social media. Some people say that Claude is "the most powerful code assistant they've ever used," but some experienced developers also share their painful experiences of using Claude.
Some experienced developers find the code written by AI "disgusting."
Is Vibe Coding really viable? Which is the real AI: the one that takes second place in programming competitions, or the one that repeatedly "disgusts" developers in daily programming? Not long ago, a new study poured cold water on AI programming.
A counter-intuitive study: AI programming may be less efficient
Recently, the US AI safety research organization METR published a field study on Claude 3.5/3.7. They recruited 16 experienced open-source developers and asked them to use Claude Code to assist with programming in projects they knew well.
The results for the 16 experienced developers (red), and the forecasts (green) of economists, machine learning experts, and the developers themselves before, during, and after the study, from left to right.
The results surprised many people:
After developers used AI, the average time to complete tasks increased by 19%.
More interestingly, although the result was a slowdown, the participants self-reported feeling faster! They thought AI had helped a lot, the coding process felt smoother, and they estimated their efficiency had improved by 20%.
The "AI hallucination" seems to have transferred to humans, becoming an "efficiency hallucination": you think you're faster, but you only feel faster.
Why is this the case? The study summarized the following reasons:
- Writing prompts takes a lot of time and often needs to be revised repeatedly;
- Most of the code given by Claude cannot be used directly. You need to manually modify the logic and check for bugs;
- You slip into a "distracted state" in the prompt-wait-modify cycle.
Seeing this, we began to wonder: would we run into the same problems if we used these tools to build something ourselves?
So we conducted a small experiment.
Can Vibe Coding really make you soar?
We designed a small task that doesn't look difficult but has somewhat complex logic:
Write a command-line tool that takes a keyword as input and returns the titles of posts on Zhihu's hot list containing that keyword, with a limit on the number of output items.
This task involves network requests, HTML parsing, string matching, and command - line parameter parsing, which is just right to test the capabilities of Claude Code and Gemini CLI.
Here we used Gemini CLI for the task. Although typing Chinese in the command line is awkward, both Gemini CLI and Claude Code accept Chinese input.
It was very fast. Perhaps because the task was relatively simple, there was no long wait for code generation: it first generated the list of libraries that needed to be installed for web scraping, then generated main.py, the core code file.
The snag was that Zhihu requires login. It automatically searched Google for a public API and tried other tools, but to no avail. In the end, it told me I had to supply the Cookie myself.
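A minimal sketch of the kind of tool the task describes might look like the following. The endpoint URL and JSON response shape here are assumptions for illustration, not Zhihu's documented API, and (as the experiment found) a valid login Cookie must be supplied by the user; the keyword filtering and command-line plumbing are the reusable parts.

```python
# Sketch of the described CLI tool. HOT_LIST_URL and the response
# shape below are illustrative assumptions; Zhihu requires a login
# Cookie, which the user has to provide.
import argparse
import json
import urllib.request

HOT_LIST_URL = "https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total"  # assumed endpoint


def fetch_hot_titles(cookie):
    """Fetch hot-list entries; needs a valid login Cookie string."""
    req = urllib.request.Request(
        HOT_LIST_URL,
        headers={"Cookie": cookie, "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Assumed response shape: {"data": [{"target": {"title": ...}}, ...]}
    return [item["target"]["title"] for item in data.get("data", [])]


def filter_titles(titles, keyword, limit):
    """Keep titles containing the keyword, capped at `limit` items."""
    return [t for t in titles if keyword in t][:limit]


def main():
    parser = argparse.ArgumentParser(description="Search Zhihu hot-list titles")
    parser.add_argument("keyword", help="keyword to match in titles")
    parser.add_argument("--limit", type=int, default=10, help="max titles to print")
    parser.add_argument("--cookie", required=True, help="Zhihu login Cookie string")
    args = parser.parse_args()
    for title in filter_titles(fetch_hot_titles(args.cookie), args.keyword, args.limit):
        print(title)
```

Run it with the usual `if __name__ == "__main__": main()` guard, e.g. `python main.py AI --limit 5 --cookie '...'`.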
Although it didn't help me complete the task in a short time, the whole experience was really comfortable. It's like directing an intern to do a job. You can't scold an intern for doing a bad job, but you can directly scold Gemini CLI in Vibe Coding.
When using Claude Code with Kimi K2, we similarly tried to have it complete a research task from scratch. In an empty folder, I told Claude Code that I wanted to publish a paper at CVPR (a top-tier computer vision conference), that I had a specific direction in mind, and that it should write the code for the experiments.
The result: by the time I had used up all the free Kimi K2 API tokens, the project still amounted to almost nothing. It first confidently generated all the training code, network-structure code, dataset code, test code, and so on, then told me it was ready to run.
I said its method was not novel at all. It admitted as much. Then I asked it to find papers from the past two years, and that used up the rest of my tokens.
Since the whole process was relatively short, I didn't intervene much manually and left it entirely to AI to handle. Even when problems occurred in the middle, I let AI solve them by itself.
I think its greatest advantage is that it can almost fully control the computer without me providing additional context information.
Can AI programming achieve both a sense of pleasure and efficiency?
What impressed me most in this small test is that AI delivers more of a "sense of pleasure" than "efficiency."
You'll feel like a programming expert, and the code seems to magically appear. But once there's an error or the logic doesn't work, you'll find that you don't really understand the code and don't know how to fix it.
But I still think there's nothing wrong with the tools themselves; how you use them is what determines whether their potential is realized.
Sean Grove from OpenAI gave a speech on "New Code" at AIEWF2025.
Sean Grove, who works on alignment reasoning at OpenAI, said in a recent talk that when using AI programming tools, what matters is not prompt engineering but "specification."
Current Vibe Coding has a problem: we keep the code generated by AI but discard the prompts that contain our original intentions. This is like "tearing up the source code and version-controlling only the compiled binary," which is unsustainable.
The future of programming is no longer just about writing code but defining and communicating intentions through specifications. The real bottleneck and value lie in structured communication, and "specification" is the ultimate manifestation of this communication.
A developer who participated in the study mentioned at the beginning of the article shared his experience on X. He said he was the one whose efficiency decreased by 38% after using Vibe Coding.
He believes an LLM is just a tool, and we shouldn't expect it to be a cure-all. Beyond the drawback that only certain types of programming tasks come with large amounts of clean training data, there are also "long-tail problems": context degradation, distraction while waiting for code generation, and the lack of an accurate success metric for LLM coding tools.
However, he also mentioned at the end that "if we want to make good use of this new tool, we must understand its (and our own) shortcomings and actively adapt."
So, is there really a way for everyone to achieve both efficiency and a sense of pleasure when using these AI programming tools?
Besides these "frustrating" experience shares, many users on X also share how these AI programming tools have improved their productivity.
Someone said that Claude Code can use the computer just like you. He created a Claude.md document and told Claude in this document how to access important directories in his folders, such as folders for memories, diaries, ideas, code, to - do lists, notes, and scripts.
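Such a Claude.md might look like the sketch below. The directory names and rules are hypothetical illustrations of the idea, not the user's actual layout:

```markdown
# CLAUDE.md (hypothetical sketch)

## Important directories
- ~/memories/  : long-term notes to read for context
- ~/diary/     : daily journal entries
- ~/ideas/     : project ideas; append, never delete
- ~/code/      : source repositories
- ~/todo/      : to-do lists, one file per project
- ~/notes/     : reference notes
- ~/scripts/   : utility scripts that may be run

## Rules
- Always check ~/todo/ before starting a task.
- Never modify files under ~/diary/.
```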
In addition, he created