
Programming enters the "walkie-talkie" era: Claude rushes out voice coding, and all transcription tokens are free.

New Intelligence Yuan, 2026-03-03 15:36
Claude Code has launched a voice mode, bringing voice programming and a more natural way to interact.

[Introduction] Claude Code has officially launched voice mode: enter /voice, hold the space bar to speak, and release to complete the input. The transcription streams into the cursor position in real time and switches seamlessly with the keyboard, and transcription tokens are completely free. The next battlefield in programming is not the model's IQ, but the interaction method.

Just now, Anthropic added a voice mode to Claude Code.

From now on, you can write code just by speaking.

Claude Code is a command-line AI programming tool from Anthropic.

Previously, you had to type to communicate with it. Now, you don't need to.

Enter the /voice command to enable voice mode. Long-press the space bar to speak, and release to complete the input.

It works exactly like a walkie-talkie.

It's currently in a limited rollout: about 5% of users will get it first, with gradual expansion over the next few weeks.

If your account has access, the welcome screen will prompt you when you open Claude Code.

What's so amazing about the voice mode?

It's not just simple voice-to-text conversion.

The transcribed text streams out in real time, directly at the cursor position.

It's similar to the demos users have shared online.

What does that mean? You can type half a prompt, and when you hit some complex logic you're too lazy to type out, long-press the space bar to switch to voice, rant through that hard-to-describe logic, release the space bar, and keep typing.

A seamless handoff. No overwriting. No replacement.

This is the key: it doesn't replace the keyboard, it complements it.
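The keyboard-plus-voice handoff can be sketched as a toy model (this is illustrative only, not Anthropic's implementation): both input channels write through one shared cursor, so voice chunks splice in rather than overwrite.

```python
# Toy model: keyboard input and streamed voice transcription share one
# cursor in the same buffer, so neither overwrites the other.
class InputBuffer:
    def __init__(self):
        self.text = ""
        self.cursor = 0

    def insert(self, chunk):
        # Splice the chunk in at the cursor and advance past it.
        self.text = self.text[:self.cursor] + chunk + self.text[self.cursor:]
        self.cursor += len(chunk)

def stream_voice(buf, chunks):
    # Each partial transcription result flows into the cursor position.
    for chunk in chunks:
        buf.insert(chunk)

buf = InputBuffer()
buf.insert("Fix the race condition in ")         # typed on the keyboard
stream_voice(buf, ["the retry ", "callback"])    # spoken while holding space
buf.insert(", then add a test.")                 # typed again after release
print(buf.text)
```

The point of the design is in the `insert` method: because voice chunks go through the same cursor as keystrokes, switching modes mid-prompt leaves everything already typed intact.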

Imagine a scenario: you're debugging a tricky bug involving three layers of nested callbacks and a race condition.

Typing a description would take five minutes just to get the wording straight.

But speaking it? Humans are naturally good at describing messy situations out loud. You'd be done in thirty seconds.

There's another big benefit: voice-transcription tokens are completely free. No charge, no quota deduction. Speak as much as you want.

What's the reaction from the other side?

Interestingly, OpenAI's Codex also added a similar function almost at the same time.

The changelog for Codex version 0.105.0 spells it out: hold the space bar to record, release to transcribe, and the text is typed directly into the terminal.

It uses the Wispr voice engine. macOS and Windows are supported for now; Linux hasn't caught up yet.

Moreover, this feature has to be enabled manually:

Set features.voice_transcription = true in the configuration file.
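That setting maps to a one-line config entry. A sketch, assuming Codex's TOML config format; the ~/.codex/config.toml path is an assumption, so check your own install:

```toml
# ~/.codex/config.toml  (path assumed; adjust for your setup)
[features]
voice_transcription = true
```

The equivalent dotted-key form, `features.voice_transcription = true` at the top level, is also valid TOML.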

Both companies took action almost simultaneously.

This is not a coincidence, but a consensus.

The next battlefield for programming tools lies not in how smart the model is, but in how natural the interaction is.

What does the community think?

The developer community has actually been working on it for a long time.

Before the official voice mode, there was a community project called Voice Mode on GitHub, which added voice capabilities to Claude Code through the MCP protocol.

It uses Whisper for voice recognition and Kokoro for voice synthesis, and it can even run offline.

There are also assorted third-party tools (AquaVoice, Superwhisper, Voicy) all vying for the voice-coding niche.

Some people have gone fully hands-free with Talon Voice; even Ctrl+C in the terminal can be issued by voice.

Now, the official has directly entered the arena.

Are the third-party tools trembling? Maybe not.

The official voice mode is more of an entry-level feature: it lowers the barrier and makes more people realize that code can be written by speaking.

What's the experience of voice programming like?

According to the feedback from early users, there are several scenarios where it's particularly useful:

When debugging

Describing a bug verbally is much faster than typing.

When you speak, you naturally bring in more context, like "on the login page, entering an email with a plus sign throws an error during validation." That kind of human rambling has an information density you'd never bother to type out.

When discussing architecture

"I want this API to use JWT authentication, with access tokens expiring in fifteen minutes, refresh tokens expiring in seven days, plus a refresh endpoint." Ten seconds to say, a minute to type.

When not in front of the computer

When you're eating, drinking coffee, nursing an injured hand, or dealing with tendonitis, voice input isn't just a bonus; it's a necessity.

But it also has its drawbacks.

You still have to type variable names, URLs, and code snippets. Speech recognition is still not reliable enough on camelCase, snake_case, and assorted abbreviations.

So the best practice is: use your mouth for the natural-language parts and your hands for the precise code parts.

A bigger signal

Let's take a broader view.

In 2024, Cursor made AI-assisted coding mainstream; hitting Tab was the workflow of the moment.

In 2025, Claude Code and Codex made it possible for AI Agents to code autonomously.

In 2026, the addition of voice mode fills in the last piece of the human-computer interaction puzzle.

Programming is undergoing an input revolution.

The keyboard won't disappear, just like the mouse didn't disappear.

But the main bottleneck in programming has shifted from writing code to expressing intentions.

And the most primitive and efficient way for humans to express intentions is speaking.

The average speaking speed of humans is about 150 words per minute, while the typing speed is about 40 words per minute.

This 3-4x gap is the market voice programming aims to capture.
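The back-of-the-envelope math checks out; the 100-word prompt length below is an illustrative figure, not from the article:

```python
speaking_wpm = 150  # average speaking speed cited in the article
typing_wpm = 40     # average typing speed cited in the article

print(speaking_wpm / typing_wpm)  # 3.75, i.e. the "3-4x" gap

# Dictating vs. typing a 100-word prompt (illustrative length), in seconds
words = 100
print(words / speaking_wpm * 60)  # 40.0
print(words / typing_wpm * 60)    # 150.0
```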

Let's imagine, looking a few steps ahead.

If voice input is accurate enough and AI can understand code intentions deeply enough, then the ultimate form of programming might be like this:

You're sitting on the sofa, telling the computer: "Refactor the user module's permission system to an RBAC model. Don't forget to write tests."

The AI automatically reads the code, understands the architecture, writes the implementation, runs the tests, and creates a pull request.

You take a look at the diff, say "LGTM", and merge it.

From writing code to stating requirements, from a programmer to a programming director.

This day is closer than most people think.

The /voice command in Claude Code is just a starting point.

It may still be rough and have various small problems.

But the direction is right: future programming will definitely be multimodal.

Keyboard, voice, even gestures and eye tracking: every natural form of human expression will become an input channel for programming.

Looking back at today from that point will be like looking at punch cards now.

It will seem quite cute.

Finally, you can write code by speaking instead of typing.

Although after speaking, you may still have to fix bugs manually.

Reference: https://x.com/bcherny/status/2028629573722939789 

This article is from the WeChat official account “New Intelligence Yuan”. Editor: Dinghui. Republished by 36Kr with authorization.