Semicolon Penetrates GitHub: One Less Line of Filtering Code Nearly Brings Down Hundred-Million

A tiny semicolon could let any GitHub user with push permissions execute arbitrary commands on the backend server. What it reveals is not just a single input-filtering error, but the long-standing internal trust assumption that multi-tenant cloud platforms rely on.

On March 4, 2026, GitHub received a report that Wiz submitted through its bug bounty program. The attack entry point described in the report was extremely simple:

A crafted git push with a push option that concealed a semicolon in its value.

GitHub reproduced the vulnerability internally within 40 minutes. Going from a confirmed root cause to a live cloud fix took about 75 minutes, which kept the whole response under 2 hours.

In theory, any authenticated user with push permissions for a repository could execute arbitrary commands on the GitHub/GHES backend server that processes the push.

The official GitHub blog stated clearly: "No customer data was accessed, modified, or leaked."

GitHub stated that since this exploitation chain triggers an abnormal code path that is not used in normal GitHub.com operations, they queried the telemetry and found no trigger records other than Wiz's tests.

Wiz itself also said, "We did not access the repository contents of any other tenants."

However, the incident still pried open an internal trust assumption that had gone unquestioned for years.

A Semicolon

Breaking Through Three Layers of Trust

To understand this vulnerability chain, you first need to see what the internal git push pipeline at GitHub looks like.

When you run git push, the request's first stop inside GitHub is a service called babeld, a git proxy that GitHub built in-house and that forwards your connection.

First, babeld asks gitauth: Does this user have the permission, and what rules should this push follow? Gitauth returns a list: file size limit, branch naming rules, and hook configuration.

Babeld packages this list into an internal request header called X-Stat and passes it down to gitrpcd.

Gitrpcd is an internal RPC service. It only recognizes this header and completely trusts every field in it.

Finally, the pre-receive hook, which is responsible for the final check, gets the X-Stat and decides whether this push can pass.

Throughout the entire pipeline, the X-Stat header is the pass.

GitHub's internal git push pipeline: babeld → gitauth → gitrpcd → pre-receive hook

The X-Stat header is a string of key=value pairs separated by semicolons, such as a=1; b=2; c=3.

During parsing, the system reads them into a map one by one. A crucial detail is that if the same key appears twice, the latter one will quietly overwrite the former one.
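The overwrite behavior is easy to reproduce. The following is a minimal sketch, not GitHub's actual parser: the function name parse_x_stat is hypothetical, and the only things taken from the description above are the "key=value; key=value" layout and the last-write-wins rule.

```python
def parse_x_stat(header: str) -> dict[str, str]:
    """Parse 'k=v; k=v' into a dict (hypothetical stand-in for the real parser).

    A repeated key silently replaces the earlier value, because plain
    dict assignment keeps only the last write.
    """
    fields: dict[str, str] = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = part.partition("=")  # split on the first '=' only
        fields[key.strip()] = value.strip()
    return fields


print(parse_x_stat("a=1; b=2; c=3"))
print(parse_x_stat("a=1; b=2; a=3"))  # the second 'a' wins, with no warning
```

Nothing in this parser can tell a legitimate field from one that arrived later in the string, which is what the rest of the chain depends on.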

Where is the problem?

git push has a standard feature called push options, which lets you pass custom strings to the server when pushing code.

Babeld will encode these strings into the X-Stat as they are. For example, if you pass two options, it will become push_option_0=the content you passed; push_option_1=the content you passed.

Babeld forgot one thing: It didn't filter out semicolons.

There was originally a field large_blob_rejection_enabled=bool:true in the X-Stat, which is the switch for the file size limit and is enabled by default.

The attacker crafted a push option, stuffed a semicolon in its value, and then followed it with large_blob_rejection_enabled=bool:false. Babeld wrote it in as it was, and the same key appeared twice in the X-Stat.

Schematic of X-Stat field injection: How the push option crafted by the attacker overwrites a legitimate field

When the map parses the repeated key, the later-written value overwrites the previous one. The false injected by the attacker came later and won. Just like that, the file size limit was turned off.
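The whole override fits in a few lines. This is a sketch under stated assumptions: encode_push_options and parse_fields are hypothetical stand-ins for the behavior described above (values copied verbatim, last write wins), and only the field names push_option_0 and large_blob_rejection_enabled come from the description.

```python
def encode_push_options(options: list[str]) -> str:
    # Hypothetical stand-in for the vulnerable step: each value is copied
    # into the header verbatim, with no check for the ';' delimiter.
    return "; ".join(f"push_option_{i}={opt}" for i, opt in enumerate(options))


def parse_fields(header: str) -> dict[str, str]:
    # Last-write-wins parsing, as described above.
    fields: dict[str, str] = {}
    for part in header.split(";"):
        part = part.strip()
        if part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields


trusted = "large_blob_rejection_enabled=bool:true"  # set by the platform
attacker_option = "x; large_blob_rejection_enabled=bool:false"  # one push option

header = trusted + "; " + encode_push_options([attacker_option])
fields = parse_fields(header)

print(header)
print(fields["large_blob_rejection_enabled"])  # the injected value
```

The attacker supplies a single push option, yet the parser sees three fields, and the last one replaces the platform's own setting.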

The foundation of the entire attack was that missing line of filtering code in babeld.
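What that missing line might look like is sketched below. This is not GitHub's actual patch, which the article does not describe; it is one common approach, percent-encoding the value so that delimiter characters can never appear in the header. The function names are hypothetical; quote and unquote are from Python's standard urllib.parse.

```python
from urllib.parse import quote, unquote


def encode_push_option(index: int, value: str) -> str:
    # safe="" percent-encodes every reserved character, including the
    # ';' and '=' that the header format uses as delimiters.
    return f"push_option_{index}={quote(value, safe='')}"


def decode_push_option_value(encoded_value: str) -> str:
    # The consumer reverses the encoding only after the header is split.
    return unquote(encoded_value)


malicious = "x; large_blob_rejection_enabled=bool:false"
field = encode_push_option(0, malicious)
print(field)

encoded_value = field.split("=", 1)[1]
print(decode_push_option_value(encoded_value) == malicious)
```

An alternative is to reject any push option containing a delimiter outright. Encoding keeps legitimate values that happen to contain a semicolon working, while rejection is simpler to audit.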

The logic behind this vulnerability exists in many systems.

Multiple services are strung together on a pipeline. Each stop only focuses on its own work and assumes that the data passed from the previous stop is clean and problem-free. No one conducts a secondary check in the middle.

As a result, babeld didn't block the semicolon, gitrpcd didn't verify the received data, and the pre-receive hook used the fields directly.

Each stop assumes that what is passed from the previous stop is clean and trustworthy.

Three layers of trust stacked together, and a single semicolon broke through them all.
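A secondary check in the middle does not have to be elaborate. As a sketch of defense in depth, and assuming nothing about how gitrpcd is actually written, a downstream parser could refuse any header in which a key appears twice instead of letting the later value win. The names parse_x_stat_strict and DuplicateFieldError are hypothetical.

```python
class DuplicateFieldError(ValueError):
    """Raised when a header repeats a key (hypothetical error type)."""


def parse_x_stat_strict(header: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"malformed field: {part!r}")
        if key in fields:
            # Fail closed: a repeated key is treated as tampering.
            raise DuplicateFieldError(f"duplicate field: {key!r}")
        fields[key] = value.strip()
    return fields


try:
    parse_x_stat_strict(
        "large_blob_rejection_enabled=bool:true; push_option_0=x; "
        "large_blob_rejection_enabled=bool:false"
    )
except DuplicateFieldError as exc:
    print("rejected:", exc)
```

This check alone would not stop an injected key that the platform had not already set, so it complements input filtering at the first stop rather than replacing it.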

Injecting Three Fields

GHES Instances Compromised

Bypassing the file size limit was just the first test Wiz used to verify that the injection worked. The real goal was code execution on the server.

Step 1: Turn off the sandbox.

GHES allows administrators to customize some hook scripts that run automatically before code is pushed. These scripts run in a sandbox by default, with limited permissions.

However, through reverse engineering, Wiz found that whether the sandbox is enabled depends on a field called rails_env in the X-Stat header. If the value is production, the hook runs in the sandbox; with any other value, it runs directly as the git service account without any isolation.

This field can be injected.
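The decision described above is a fail-open check: only one exact value turns protection on, so any unexpected value turns it off. A minimal sketch of the difference follows. Both functions are hypothetical illustrations rather than GHES code, and the allow_unsandboxed parameter is an assumption introduced for the example.

```python
def sandbox_enabled_as_described(rails_env: str) -> bool:
    # Fail-open: anything other than the exact string "production"
    # disables the sandbox, including typos and injected values.
    return rails_env == "production"


def sandbox_enabled_fail_closed(
    rails_env: str,
    allow_unsandboxed: frozenset[str] = frozenset(),
) -> bool:
    # Fail-closed: the sandbox stays on unless the value is on an explicit
    # allowlist, which is empty by default (hypothetical parameter).
    return rails_env not in allow_unsandboxed


for env in ("production", "development", "anything-else"):
    print(env, sandbox_enabled_as_described(env), sandbox_enabled_fail_closed(env))
```

Even a fail-closed check is only as trustworthy as its input, so a security switch like this is safer read from local configuration than from a request header.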

Step 2: Change the search directory for hook scripts.

Inject custom_hooks_dir to change the root directory where the system searches for hook scripts from the default location to a place that the attacker can control.

Step 3: Specify a malicious script to execute.

Inject repo_pre_receive_hooks and fill in a path traversal to make the system jump out of the normal directory scope and execute any binary file on the server.
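Path traversal of this kind is usually blocked with a containment check: resolve the requested path and confirm it still sits under the expected root. The sketch below assumes a hooks root that comes from trusted configuration; resolve_hook and the example paths are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_hook(hooks_root: str, requested: str) -> Path:
    """Return the resolved hook path, or raise if it escapes hooks_root."""
    root = Path(hooks_root).resolve()
    # resolve() collapses '..' segments and follows symlinks; an absolute
    # 'requested' replaces the root entirely and is caught by the check below.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"hook path escapes {root}: {requested!r}")
    return candidate


print(resolve_hook("/srv/hooks", "checks/lint.sh"))

for bad in ("../../bin/sh", "/bin/sh"):
    try:
        resolve_hook("/srv/hooks", bad)
    except ValueError as exc:
        print("rejected:", exc)
```

The check only helps if hooks_root itself cannot be set by the caller. In the chain described above the root directory was also injectable, so both values would need to come from a trusted source.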

After these three steps were strung together, the GHES server returned the following line of output:

remote: uid=500(git) gid=500(git) groups=500(git)

This output means the code ran on the server as the git account: the researcher could now execute commands on the GHES server as the git service account.

The PoC demonstrated by Wiz researcher sagitz on X: A normal git push command, and the remote server returned uid=500(git), indicating that the code was executed on the GitHub backend as the git service account.

Wiz aimed the same attack chain at GitHub.com but didn't succeed. The push completed, but the hook was not triggered, and the server didn't return anything.

They continued with reverse analysis.

Wiz found that there was also a flag hidden in the X-Stat, which controls whether the server runs in "enterprise mode". This flag is enabled by default on GHES, and custom hooks can always run; on GitHub.com, it is disabled by default, and custom hooks will not be triggered at all.

However, this flag is also in the X-Stat and can be injected.

After adding this fourth step, the entire chain was successful. Wiz executed the hostname command on GitHub.com, and the server returned an internal hostname ending with .github.net. They got in.

GitHub admitted in a post-incident blog that the non-production execution path should not have been in the production environment of GitHub.com.

This code was specifically excluded during the early deployment. Later, the deployment method changed, but the exclusion logic was not migrated, so this code quietly remained in the image and no one noticed it.

The exploit succeeded precisely because this code, which should not have been there, happened to be there.

Wiz enumerated two compromised nodes and saw millions of repository index entries of other users and organizations on each node. They stated:

We did not read the repository contents of any other users. We only used our own test account to confirm one thing: The permissions of the git user can indeed access any repository on this node.

GitHub's logs also confirmed this. All the trigger records of that abnormal code path pointed to Wiz's own test traffic, and no other accounts appeared.

Being able to access and actually reading are two completely different things. This time it was the former, not the latter.

However, the root problem still lies in the underlying design of the multi-tenant platform.

GitHub.com stores the repositories of a large number of users and organizations on the same batch of servers and entrusts them to the same git service account for management because this account needs to process everyone's data.

Once an attacker can execute code as this account, a large number of repositories on that node fall within reach.

The shared infrastructure brings efficiency, but it also brings this structural vulnerability that cannot be eliminated.

Patched in the Cloud in Two Hours

May Take Half a Year to Patch Locally

The real challenge lies in self-hosted GHES.

When Wiz publicly disclosed the vulnerability, data showed that 88% of GHES instances had not been patched.

According to GitHub's official action recommendations and the current NVD records, administrators should upgrade to the latest patched version of the corresponding supported branch. The currently listed fixed versions in NVD are 3.14.25, 3.15.20, 3.16.16, 3.17.13, 3.18.7, 3.19.4, and GitHub's official blog also lists 3.20.0 or higher.
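For administrators tracking many instances, the version list above can be turned into a quick check. This is a sketch that hard-codes only the fixed versions quoted in this article; the function name is hypothetical, treating unlisted older branches as unpatched is a conservative assumption, and the check covers this CVE only.

```python
# Fixed patch level per release branch, as listed above for this CVE.
FIXED_PATCH = {
    (3, 14): 25,
    (3, 15): 20,
    (3, 16): 16,
    (3, 17): 13,
    (3, 18): 7,
    (3, 19): 4,
}


def is_patched(version: str) -> bool:
    """Return True if a GHES 'major.minor.patch' version includes the fix."""
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor) >= (3, 20):
        return True  # 3.20.0 or higher is listed as fixed
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        return False  # assumption: unlisted older branches count as unpatched
    return patch >= fixed


for v in ("3.19.3", "3.19.4", "3.20.0", "3.13.9"):
    print(v, "patched" if is_patched(v) else "NOT patched")
```

The authoritative source remains the release notes for each branch, since later patch releases may fix additional issues.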

GHES administrators need to do two things. Immediately upgrade, and then search the /var/log/github-audit.log for records containing semicolons in push options to check for any traces of unauthorized injection.

The trigger condition for this vulnerability is an authenticated user with push permissions for a repository on the instance.

The threshold is not high.

Any attacker who obtains an ordinary employee account can gain code execution on the entire GHES instance, as long as that account can push code to the company's GHES, even to an insignificant internal project.

Normally, all of a company's code assets, CI configurations, and internal credentials run on GHES. Once compromised, the consequences will be severe.

This vulnerability is not the only one on the current patch list for GHES administrators.

GitHub Enterprise Server 3.19.5 Release Notes, multiple HIGH-level vulnerabilities fixed in a single version https://docs.github.com/en/enterprise-server@3.19/admin/release-notes

The release notes for version 3.19.5 alone list four other high-severity vulnerabilities: CVE-2026-5845, CVE-2026-5921, CVE-2026-4821, CVE-2026-4296.

When using SaaS to host code, patches take effect automatically once they are available. For self-hosted GHES, administrators have to keep up with each patch themselves.

The Moat of High Reverse Engineering Costs

AI is Filling It

There is a detail in this disclosure that was overlooked. Wiz specifically explained in its technical blog why they were able to find this vulnerability this time.

They said that GitHub's internal git pipeline consists of a large number of compiled, closed-source binaries. They had been willing to audit it before, but manual reverse engineering cost too much, and they gave up halfway.

They came back to do it again this time because the tools had changed.

In the past, much closed-source software was "relatively safe" not because the code itself was secure, but because reverse engineering cost too much and no one was willing to spend the time.

Just because the code is not publicly available doesn't mean it can't be audited; it's just that auditing used to be economically unfeasible.

Now, the calculation has changed.

What used to take senior reverse engineers several months to complete can now be done in a few weeks with a combination of AI tools.

Wiz also said that, as far as public records show, this was among the first critical vulnerabilities found in closed-source binaries with the help of AI, using a combination of IDA Pro, MCP, and an LLM.

The Wiz Research blog argues that AI-enhanced reverse engineering tools will play an increasingly important role in finding vulnerability types that require in-depth cross-component analysis. https://www.wiz.io/blog/github-rce-vulnerability-cve-2026-3854

AI is not only writing code but also disassembling it.

All systems that have relied on "closed-source and unaudited" for a long time should re-evaluate their real exposure.

The Trust Assumption Pried Open by a Semicolon

Let's go back to the original question.

GitHub's emergency response to CVE-2026-3854 was textbook-level. All the necessary patches are available, and there are no known real-world attacks.

However, this incident left three things for backend engineers to seriously consider.

First, hidden dangers in system design are more difficult to detect than single-point vulnerabilities.

A delimiter-based protocol, implicit cross-service trust, and last-write-wins parsing semantics are not big problems on their own, but combined they are a ticking time bomb. Any system that uses semicolons, vertical bars, or line breaks to delimit internal metadata should review its code today.

Second, the risks of multi-tenant platforms are structural, not just a problem for GitHub.

As long as different users' data is stored on the same machine and managed by the same service account, this weak point will remain. Once an attacker gains code execution, how well the isolation holds depends on how strictly the permission boundaries of that shared account are designed.

Third, AI-assisted reverse engineering is rewriting the economics of offense and defense.

The next batch of enterprise software found to have critical vulnerabilities is likely to be the software that has long relied on "closed-source and unaudited."

