Don't write, don't read, don't review: this security company has decided to stop letting humans touch code, and it has open-sourced its approach.
Can software still be delivered without anyone writing or reviewing code?
In February 2026, StrongDM, a company focused on infrastructure security, publicly announced the results of a "lights-out software factory" production line.
In this production line, humans no longer write code directly or conduct code reviews. Development has shifted from interactive collaboration to "feeding specifications and scenarios into the system." Subsequently, agents automatically generate code, run tests/evaluation harnesses, and iterate repeatedly in the feedback loop until the results converge and are ready for delivery. The team wrote this approach into their charter, with the most important statement being "No hand-coded software."
Unusually, StrongDM also open-sourced two of the repositories behind this system.
The first is https://github.com/strongdm/attractor.
This is the core non-interactive coding agent in their "software factory" system. Yet the repository does not contain a single line of code: only three Markdown files describing the software's complete specification in extreme detail, plus a prompt in the README - you simply hand the specification to the coding agent of your choice and let it execute.
The other is https://github.com/strongdm/cxdb.
This one is closer to a traditional software release: it includes 16,000 lines of Rust, 9,500 lines of Go, and 6,700 lines of TypeScript. CXDB is their "AI Context Store" - a system for storing conversation histories and tool outputs, with data organized as an immutable DAG.
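CXDB's actual data model lives in the repository above; purely as an illustration of what an immutable, content-addressed DAG of context entries can look like, here is a minimal Go sketch (all type and field names are hypothetical, not CXDB's API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Node is a hypothetical context entry: once written, it is never mutated.
// New information is appended as new nodes that point back at their parents.
type Node struct {
	Parents []string `json:"parents"` // hashes of parent nodes (DAG edges)
	Kind    string   `json:"kind"`    // e.g. "message", "tool_output"
	Payload string   `json:"payload"` // the stored content
}

// ID derives a content hash, so identical content always maps to the same key
// and an existing node can never be silently rewritten in place.
func (n Node) ID() string {
	b, _ := json.Marshal(n)
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	root := Node{Kind: "message", Payload: "user: summarize the incident"}
	child := Node{Parents: []string{root.ID()}, Kind: "tool_output", Payload: "log excerpt..."}
	fmt.Println(root.ID(), child.ID())
}
```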
In the Hacker News discussion, one developer quickly followed the instructions and ran the whole process. He said he read the documentation in the Attractor repository carefully and followed StrongDM's specifications strictly, having Claude build a complete application from the spec. The end result was an AI agent he could drive directly with a Claude API key, and its overall quality was "significantly better than the results generated when the model was left to operate freely."
What impressed him most was the volume and level of detail in the specifications: the entire spec ran to roughly 6,000 to 7,000 lines, covering behavioral constraints, interface semantics, and system boundaries. "In the past, when assigning projects to agents, my specifications were at most one page long. I was really shocked by the density of detail this time."
Of course, this open-source release was not a polished demonstration. As soon as the code was published, developers on Hacker News examined it and pointed out suspected bugs, Rust anti-patterns, and loose error handling. In response, Jay Taylor, a member of the StrongDM AI team, replied in the thread that these projects "were only decided to be open-sourced in the past few days" and had not yet been fully cleaned up. They have since set agents to work cleaning up and improving CXDB.
The practice quickly drew attention from academics as well. Ethan Mollick, a Wharton School professor who studies AI and organizational change, shared StrongDM's post and called it a "truly radical approach to software development": "There is almost no human intervention. Even if this approach may not be suitable for most scenarios, we need more such leapfrog ideas to redesign processes instead of just integrating AI into old processes."
In his view, the truly valuable progress is not to "add a little more AI" to the existing processes, but to completely redesign the processes around AI.
An internal experimental production line that "bans handwritten code"
StrongDM is a company focused on infrastructure access and identity security. Its core work is to manage how human and non-human identities securely connect to databases, cloud resources, and various internal systems.
Their AI team was established six months ago. On July 14, 2025, Jay Taylor, Navan Chauhan, and Justin McCarthy, StrongDM's co-founder and CTO, spun an internal exploratory project out into a dedicated team.
The new team's first task was not to write code but to draft a charter. Justin McCarthy mentioned in a retrospective that within the team's first hour, they clearly defined a set of constraints it would have to follow.
Code shall not be written by humans.
Code shall not be reviewed by humans.
If you're spending less than $1,000 in token costs per human engineer today, your software factory has a lot of room for improvement.
By StrongDM's own retrospective account, this decision was not made on a whim; its roots trace back to late 2024. After the updated Claude 3.5 Sonnet was released in October 2024, the team began to notice an unusual change: in long-running agentic programming tasks, results started to accumulate correctness rather than just piling up errors.
By December 2024, this change could be clearly observed through Cursor's YOLO mode.
StrongDM wrote on their blog that before this, repeatedly applying LLMs to coding tasks tended to accumulate misunderstandings, hallucinations, syntax errors, dependency incompatibilities, and other issues, until the system "gradually broke down." Combined with YOLO mode, however, Anthropic's updated model was the first to demonstrate what they would later describe internally as the embryonic form of "non-interactive development," or "growing software."
In this context, the newly established team set an extreme experimental premise from the start: no handwritten code was allowed. In July 2025, this still sounded quite radical.
The most thought-provoking rule is the second one: code shall not be reviewed by humans. After all, large language models are notoriously prone to errors no human would make. Against that backdrop, abandoning manual code review entirely seems counterintuitive.
Moreover, security software has always been exactly the kind of system people are least willing to entrust to LLM-generated code that no human has reviewed.
Once the rules were in place, a problem emerged: if nothing is written by hand, how do you ensure the code actually works? Having agents write their own tests is only useful under one premise - that they don't "cheat," for example by simply writing an "assert true."
This was quickly distilled into a more fundamental question: when both the implementation and the tests are generated by coding agents, how can you prove that the software you deliver works? StrongDM's answer was inspired by scenario testing (Cem Kaner, 2003). They describe it as follows:
We redefined the term "scenario" to represent an end-to-end "user story." These scenarios are usually stored outside the codebase (similar to the "holdout set" in model training), can be intuitively understood by LLMs, and can be flexibly verified.
Since the software they build often includes agentic components, StrongDM also abandoned binary definitions of success such as "all tests pass" in favor of a measure closer to real user experience. They introduced the notion of "satisfaction" to quantify verification results: of all the execution trajectories observed across the scenarios, what percentage would likely satisfy a user?
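StrongDM does not spell out how this number is computed; the arithmetic itself is simple, as the Go sketch below shows. The Trajectory and Judge types are hypothetical, and in practice the "would a user be satisfied?" call is presumably made by an evaluator harness rather than a hard-coded rule:

```go
package main

import "fmt"

// Trajectory is a hypothetical record of one end-to-end scenario run.
type Trajectory struct {
	Scenario string
	Steps    []string
}

// Judge reports whether a trajectory would likely satisfy a user.
// In a real harness this would be an LLM- or rubric-based evaluator.
type Judge func(Trajectory) bool

// Satisfaction computes the share of trajectories judged satisfying,
// which is the convergence signal the factory iterates against.
func Satisfaction(runs []Trajectory, judge Judge) float64 {
	if len(runs) == 0 {
		return 0
	}
	ok := 0
	for _, r := range runs {
		if judge(r) {
			ok++
		}
	}
	return float64(ok) / float64(len(runs))
}

func main() {
	runs := []Trajectory{
		{Scenario: "reset a user's credentials", Steps: []string{"lookup", "rotate", "notify"}},
		{Scenario: "revoke stale access", Steps: []string{"lookup"}},
	}
	// Toy judge: a run with fewer than two steps is treated as incomplete.
	judge := func(t Trajectory) bool { return len(t.Steps) >= 2 }
	fmt.Printf("satisfaction: %.0f%%\n", Satisfaction(runs, judge)*100)
}
```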
They treat these scenarios as a holdout set, stored where coding agents cannot directly access them, and use them to evaluate the system's overall behavior. The design is interesting because it mimics, to some extent, an extremely expensive but highly effective practice from traditional software engineering - rigorous end-to-end testing by an external QA team.
Synthetic scenario planning and shaping interface
In the software factory's overall principles, StrongDM distills all of this into a clear process: "seed → verification → feedback loop." The system first receives a minimal starting point - a few sentences, some screenshots, or an existing codebase. It then runs scenarios in a verification environment as close to the real world as possible and continuously feeds the output back into the input, letting the system self-correct inside the closed loop and steadily accumulate correctness. The cycle continues until all held-out scenarios not only pass but keep passing. Tokens are described as the fuel for this production line.
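StrongDM has not published the orchestration code behind this loop, but its control flow can be sketched. The generator, scenario runner, and convergence threshold below are hypothetical stand-ins, not StrongDM's implementation:

```go
package main

import "fmt"

// Build is a hypothetical stand-in for one generated version of the software.
type Build struct{ Version int }

// generate asks a coding agent to produce or revise a build from the seed
// plus the latest feedback; here it just bumps a version counter.
func generate(seed, feedback string, prev Build) Build {
	return Build{Version: prev.Version + 1}
}

// runScenarios plays the held-out scenarios against the build and returns a
// satisfaction score plus feedback describing what fell short.
func runScenarios(b Build) (float64, string) {
	// Toy model: quality improves with each iteration.
	score := float64(b.Version) / 5.0
	if score > 1 {
		score = 1
	}
	return score, fmt.Sprintf("build v%d: tighten error handling", b.Version)
}

func main() {
	seed := "a few sentences describing the product"
	target := 0.95 // convergence threshold on satisfaction
	var build Build
	feedback := ""
	for i := 0; i < 10; i++ { // in practice, the token budget is the real limit
		build = generate(seed, feedback, build)
		score, fb := runScenarios(build)
		fmt.Printf("iteration %d: satisfaction %.2f\n", i+1, score)
		if score >= target {
			break
		}
		feedback = fb
	}
}
```

In the real factory, the feedback would be the scenario trajectories and evaluator notes rather than a canned string, and the stopping condition is that the held-out scenarios keep passing within the token budget.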
Leave "acceptance" to the spec?
In StrongDM's software factory, the spec is not a design manual for humans to read but the core input from which the entire system starts, corrects itself, and converges.
In the traditional development process, the spec is more of an "alignment tool": it helps engineers understand what needs to be done, but the actual implementation details, trade-offs, and compromises often occur during the coding and code review processes. In StrongDM's setting, when the premise is "humans don't write code and humans don't review code," the role of the spec is completely shifted forward - it is no longer a reference material but the de facto control plane.
The team requires that the system be able to "grow from progressive natural-language specifications" and that verification be completed without any semantic inspection of the source code. In this setting, "acceptance" itself is redefined: the spec and scenarios together form a continuously running evaluation benchmark. Whether the model's output complies with the specification is judged not by humans reading the code but by whether its behavior in these scenarios continues to meet expectations.
In other words, StrongDM's method shifts the focus of coverage from "how many tests were written by hand" to "are the specifications and scenarios sufficient and accurate" and "can the verification environment catch failures inside the closed loop."
Based on this concept, StrongDM further proposed another key concept: the Digital Twin Universe (DTU).
StrongDM defines the Digital Twin Universe as a set of behavior-level clones of third-party services. They built twin systems for Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets, replicating the APIs, boundary conditions, and observable behaviors of these services.
With the DTU, they can verify at a scale and speed far beyond what the production environment allows. They can test failure modes that would be dangerous or outright impossible to attempt against real services, and they can run thousands of scenarios per hour without worrying about hitting rate limits, triggering abuse detection, or racking up API costs.
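The twins themselves have not been released. As a rough sketch of what a behavior-level clone can look like, here is a toy Go HTTP service imitating a generic message-posting API with a deterministic rate-limit edge case; the endpoint path and response shape loosely resemble a Slack-style API but are invented for illustration:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync/atomic"
)

// A behavior-level twin does not proxy the real service: it re-implements the
// observable contract (status codes, payload shape, edge cases) locally so
// thousands of scenarios can run per hour without cost or abuse detection.
var calls int64

func postMessage(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Deterministic edge case: every 5th call simulates a rate limit, so
	// agents must handle 429s that are risky to provoke against the real API.
	if atomic.AddInt64(&calls, 1)%5 == 0 {
		w.Header().Set("Retry-After", "1")
		http.Error(w, `{"ok":false,"error":"rate_limited"}`, http.StatusTooManyRequests)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]any{"ok": true, "ts": "1700000000.000100"})
}

func main() {
	http.HandleFunc("/api/chat.postMessage", postMessage)
	log.Println("twin listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because the edge case is deterministic, a scenario that exercises 429 handling will hit it on every run - exactly the kind of failure mode that is risky or rate-limited to provoke against the real service.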
So how are the key behaviors of Okta, Jira, and Slack "cloned"? The answer is: using coding agents.