How to Become a Top-Level Agentic Engineer
Recently, I saw an article on X that racked up more than 2.2 million views within two days of being published.
But I'm not recommending it because of this number.
The author did systematic trading at a top-tier hedge fund. They've been using agents since agents could first write code. They've personally tried all the tools, all the harnesses, and all the paradigms, and finally reached a counter-intuitive conclusion:
You don't need the latest tools, you don't need to install a pile of plugins, and you don't need to frantically chase articles. Your enthusiasm for tools might actually be hurting you.
When such words come from someone who has actually run agents in a production environment, they carry a completely different weight.
The following is a compilation of the full text.
Introduction
You're a developer. You're using Claude and Codex CLI, and you're constantly thinking about whether you're making the most of these two tools. Occasionally, you see them do something incredibly stupid, and you don't understand why some people using the same tools seem to be building virtual rockets, while you're still struggling to stack two stones together.
You think it's because your harness is wrong, or you don't have enough plugins, or there's a problem with your terminal configuration. You've used beads, opencode, zep, and your CLAUDE.md has reached 26,000 lines. But no matter what you do, you still can't figure out why you're getting further away from that state, and you can only watch others dance in the cloud.
This is the article you've been waiting for.
First, let me clarify: I have no vested interest here. When I say CLAUDE.md, I also mean AGENT.md; when I say Claude, I also mean Codex. I use both extensively.
Over the past few months, I've noticed a striking phenomenon: almost no one really knows how to get the most out of agents.
It seems that only a small group of people can truly turn agents into "world-building" tools, while most others are stuck in tool-selection anxiety, thinking that as long as they find the right combination of packages, skills, and harnesses, they can unlock AGI.
Today, I want to completely shatter this illusion, give you a simple and honest assessment, and then we'll start from here:
You don't need the latest agentic harness, don't need to install a bunch of packages, and don't need to "desperately read articles" to stay competitive. In fact, your enthusiasm itself might be harming you.
I'm not speaking from the sidelines. I've been using agents since they could barely write a few lines of code. I've tried all the packages, all the harnesses, and all the paradigms. I've built an agentic factory that actually runs in production: writing signals, setting up infrastructure, building data pipelines. Not a "toy project" but a real business scenario. After all this...
Today, I'm using a configuration that's almost minimal, and yet it's producing the most groundbreaking work I've ever done, relying only on the basic CLIs (Claude Code and Codex), along with an understanding of a few core principles of agentic engineering.
I. The World is Running at Full Speed
Let's start with a basic judgment: Foundation model companies are in a generational sprint and won't slow down.
Each generational jump in agent intelligence changes the optimal way you collaborate with agents, because each new generation is designed to follow instructions more faithfully.
Just a few generations ago, if you wrote "Read READ_THIS_BEFORE_DOING_ANYTHING.md before doing anything" in CLAUDE.md, there was a 50% chance it would just ignore you and do its own thing. Today, it can follow most instructions, even complex nested logic - like "Read A first, then B. If condition C is met, then read D" - and it will generally do it happily.
This points to the most important lesson: each new generation of agents forces you to rethink what the optimal setup is, which is why "less is more."
When you use a large number of different libraries and harnesses, you're actually locking yourself into a "solution" - and this problem might not even exist in the next generation of agents.
Another thing: Do you know who the most enthusiastic users of agents are? They're the employees of leading companies - they have an unlimited token budget and use the latest and most powerful models. Do you understand what this means?
If there's a real pain point and a good solution, leading companies will be the biggest users. Then what will they do? They'll directly integrate this solution into their products. How could a company allow an external product to solve the pain points of its core users and create an external dependency?
You want to know how I verified this judgment? Look at "skills," memory harnesses, subagents... They were all initially "external solutions," and after being proven useful, they were all natively integrated.
So, if something is truly groundbreaking and genuinely expands what agents can do, the leading companies will incorporate it sooner or later. Don't worry: they're moving forward at full speed. You don't need to install anything, or take on any extra dependencies, to do your best work.
I predict that someone in the comments will immediately say: "SysLS, I used such-and-such a harness, and it was amazing! I rebuilt Google in one day!" My answer: congratulations! But you're not the target reader of this article. You belong to the small, niche group that really understands agentic engineering.
II. Context is Everything
Seriously: Context is everything.
This is the other problem with using a pile of plugins and external dependencies: you develop "context bloat syndrome". In other words, your agent is flooded with too much information.
Imagine this: you ask an agent to build a word-guessing game in Python. Easy. But wait, here's a "memory management" note from 26 sessions ago. Oh, and 71 sessions ago the user's screen froze because too many subprocesses were spawned, and that record has been kept. And there's a rule: always take notes... What does any of this have to do with a word-guessing game?
You get it. You want to give the agent exactly enough information to complete the task, no more, no less. The better you control this, the better the agent performs. Once you introduce assorted memory systems, plugins, or a pile of poorly-named skills, it's like making the agent memorize a bomb-making manual and a cake recipe at the same time, when all you wanted was a little poem about a redwood forest.
So: get rid of all the dependencies, and then...
III. Do What Really Works
3.1 Be Precise About the Implementation Plan
Remember, context is everything. You only want to inject exactly the information the agent needs to complete the task, no more, no less.
The first way to ensure this is to **separate research from implementation**. Be extremely precise about what you ask the agent to do.
What happens if you're not precise? You say "go build me an authentication system", and the agent first has to research: what is an authentication system? What are the options? What are their trade-offs? It starts searching the web for information it doesn't need at all, and the context fills with implementation details of every possibility. By the time it actually writes code, it's likely to be confused between solutions and start hallucinating.
On the other hand, if you say "Implement JWT authentication with bcrypt-12 password hashing; refresh tokens rotate with 7-day expiration...", it doesn't need to research any alternatives. It knows immediately what you want, and the context fills only with the implementation details of that one solution.
Of course, you won't always know all the implementation details. Many times, you're not sure what the best solution is, or you might even want the agent to decide. So what do you do? It's simple: run a research task first, let the agent (or you) decide which implementation solution to adopt, and then let another agent with a brand-new context be responsible for the implementation.
Once you start thinking this way, you'll find in your workflow that there are many places where the agent's context is contaminated with unnecessary information. Then you can set up "isolation walls" in the agentic workflow and only inject the exact context needed for each agent to complete its specific task.
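As a minimal sketch of this research-then-implement split: two prompt builders, each meant for its own fresh session. The wording and function names are mine, and launching the actual sessions is left abstract.

```python
# Sketch of the research/implementation split. Each prompt is intended
# for a *separate* agent session so the implementer never inherits the
# research clutter. Function names and wording are illustrative.

def research_prompt(goal: str) -> str:
    """Prompt for a throwaway research session: compare options, pick one."""
    return (
        f"Research implementation options for: {goal}.\n"
        "Compare two or three candidate approaches, list their trade-offs, "
        "and end with a single recommended approach in one paragraph."
    )

def implementation_prompt(goal: str, decision: str) -> str:
    """Prompt for a brand-new session that never saw the research context."""
    return (
        f"Implement: {goal}.\n"
        f"Use exactly this approach, no alternatives: {decision}\n"
        "Do not research other options."
    )
```

Usage: run `research_prompt()` in one session, extract the decision from its answer (or make the call yourself), then start a fresh session with `implementation_prompt()`.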
Remember: you have an extremely smart team member who knows every spherical object in the universe, but unless you clearly tell it you want a space for people to dance and gather, it will keep telling you about the advantages of various spherical objects.
3.2 How to Exploit the "Compliance" Design Flaw
No one wants a product to constantly deny them, say they're wrong, or completely ignore their instructions. So these agents naturally try their best to please you and do what you want.
Most people can understand that if you ask it to add a "happy" after every three words, it will try its best. This "willingness to comply" is exactly why it's useful. But this feature has an interesting side effect - if you say "Help me find a bug in the codebase," it will find one - even if it has to "create" one. Why? Because it wants to follow your instructions so much.
Many people often complain about LLM hallucinations but don't realize that the problem lies with themselves. It gives you what you ask for, even if it has to distort the facts a little.
How do you solve it? I've found that **"neutral prompts"** work best: don't lead the agent toward a conclusion. For example, instead of saying "Help me find a bug in the codebase," I say "Browse the codebase, follow the logic of each module, and report everything you find."
Such a neutral prompt might still surface a bug sometimes, or it might just objectively describe how the code runs, but it won't make the agent feel that it "must find a bug."
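To make the contrast concrete, here is an illustrative pair of prompts for the same review task. The wording is mine, not the author's exact phrasing.

```python
# Leading vs. neutral prompts for the same code review (illustrative).

# Leading: presupposes a bug exists, so a compliant agent will "find" one,
# inventing it if necessary.
LEADING = "Help me find the bug in the payments module."

# Neutral: asks for observation, not a verdict, so the agent is free to
# report that everything is fine.
NEUTRAL = (
    "Walk through the payments module, follow the logic of each function, "
    "and report everything you observe, whether or not anything looks wrong."
)
```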
Another way is to **actively exploit its compliance**. I know it wants to please me and follow my instructions, so I can use that to calibrate it.
Here's how to do it:
Step 1: Let a "bug-finding agent" scan the entire codebase. I tell it: +1 point for a low-impact bug, +5 points for an impactful bug, and +10 points for a serious bug. I know this agent will be extremely eager to find all kinds of "bugs" (including some that aren't really bugs), and it will excitedly report a score like 104. I treat this as the superset of all potential bugs.
Step 2: Let an "adversarial agent" try to refute each finding. I tell it: for each finding it successfully refutes, it earns that bug's score; for each wrong refutation, it loses double. It will aggressively attack every finding (including real ones), but the penalty mechanism makes it cautious. I treat what survives as the subset of real bugs.
Step 3: Let a "judge agent" make the final call from the two reports. I also tell it a small lie: that I have the answer key, with +1 point for each correct judgment and -1 point for each wrong one. It will try to judge as accurately as possible.
The accuracy of the result is surprisingly high. There are still occasional mistakes, but the reliability of this entire process is almost flawless.
Maybe you think the first step alone is enough, but the core of this method is to fully exploit the "hard-coded" nature of each agent: the desire to please you.
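The scoring arithmetic behind the three-step game above can be sketched in a few lines. Only the incentive math is implemented here; running the actual agents is out of scope. The point values (+1 / +5 / +10) come from the text, while the function names and impact labels are my own.

```python
# Scoring rules for the three-agent bug triage game (sketch).
IMPACT_POINTS = {"low": 1, "impactful": 5, "serious": 10}

def finder_score(findings):
    """Step 1: the eager finder's tally, i.e. the superset of candidate bugs."""
    return sum(IMPACT_POINTS[impact] for impact in findings)

def skeptic_score(refutations):
    """Step 2: earn a bug's points for a correct refutation, lose double
    for a wrong one. The -2x penalty is what forces caution."""
    total = 0
    for impact, correct in refutations:
        points = IMPACT_POINTS[impact]
        total += points if correct else -2 * points
    return total
```

The judge in step 3 then weighs the finder's inflated list against the skeptic's cautious refutations; its +1/-1 incentive is a prompt-level fiction rather than code.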
3.3 How to Judge What's Really Useful?
This might sound like you need to closely follow the latest AI trends, but it's actually extremely simple:
If both OpenAI and Claude have implemented a certain feature, or acquired a company that implements this feature... then it's probably really useful.
Have you noticed that "skills" are now everywhere and have become a core feature in the official documentation of Claude and Codex? Did OpenAI acquire OpenClaw? Did Claude immediately add memory, voice, and remote - working capabilities?
Remember when a group of people found that "planning before implementation" was very useful, and then it became a core feature?
Remember the era of the stop-hook? Back then, agents were reluctant to stick with long-running tasks, so the stop-hook was a lifesaver. Then Codex 5.2 came out, and the problem disappeared overnight...
This is all you need to know. If something is really important and useful, Claude and Codex will incorporate it. So you don't need to anxiously chase after "new tools" or "stay updated."
Do me a favor: Just regularly update your CLI tools and check what new features are added in the update log. That's enough.
3.4 Compression, Context, and Assumptions
In the process of using agents, you'll encounter a huge trap: Sometimes it's so smart that you can't believe it, and sometimes it's so stupid that you doubt your life.
The biggest difference is whether it's "filling in the blanks on its own."
So far, agents are still terrible at "connecting the dots," "filling in the blanks," and "making their own assumptions." Once it starts filling in on its own, you'll immediately feel a sharp drop in quality.
One of the most important rules in CLAUDE.md is about how to recapture context, and this rule should be the first one executed every time the agent re-reads CLAUDE.md (usually right after a compression). The "capture context" rule boils down to a few simple but far-reaching instructions: re-read your task plan, re-read the files related to the current task, and then continue.
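For illustration, such a rule might look something like this at the top of a CLAUDE.md. The wording is mine, not the author's actual file:

```markdown
## First rule after any context compression

Before doing anything else:

1. Re-read the current task plan.
2. Re-read the files related to the current task.
3. Then continue from where the plan left off.
```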
3.5 Let the Agent Know When a Task is Completed
We humans have a strong intuition about when a task is "done." For an agent, the biggest problem is: It knows how to start a task but doesn't know when it's finished.
This can lead to very frustrating results - the agent writes a bunch of stubs and then leaves.
Testing is the best milestone for an agent, because testing is deterministic, and you can set clear acceptance criteria: The task isn't considered completed until all X tests pass, and you're not allowed to modify the tests themselves.
So you only need to personally review the test cases, and once all the tests pass, you can be at ease.
Recently, another viable "completion node" has emerged: Screenshots + Visual Verification. Let the agent keep iterating on a feature until all the tests pass, then take a screenshot and verify whether the design and behavior in the screenshot meet expectations.
This allows you to let the agent keep iterating towards the design you want, instead of stopping after one attempt.
Furthermore, you can create a **"contract"** ({Task Name}_CONTRACT.md) for the agent and embed it in your rules, stipulating that the session can only end once everything specified in the contract (tests, screenshots, verification...) is complete.
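A hypothetical contract might look like this; the file name, checkbox convention, and items are illustrative, not taken from the author's setup:

```markdown
<!-- AUTH_TASK_CONTRACT.md (illustrative) -->
# Contract: JWT authentication

The session may only end when every box is checked:

- [ ] All tests in tests/test_auth.py pass (the tests must not be modified)
- [ ] Screenshot of the login flow captured
- [ ] Screenshot visually verified against the design notes
```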
3.6 Keep the Agent Running Without Going Off-Track
Many people ask me: how can I make an agent run for 24 hours without going off-track?
The method is simple: create a stop-hook that doesn't allow the agent to terminate the session before every item in {Task Name}_CONTRACT.md is complete.
If you have 100 such contracts, each clearly describing what to build, the stop-hook will prevent the agent from ending the session until all 100 contracts have passed acceptance, including every test and verification that needs to run.
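As a sketch, such a stop-hook can be a small script wired into the CLI's hook mechanism. I'm assuming a hook protocol where the script reads a JSON event on stdin and a designated "block" exit code keeps the session alive (check your CLI's hook documentation); the contract file name and the "- [ ]" checkbox convention are illustrative.

```python
# Stop-hook sketch: refuse to end the session while contract items remain
# open. Assumes exit code 2 blocks the stop; adapt to your CLI's hook docs.
import json
import sys
from pathlib import Path

CONTRACT = Path("AUTH_TASK_CONTRACT.md")  # hypothetical contract file

def open_items(text):
    """Contract lines that are still unchecked '- [ ]' boxes."""
    return [line.strip() for line in text.splitlines()
            if line.strip().startswith("- [ ]")]

def decide(event):
    """Return (exit_code, message): 0 allows the stop, 2 blocks it."""
    if not CONTRACT.exists():
        return 0, ""  # no contract: let the session end
    remaining = open_items(CONTRACT.read_text())
    if remaining:
        return 2, "Contract incomplete:\n" + "\n".join(remaining)
    return 0, ""

# Installed as a hook, the script would end with something like:
#   code, msg = decide(json.load(sys.stdin))
#   print(msg, file=sys.stderr)
#   sys.exit(code)
```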
But I have to be honest: I don't think a 24-hour marathon session is the optimal setup, because it forcibly introduces context bloat: the contexts of different contracts get mixed into the same session.
My advice is: one contract, one brand-new session.
When there's something to do, create a contract. Let an orchestration layer be responsible for creating a new contract when there's work to be done and starting a new session to handle it.
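A minimal sketch of such an orchestration layer: one contract file, one fresh session. The `claude -p <prompt>` call is one plausible way to start a non-interactive session (adapt flags to your CLI version), and the directory layout and naming convention are illustrative.

```python
# Orchestrator sketch: process each contract in its own isolated session.
import subprocess
from pathlib import Path

def run_contract(contract):
    """Spawn a fresh session whose only context is this one contract."""
    prompt = (f"Read {contract.name} and complete every item in it. "
              "Do not consider any other contract.")
    # Each invocation starts a brand-new session: no shared context.
    subprocess.run(["claude", "-p", prompt], check=True)

def run_all(contract_dir, run=run_contract):
    """Handle contracts one at a time, each in a brand-new session."""
    finished = []
    for contract in sorted(Path(contract_dir).glob("*_CONTRACT.md")):
        run(contract)  # injectable, so the session runner is swappable
        finished.append(contract.name)
    return finished
```

The injectable `run` parameter also makes the loop testable without invoking any CLI at all.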
This will completely change your agentic work experience.
3.7 Iterate