Topping GitHub and Hacker News: Open-Source Project Slashes AI Programming Costs by 98%

context-mode also extends a large model's effective memory from 30 minutes to 3 hours.

Text | Li Jiaxing

Editor | Zhou Xinyu

One-sentence Introduction

context-mode is a context-optimization MCP plugin built specifically for AI programming (MCP, the Model Context Protocol, was released by Anthropic).

It addresses the core pain points developers run into during long-term development: "model amnesia" and "excessive token consumption".

According to the team, in programming scenarios context-mode can cut the cost of AI programming by 98% and extend large models' effective memory from 30 minutes to 3 hours.

Team Background

Behind context-mode is a multinational startup team with diverse backgrounds. Its core members are currently spread across four countries, including Turkey and France, and collaborate mainly asynchronously via GitHub.

Mert Köseoğlu (Core Developer, Founder): He has worked as a technical consultant for companies such as OpenAI and has more than 10 years of experience in full-stack engineering and system architecture. Before founding the company, he was a senior software engineer at well-known data and SaaS platforms such as Countly, Planhat, and Jotform.

Sun Yicheng (Core Developer, Responsible for Multi-platform Adaptation): The team's Chinese member, currently a college sophomore. He was shortlisted for the Strengthening Foundation Program (ranking among the province's top 18 in mathematics and physics), independently developed the Temporal-RAG engine (retrieval augmentation for temporal data), and won the silver award at the Zhihu Global A2A (Agent-to-Agent) Hackathon.

Product and Business

Image source: context-mode

Simply put, context-mode is an open-source MCP plugin that "lightens the load" and "organizes the memory" of AI programming assistants.

After its release, the project topped GitHub and Hacker News and has since received more than 15,000 stars on GitHub. More than 243,000 developers have connected to context-mode, it has been adapted at the underlying level to 15 mainstream platforms, and it has been adopted by R&D teams at technology companies such as Microsoft, Google, Meta, ByteDance, and Cursor.

context-mode has drawn so much attention in the geek community because it targets a widespread anxiety in the industry: developers driven crazy by expensive API bills and by large-model amnesia.

With the spread of fully automated AI programming agents such as "Lobster" (OpenClaw, an open-source agent framework), the barrier to entry for Vibe Coding has dropped even further.

However, while enjoying the efficiency gains AI brings, users quickly realized that intelligence comes at a high price. On the one hand, token pricing for top-tier models such as Claude and GPT is not cheap: premium plans with ample token quotas often cost as much as $200 per month.

On the other hand, limited by current capabilities, models waste additional tokens on repeated trial and error and repeated retrieval while executing specific tasks.

In real development scenarios, large models often behave like a "data-processing machine without common sense." Team member Sun Yicheng shared a costly lesson of his own:

While competing in a Kaggle data competition, he assigned Claude a training task on 300 sets of data. To check on the task's progress, Claude did not write a timing script; instead, it launched a global search of the entire project every 5 seconds. This extremely inefficient "staring" strategy burned through 90% of the API quota of a high-end membership account in just half an hour.

Large models also suffer from "amnesia." Developers found that when the code volume hits the hidden context limit of some mainstream IDEs (integrated development environments), such as 164K, the system has to discard or compress historical information, and the model forgets key details. The result: an AI that was writing code smoothly one moment forgets all the key architecture decisions and constraints set earlier the next.
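To make the "amnesia" mechanism concrete, here is a deliberately simplified Python sketch of what happens when a conversation outgrows a fixed context budget. It assumes a drop-oldest-first policy and a rough 4-characters-per-token estimate; neither reflects how any particular IDE actually works, and the names `Message`, `approx_tokens`, and `fit_to_window` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenizers differ.
    return max(1, len(text) // 4)


def fit_to_window(history: list[Message], limit: int) -> list[Message]:
    """Keep only the most recent messages whose estimated token total fits in `limit`.

    This is the crudest possible policy (drop oldest first); tools that summarize
    instead of dropping still lose detail from early turns.
    """
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = approx_tokens(msg.text)
        if used + cost > limit:
            break  # everything older than this point is forgotten
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, if the first message states a constraint ("never change the public API") and twenty long log messages follow, `fit_to_window(history, limit=1000)` keeps only the last eight log messages: the constraint is exactly what gets dropped.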

Facing the serious "hallucination" and "amnesia" problems of large models, context-mode offers a blunt solution: since large models are both expensive and inefficient at processing massive amounts of raw data, take away their right to read raw data directly.

Sun Yicheng offered an analogy: "Traditional AI programming is like watching a marathon. The large model stares at every step of every runner, which of course exhausts its context. What context-mode does is move the marathon into a shielded sandbox, so the large model only needs to look at the final rankings."

As for how it works, context-mode first reduces token consumption through a "virtual sandbox" and precise retrieval.

In the traditional calling pattern, every MCP tool call is extremely expensive: large amounts of raw data are dumped directly into the large model's context window, driving up token consumption.

context-mode's "virtual sandbox" works like a "firewall" between the large model and the operating system: it first stores all files and execution records locally, then helps the large model find the relevant content when needed.
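The article does not describe context-mode's internals beyond "store locally, retrieve when needed", so the following is only a hypothetical Python sketch of that idea. Raw tool output goes into a local store (`SandboxStore`, a made-up name), the model receives a one-line receipt, and later queries return only the few chunks that match. Simple keyword counting stands in for whatever retrieval method context-mode really uses.

```python
import re
from collections import Counter


def _terms(text: str) -> list[str]:
    # Lowercased word-like tokens used for naive keyword matching.
    return re.findall(r"[a-z0-9_]+", text.lower())


class SandboxStore:
    """Hypothetical local store: raw tool output stays here, not in the model's context."""

    def __init__(self, chunk_lines: int = 20) -> None:
        self.chunk_lines = chunk_lines
        self.chunks: list[tuple[str, str]] = []  # (source label, chunk text)

    def ingest(self, source: str, raw_output: str) -> str:
        lines = raw_output.splitlines()
        for start in range(0, len(lines), self.chunk_lines):
            chunk = "\n".join(lines[start:start + self.chunk_lines])
            self.chunks.append((source, chunk))
        # Only this short receipt would be returned to the model.
        return f"stored {len(lines)} lines from {source}"

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return up to k chunks ranked by how often they contain the query terms."""
        wanted = set(_terms(query))
        scored = []
        for source, chunk in self.chunks:
            counts = Counter(_terms(chunk))
            score = sum(counts[t] for t in wanted)
            if score > 0:
                scored.append((score, source, chunk))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [f"[{source}]\n{chunk}" for _, source, chunk in scored[:k]]
```

For instance, a 6-line build log ingested with `chunk_lines=2` yields the receipt `stored 6 lines from build.log`, and `search("error foo", k=1)` returns just the chunk containing the error line.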

Test results from "Intelligent Emergence".

According to tests by "Intelligent Emergence", with context-mode connected, the token cost of having the large model read a 79.3 KB file dropped by 87.7%.
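A back-of-the-envelope calculation helps put the 87.7% figure in perspective. Assuming roughly 4 bytes per token (a common rule of thumb for English text and code; real tokenizers vary) and 1 KB = 1,024 bytes, reading the whole 79.3 KB file would cost about 20,000 tokens, and an 87.7% cut leaves about 2,500. These are estimates under those assumptions, not part of the test results.

```python
def estimate_tokens(size_kb: float, bytes_per_token: float = 4.0) -> int:
    # Assumptions: 1 KB = 1,024 bytes, and about 4 bytes of English text or code per token.
    return round(size_kb * 1024 / bytes_per_token)


def tokens_after_reduction(tokens: int, reduction: float) -> int:
    # `reduction` is a fraction, e.g. 0.877 for an 87.7% saving.
    return round(tokens * (1 - reduction))


raw = estimate_tokens(79.3)                     # ~20,300 tokens if the whole file is read
remaining = tokens_after_reduction(raw, 0.877)  # ~2,500 tokens after an 87.7% cut
print(raw, remaining)  # 20301 2497
```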

Second, to address the "amnesia" problem of large models, context-mode builds "checkpoints" by tracking every file edit the developer makes in real time.

When a conversation grows too long, it proactively builds a "snapshot", usually under 2 KB, and injects it into the AI, which amounts to setting a "checkpoint" during code editing. According to the team, this mechanism extends the effective continuous programming time of large models from 30 minutes to 3 hours.
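Below is a hypothetical Python sketch of the snapshot idea: tracked events are sorted by priority and packed greedily until a 2 KB budget (the figure the article cites) is used up. The `Event` type, the numeric priority scheme, and `build_snapshot` are illustrative assumptions, not context-mode's actual format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    priority: int  # lower number = more important (hypothetical scheme)
    text: str


def build_snapshot(events: list[Event], budget_bytes: int = 2048) -> str:
    """Pack the most important events into a snapshot of at most budget_bytes (UTF-8)."""
    lines: list[str] = []
    used = 0
    for event in sorted(events, key=lambda e: e.priority):
        line = f"- {event.text}"
        size = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if used + size > budget_bytes:
            continue  # skip items that do not fit; smaller ones may still fit
        lines.append(line)
        used += size
    return "\n".join(lines)
```

Skipping an oversized item instead of stopping at it lets a long, low-value entry (such as a pasted log) give way to short, high-value ones, which is one simple way to read memories "by priority".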

Finally, context-mode enforces a "Think in Code" paradigm to save tokens.

Simply put, "Think in Code" means stopping the model from reading and processing files line by line. Instead, the model first writes a "small program", which runs the data analysis locally and feeds only the distilled results back to the model.

Mert, the founder of context-mode, told "Intelligent Emergence" that developers have fallen into a misconception: they are used to throwing massive amounts of data straight at the large model. In fact, for a statistics task over 50 files, rather than having the model read every file itself, it is better to have the model write a script first, let the script do the counting, and return the results to the model.
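As an illustration of Think in Code, here is the kind of script a model might write for a task like the one the founder describes, counting functions across a project's files, instead of reading every file into its context. It is a minimal Python sketch that assumes the files are Python source and uses the standard-library `ast` module; the article does not specify what language or tooling the generated scripts use, and `count_functions` and `summarize` are hypothetical names.

```python
import ast
from pathlib import Path


def count_functions(root: Path) -> dict[str, int]:
    """Count function definitions (sync, async, and nested) in every .py file under root."""
    counts: dict[str, int] = {}
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        counts[path.relative_to(root).as_posix()] = sum(
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            for node in ast.walk(tree)
        )
    return counts


def summarize(root: Path) -> str:
    """The only text handed back to the model: one line instead of every file's contents."""
    counts = count_functions(root)
    return f"{len(counts)} files, {sum(counts.values())} functions"
```

Only the short string returned by `summarize` would need to enter the model's context, however many files the script reads.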

In Mert's words, one script can replace more than a dozen expensive tool calls and save a hundredfold in context.

According to tests by "Intelligent Emergence", with context-mode connected, the model saved 99.98% of the token cost when processing a file.

context-mode's barrier to entry is lower than that of standalone development tools (IDEs) such as Cursor, which require a fresh download and adapting to a new environment. As lightweight MCP plugin middleware, context-mode integrates directly into developers' existing workflows.

The context-mode team also provides a set of shortcut commands for viewing token savings across major platforms. Users simply type a command in the chat box, and the browser opens a local statistics dashboard showing how many API calls were made that week and how many times context-mode intercepted unnecessary data reads.

List of shortcut commands. Image source: context-mode

Recently, context-mode launched "Context as a Service" for enterprise R&D scenarios.

In enterprise R&D, the ROI of AI is often hard to measure.

To address this, context-mode launched an enterprise service called "Insights". Once authorized, the plugin installed on a programmer's computer sends data on how that programmer uses AI (such as which tools were called, how many errors occurred, and how much money was spent) directly to the Insights server.

Insights can also produce different reports for different roles. For example, it automatically generates a security report for the security director and a detailed breakdown of token consumption for the finance team.
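The article does not publish Insights' data format, so the following Python sketch is purely illustrative: one stream of hypothetical `ToolCallEvent` records (developer, tool, whether the call failed, cost) feeds different views for different roles, such as a spend breakdown for finance and a count of sensitive tool calls for security. All names and fields are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCallEvent:
    # Hypothetical record; the article does not publish Insights' real schema.
    developer: str
    tool: str
    failed: bool
    cost_usd: float


def finance_report(events: list[ToolCallEvent]) -> dict[str, float]:
    """Spend per developer, rounded to cents."""
    totals = defaultdict(float)
    for e in events:
        totals[e.developer] += e.cost_usd
    return {dev: round(total, 2) for dev, total in sorted(totals.items())}


def security_report(events: list[ToolCallEvent], sensitive_tools: set[str]) -> dict[str, int]:
    """Number of calls to each tool flagged as sensitive."""
    hits = defaultdict(int)
    for e in events:
        if e.tool in sensitive_tools:
            hits[e.tool] += 1
    return dict(sorted(hits.items()))


def error_rate(events: list[ToolCallEvent]) -> float:
    """Fraction of tool calls that failed (0.0 for an empty stream)."""
    return sum(e.failed for e in events) / len(events) if events else 0.0
```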

Currently, Insights is still in the targeted internal testing phase.

Founder's Thoughts

Stop treating large models as "data processors". At heart, they are "code generators".

Many platforms and developers currently hold a misconception. They like to read 50 files straight into the context and have the large model "count" how many functions there are.

This is not only slow but also extremely wasteful of compute. Our proposition is "Think in Code": the LLM should write a statistics script to do the counting and output only the final result.

A script can replace more than a dozen expensive tool calls and save a hundredfold in context. In the future AI programming paradigm, this is a fundamental rule that every platform must follow.

Unlimited context is a false premise. Restraint is the hardest moat for AI tools to build.

The industry is racing to extend large models' long-context capabilities (such as 100K or even 1M context windows), but this is actually a trap. Dumping dozens of KB of error logs on the AI all at once only accelerates its "amnesia" and hallucinations.

The real solution is not blind expansion but a highly restrained "state memory layer" (the sandbox). Whoever can compress the noise passed to the AI to the extreme can truly help developers extend continuous programming time from 30 minutes to 3 hours.
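One way to picture "compressing invalid noise" is a pre-filter that runs before a log ever reaches the model. The sketch below is a hypothetical heuristic in Python, not context-mode's method: it collapses runs of repeated lines, keeps lines that look like errors, and keeps the last few lines for context. The function name and the keyword pattern are assumptions.

```python
import re
from itertools import groupby

# Hypothetical keyword list; a real filter would be tuned per toolchain.
ERROR_PATTERN = re.compile(r"error|exception|traceback|failed", re.IGNORECASE)


def compress_log(log: str, keep_tail: int = 5) -> str:
    """Shrink a noisy log before it reaches the model (illustrative heuristic only).

    1. Collapse runs of identical consecutive lines into "line  [xN]".
    2. Keep lines that look like errors, plus the last `keep_tail` lines.
    """
    collapsed: list[str] = []
    for line, group in groupby(log.splitlines()):
        n = sum(1 for _ in group)
        collapsed.append(line if n == 1 else f"{line}  [x{n}]")

    tail_start = max(0, len(collapsed) - keep_tail)
    kept = [
        line
        for i, line in enumerate(collapsed)
        if i >= tail_start or ERROR_PATTERN.search(line)
    ]
    return "\n".join(kept)
```

A 500-line run of `compiling...` followed by one error shrinks to a handful of lines. A real tool would need to be more careful, since a keyword filter can drop the line that actually explains the failure.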

The bottleneck of next-generation AI programming lies not in how smart the model is but in how clear the context management framework is.

Everyone is complaining that AI keeps making the same mistake on the same bug. That is not because the model has gotten dumber, but because it gets lost in long conversations.

Only by giving the AI checkpoints, like save points in a single-player game, and forcing it to read memories in priority order can we leave enough room for the logical reasoning that truly matters.

Big companies are competing to build the "all-in-one package"; we are building a cross-platform "universal socket".

We put a lot of effort into adapting to the different underlying logic of Cursor, Claude, Gemini, and others, because the real developer ecosystem is always fragmented and rapidly iterating.

Developers don't need another all-powerful agent locked into a big company's ecosystem. What they need is lightweight, memory-efficient, plug-and-play middleware that can significantly cut API bills.

This article was originally produced by Li Jiaxing (李嘉星). For reprints or content cooperation, please refer to the reprint instructions; unauthorized reprints will be held accountable.
