Self-developed, or just repackaged? A developer reverse-engineered the front-end code of 200 AI companies and tracked their API calls, revealing that 146 of them are actually repackaging ChatGPT and other models. Many of them share the same technology stack while charging up to 75 times their direct API costs.
"Among 200 AI startups, 73% of the products are actually just 'wrappers,' mainly wrapping around ChatGPT and Claude!"
When this conclusion came out, it dealt quite a blow to the AI startup scene and stirred up considerable controversy.
Back in 2023, OpenAI CEO Sam Altman bluntly said that companies merely wrapping ChatGPT were "doomed to extinction."
However, reality has gone the opposite way. With the explosion of ChatGPT, the startup boom has come wave after wave, with countless investments pouring in. Some companies attracted considerable attention before even launching a product.
Now, a software engineer, Teja Kusireddy, has used data to uncover some of the truth behind this "prosperity." He reverse-engineered, decompiled the code of, and tracked API calls from 200 AI companies. He found that many companies claiming "disruptive innovation" still rely on third-party services for their core functions, with only a thin layer of "innovation" on the outside. The gap between marketing claims and the actual implementation is shocking.
So, is it that investors "don't understand at all," or are AI startups "too good at deceiving"? Where is the boundary between "self-developed" and "wrapping"? Below, drawing on the long article Teja Kusireddy published, let's look at his findings and data-backed conclusions from his first-person perspective.
Why start this reverse engineering?
Last month, I fell down an unexpected rabbit hole. It started with a very simple question, but by the end it had me questioning everything I knew about the AI startup ecosystem.
It was two o'clock in the morning. While debugging a webhook integration, I accidentally noticed something odd.
A company claiming to have "self-developed deep-learning infrastructure" was calling OpenAI's API every few seconds.
And this company had just raised $4.3 million from investors with the claim of "We have built completely different AI technology."
At that moment, I decided to find out exactly how deep this went.
Investigation method: How I did it
I didn't want to write a hot take based on "intuition." I wanted data. Real data.
So I started building tools.
Over the next three weeks, I did the following:
- Crawled the official websites of 200 AI startups sourced from YC, Product Hunt, and "We're hiring" posts on LinkedIn;
- Monitored each site's network traffic for a 60-second session;
- Decompiled and analyzed their JavaScript bundles;
- Compared the captured API calls with the fingerprint library of known services;
- Finally, compared what they boasted on their marketing pages with the actual technical implementation one by one.
I specifically excluded companies that had been established for less than six months (those teams were still in the exploration stage) and focused on startups that had received external financing and publicly claimed to have "exclusive technology."
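The fingerprint-matching step above can be sketched in a few lines. The hostname list below is an illustrative assumption, not the author's actual fingerprint library:

```python
# Hypothetical sketch of the API-fingerprinting step: given the hostnames a
# site contacted during a capture window, flag known third-party AI providers.
KNOWN_AI_APIS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
    "api.cohere.ai": "Cohere",
}

def classify_traffic(hostnames):
    """Return the third-party AI providers observed in a traffic capture."""
    return sorted({KNOWN_AI_APIS[h] for h in hostnames if h in KNOWN_AI_APIS})

# A "proprietary model" whose page calls OpenAI every few seconds:
capture = ["cdn.example.com", "api.openai.com", "api.openai.com"]
print(classify_traffic(capture))  # → ['OpenAI']
```

Comparing these hits against the marketing copy on the same page is then a manual, but quick, job.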
The data left me stunned
The results showed that 73% of the companies had a significant gap between the claimed technology and the actual implementation.
The 200 AI startups broke down into several distinct categories.
But what really shocked me was not just this number. What surprised me even more was that I wasn't even angry about it.
Next, let's break this down step by step; the findings fall into three models.
Model 1: The "self-developed model" that is actually GPT-4 with extra steps
Every time I see a claim of "our self-developed large language model," I can almost predict what I'll find next.
And I guessed right 34 times out of 37.
The technical tells:
When I monitored the outbound traffic, the clues were obvious:
- Every time a user interacted with the so-called "AI," a request went out to api.openai.com;
- The request headers contained the OpenAI-Organization identifier;
- Response times fit OpenAI's API latency pattern exactly (most queries took 150–400ms);
- Token usage was consistent with GPT-4's billing tiers;
- Even the exponential backoff on rate limits was identical to OpenAI's.
A real-world case
One company claimed a "revolutionary natural-language understanding engine." After decompiling their bundle, I found that the entire "self-developed AI" came down to a few lines of code:
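The original snippet isn't reproduced in this translation; below is a hypothetical reconstruction consistent with the description. The endpoint and payload shape follow OpenAI's public chat-completions REST API; the prompt wording and function names are illustrative assumptions, not the company's actual code:

```python
import json
import os
import urllib.request

# Illustrative reconstruction: the whole "engine" is one stock GPT-4 request
# plus a system prompt telling the model to hide what it is.
SYSTEM_PROMPT = (
    "You are our proprietary NLU engine. "
    "Never mention OpenAI or GPT-4."
)

def build_request(user_query: str) -> dict:
    """Assemble the chat-completions payload (the entire 'proprietary' part)."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    }

def proprietary_engine(user_query: str) -> str:
    """POST the payload to OpenAI and return the model's reply."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(user_query)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```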
The phrase "self-developed model" appeared 23 times in their fundraising deck. Yet there was:
- No fine-tuning
- No custom training
- No innovative architecture
It was just a system prompt to GPT-4 saying "Please pretend you're not GPT-4."
The company's actual costs and pricing were as follows:
- GPT-4 API: $0.03 per 1K input tokens and $0.06 per 1K output tokens
- Average query: about 500 input tokens and 300 output tokens
- Cost per query: about $0.033
They charged users $2.50 per query (or $299 per month for 200 queries).
That's roughly a 75x markup over the direct API cost!
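A quick sanity check of those numbers, using the prices stated above:

```python
# Unit economics from the text: GPT-4 at $0.03/1K input, $0.06/1K output.
input_cost = 500 / 1000 * 0.03    # 500 input tokens
output_cost = 300 / 1000 * 0.06   # 300 output tokens
cost_per_query = input_cost + output_cost      # ≈ $0.033
markup = 2.50 / cost_per_query                 # price charged vs. direct cost
print(f"cost: ${cost_per_query:.3f}/query, markup: {markup:.1f}x")  # ≈ 75.8x
```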
What's even more absurd: I found that three different companies' code was almost exactly the same:
- The variable names were exactly the same
- The commenting styles were exactly the same
- The instruction of "Never mention OpenAI" was also exactly the same
So, I inferred that these companies either:
- Copied from the same tutorial
- Hired the same outsourced engineer
- Used the same startup accelerator's template
Another company layered on one "innovative feature," which their investor deck called the "Intelligent Fallback Architecture."
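The decompiled code isn't shown here; in practice an "intelligent fallback" like this usually amounts to a try/except around two model calls. A minimal sketch, where the model names and structure are my assumptions:

```python
# Hypothetical "Intelligent Fallback Architecture": try the primary model,
# and on any failure retry the same prompt against a cheaper fallback.
def with_fallback(call_model, primary="gpt-4", fallback="gpt-3.5-turbo"):
    def run(prompt):
        try:
            return call_model(primary, prompt)
        except Exception:
            return call_model(fallback, prompt)  # the entire "architecture"
    return run

# Demo with a stand-in for the real API call:
def flaky_api(model, prompt):
    if model == "gpt-4":
        raise RuntimeError("429: rate limited")
    return f"{model} answered: {prompt}"

engine = with_fallback(flaky_api)
print(engine("hello"))  # → gpt-3.5-turbo answered: hello
```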
To be clear, I think there's nothing wrong with wrapping OpenAI's API per se. The problem is calling it a "self-developed model" when it's really just an API call plus a custom system prompt.
It's like buying a Tesla, swapping the badge, and claiming to have invented "proprietary electric-vehicle technology."
Model 2: Everyone is running RAG (but no one admits it)
Compared with the first model, this category is subtler. RAG (Retrieval-Augmented Generation) is genuinely useful, but here the gap between marketing and actual implementation is even wider.
They boasted of having built "advanced neural retrieval + a self-developed embedding model + semantic search infrastructure..."
What they actually had was far more off-the-shelf.
I found that 42 companies used almost the same technology stack:
- Embeddings: OpenAI's text-embedding-ada-002 (not "our self-developed embedding model");
- Vector storage: Pinecone or Weaviate (not "our proprietary vector database");
- Text generation: GPT-4 (not "our trained model").
The actual code looked like this:
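The listing is omitted in this translation; the three-step stack described above boils down to something like the sketch below, where `embed`, `vector_search`, and `generate` are placeholders for the ada-002 embedding call, the Pinecone/Weaviate query, and the GPT-4 call respectively (names and prompt wording are assumptions):

```python
# Hedged sketch of the common RAG pipeline: embed the query, retrieve the
# nearest documents from a vector store, and hand them to an LLM as context.
def rag_answer(query, embed, vector_search, generate, top_k=5):
    query_vec = embed(query)                # OpenAI text-embedding-ada-002
    docs = vector_search(query_vec, top_k)  # Pinecone or Weaviate
    context = "\n\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)                 # GPT-4
```

Swap the variable names and the choice of vector store, and this is essentially what, per the author, dozens of the surveyed companies shipped.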
This doesn't mean the technology is bad. RAG is genuinely effective. But calling it "self-developed AI infrastructure" is as absurd as calling your WordPress site a "custom content-management architecture."
Let's do the math. This company's actual cost per query:
- OpenAI embedding model: $0.0001 per 1K tokens
- Pinecone query: $0.00004 per query
- GPT-4 generation: $0.03 per 1K tokens
- Total cost: about $0.002 per query
And the actual price paid by users: $0.50–$2.00 per query
That's a 250–1000x markup over the API cost!
I found that 12 companies had exactly the same code structure, and another 23 companies had a similarity of over 90%.
The only difference was the variable names and whether they used Pinecone or Weaviate.
- One company added Redis caching and marketed it as an "optimization engine"
- Another added retry logic and even trademarked it as a "smart fault-recovery system"
The economic situation of a typical startup running 1 million queries per month:
Cost:
- OpenAI embedding model: about $100
- Pinecone hosting: about $40
- GPT - 4 generation: about $30,000
- Total cost: about $30,140 per month
Revenue: $150,000–$500,000 per month
Gross profit margin: 80–94%
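Re-deriving those margins from the stated costs:

```python
# Monthly economics at 1M queries, using the figures above.
cost = 100 + 40 + 30_000   # embeddings + Pinecone + GPT-4 = $30,140
for revenue in (150_000, 500_000):
    margin = (revenue - cost) / revenue
    print(f"${revenue:,} revenue -> {margin:.0%} gross margin")
# → $150,000 revenue -> 80% gross margin
# → $500,000 revenue -> 94% gross margin
```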
This isn't a bad business. The gross profit margin is quite considerable.
But is it "self-developed AI"? No.
Model 3: The "We fine-tuned our own model" claim... and the reality
Fine-tuning sounds great and is indeed useful in some cases. But here's what I found:
Only 7% of the companies actually trained models from scratch, and they have my respect. Their infrastructure showed it:
- Training jobs running on AWS SageMaker or Google Vertex AI
- Trained model artifacts stored in S3 buckets
- Custom inference endpoints
- GPU instance monitoring
Most of the rest just used OpenAI's fine-tuning API, which essentially means paying OpenAI to store your prompts and examples in their system.
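For contrast, that kind of "fine-tuning" reduces to roughly one API call. This sketch targets OpenAI's public fine-tuning REST endpoint; the file ID is a placeholder for an uploaded JSONL of prompt/completion examples:

```python
import json

# What "we fine-tuned our own model" often amounts to: a single job-creation
# request against OpenAI's hosted fine-tuning API. "file-abc123" is a
# placeholder, not a real uploaded file.
def build_finetune_job(training_file: str, base_model: str = "gpt-3.5-turbo") -> str:
    payload = {"training_file": training_file, "model": base_model}
    # POST this JSON to https://api.openai.com/v1/fine_tuning/jobs with an
    # API key; OpenAI stores the examples and serves the tuned weights.
    return json.dumps(payload)

print(build_finetune_job("file-abc123"))
```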
How to identify "wrapper companies" in 30 seconds
If you want to check whether I'm right, you don't need to spend three weeks like I did. Here's a quick way to identify them:
Sign 1: Network traffic
Open DevTools (F12), switch to the Network tab, and then interact with their AI function. If you see these requests:
- api.openai.com
- api.anthropic.com
- api.cohere.ai