Self-developed, or just repackaged? A developer reverse-engineered the front-end code of 200 AI companies and tracked their API calls, revealing that 146 of them are actually repackaging ChatGPT and other models. Many of them share the same technology stack while charging up to 75 times their direct API costs.
"Among 200 AI startups, 73% of the products are actually just 'wrappers,' mainly wrapping around ChatGPT and Claude!"
When this conclusion came out, it dealt quite a blow to the AI startup scene and stirred up considerable controversy.
Back in 2023, OpenAI CEO Sam Altman bluntly said that companies merely wrapping ChatGPT were "doomed to extinction."
However, reality has gone the opposite way. With the explosion of ChatGPT, the startup boom has come wave after wave, with countless investments pouring in. Some companies attracted considerable attention before even launching a product.
Now, a software engineer, Teja Kusireddy, has used data to uncover some of the truth behind this "prosperity." He reverse-engineered, decompiled the code of, and tracked API calls from 200 AI companies. He found that many companies claiming "disruptive innovation" still rely on third-party services for their core functions, with only a thin layer of "innovation" on the outside. The gap between marketing claims and the actual implementation is shocking.
So, is it that investors "don't understand at all," or are AI startups "too good at deceiving"? Where is the boundary between "self-developed" and "wrapping"? Below, drawing on the long article Teja Kusireddy published, let's look at his findings and data-backed conclusions from his first-person perspective.
Why start this reverse engineering?
Last month, I fell down an unexpected rabbit hole. It started with a very simple question, but by the end it had me questioning everything I knew about the AI startup ecosystem.
It was two o'clock in the morning. While debugging a webhook integration, I accidentally noticed something odd.
A company claiming to have "self-developed deep-learning infrastructure" was calling OpenAI's API every few seconds.
And this company had just raised $4.3 million from investors with the claim of "We have built completely different AI technology."
At that moment, I decided to find out exactly how deep this went.
Investigation method: How I did it
I didn't want to write a hot take based on "intuition." I wanted data. Real data.
So I started building tools.
Over the next three weeks, I did the following:
- Crawled the official websites of 200 AI startups sourced from YC, Product Hunt, and "We're hiring" posts on LinkedIn;
- Monitored each site's network traffic for a 60-second session;
- Decompiled and analyzed their JavaScript bundles;
- Compared the captured API calls with the fingerprint library of known services;
- Finally, compared what they boasted on their marketing pages with the actual technical implementation one by one.
I specifically excluded companies that had been established for less than six months (those teams were still in the exploration stage) and focused on startups that had received external financing and publicly claimed to have "exclusive technology."
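The fingerprint-matching step above can be sketched in a few lines. The hostname list below is an illustrative assumption, not the author's actual fingerprint library:

```python
# Hypothetical sketch of the API-fingerprinting step: given the hostnames a
# site contacted during a capture window, flag known third-party AI providers.
KNOWN_AI_APIS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
    "api.cohere.ai": "Cohere",
}

def classify_traffic(hostnames):
    """Return the third-party AI providers observed in a traffic capture."""
    return sorted({KNOWN_AI_APIS[h] for h in hostnames if h in KNOWN_AI_APIS})

# A "proprietary model" whose page calls OpenAI every few seconds:
capture = ["cdn.example.com", "api.openai.com", "api.openai.com"]
print(classify_traffic(capture))  # → ['OpenAI']
```

Comparing these hits against the marketing copy on the same page is then a manual, but quick, job.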
The data left me stunned
The results showed that 73% of the companies had a significant gap between the claimed technology and the actual implementation.
The 200 AI startups broke down into several distinct categories.
But what really shocked me was not just this number. What surprised me even more was that I wasn't even angry about it.
Next, let's break this down step by step; the findings fall into three models.
Model 1: The "self-developed model" that is actually GPT-4 with extra steps
Every time I see a claim of "our self-developed large language model," I can almost predict what I'll find next.
And I guessed right 34 times out of 37.
The technical tells:
When I monitored the outbound traffic, the clues were obvious:
- Every time a user interacted with the so-called "AI," a request went out to api.openai.com;
- The request headers contained the OpenAI-Organization identifier;
- Response times fit OpenAI's API latency pattern exactly (most queries took 150–400ms);
- Token usage was consistent with GPT-4's billing tiers;
- Even the exponential backoff on rate limits was identical to OpenAI's.
A real-world case
One company claimed a "revolutionary natural-language understanding engine." After decompiling their bundle, I found that the entire "self-developed AI" came down to a few lines of code:
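The original snippet isn't reproduced in this translation; below is a hypothetical reconstruction consistent with the description. The endpoint and payload shape follow OpenAI's public chat-completions REST API; the prompt wording and function names are illustrative assumptions, not the company's actual code:

```python
import json
import os
import urllib.request

# Illustrative reconstruction: the whole "engine" is one stock GPT-4 request
# plus a system prompt telling the model to hide what it is.
SYSTEM_PROMPT = (
    "You are our proprietary NLU engine. "
    "Never mention OpenAI or GPT-4."
)

def build_request(user_query: str) -> dict:
    """Assemble the chat-completions payload (the entire 'proprietary' part)."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    }

def proprietary_engine(user_query: str) -> str:
    """POST the payload to OpenAI and return the model's reply."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(user_query)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```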
The phrase "self-developed model" appeared 23 times in their fundraising deck. Yet there was:
- No fine-tuning
- No custom training
- No innovative architecture
It was just a system prompt to GPT-4 saying "Please pretend you're not GPT-4."
The company's actual costs and pricing were as follows:
- GPT-4 API: $0.03 per 1K input tokens and $0.06 per 1K output tokens
- Average query: about 500 input tokens and 300 output tokens
- Cost per query: about $0.033
They charged users $2.50 per query (or $299 per month for 200 queries).
That's roughly a 75x markup over the direct API cost!
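A quick sanity check of those numbers, using the prices stated above:

```python
# Unit economics from the text: GPT-4 at $0.03/1K input, $0.06/1K output.
input_cost = 500 / 1000 * 0.03    # 500 input tokens
output_cost = 300 / 1000 * 0.06   # 300 output tokens
cost_per_query = input_cost + output_cost      # ≈ $0.033
markup = 2.50 / cost_per_query                 # price charged vs. direct cost
print(f"cost: ${cost_per_query:.3f}/query, markup: {markup:.1f}x")  # ≈ 75.8x
```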
What's even more absurd: I found that three different companies' code was almost exactly the same:
- The variable names were exactly the same
- The commenting styles were exactly the same
- The instruction of "Never mention OpenAI" was also exactly the same
So, I inferred that these companies either:
- Copied from the same tutorial
- Hired the same outsourced engineer
- Used the same startup accelerator's template
Another company layered on one "innovative feature," which their investor deck called the "Intelligent Fallback Architecture."
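The decompiled code isn't shown here; in practice an "intelligent fallback" like this usually amounts to a try/except around two model calls. A minimal sketch, where the model names and structure are my assumptions:

```python
# Hypothetical "Intelligent Fallback Architecture": try the primary model,
# and on any failure retry the same prompt against a cheaper fallback.
def with_fallback(call_model, primary="gpt-4", fallback="gpt-3.5-turbo"):
    def run(prompt):
        try:
            return call_model(primary, prompt)
        except Exception:
            return call_model(fallback, prompt)  # the entire "architecture"
    return run

# Demo with a stand-in for the real API call:
def flaky_api(model, prompt):
    if model == "gpt-4":
        raise RuntimeError("429: rate limited")
    return f"{model} answered: {prompt}"

engine = with_fallback(flaky_api)
print(engine("hello"))  # → gpt-3.5-turbo answered: hello
```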
To be clear, I think there's nothing wrong with wrapping OpenAI's API per se. The problem is calling it a "self-developed model" when it's really just an API call plus a custom system prompt.
It's like buying a Tesla, swapping the badge, and claiming to have invented "proprietary electric-vehicle technology."
Model 2: Everyone is running RAG (but no one admits it)
Compared with the first model, this category is subtler. RAG (Retrieval-Augmented Generation) is genuinely useful, but here the gap between marketing and actual implementation is even wider.
They boasted of having built "advanced neural retrieval + a self-developed embedding model + semantic search infrastructure..."
What they actually had was far more off-the-shelf.
I found that 42 companies used almost the same technology stack:
- Embeddings: OpenAI's text-embedding-ada-002 (not "our self-developed embedding model");
- Vector storage: Pinecone or Weaviate (not "our proprietary vector database");
- Text generation: GPT-4 (not "our trained model").
The actual code looked like this:
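The listing is omitted in this translation; the three-step stack described above boils down to something like the sketch below, where `embed`, `vector_search`, and `generate` are placeholders for the ada-002 embedding call, the Pinecone/Weaviate query, and the GPT-4 call respectively (names and prompt wording are assumptions):

```python
# Hedged sketch of the common RAG pipeline: embed the query, retrieve the
# nearest documents from a vector store, and hand them to an LLM as context.
def rag_answer(query, embed, vector_search, generate, top_k=5):
    query_vec = embed(query)                # OpenAI text-embedding-ada-002
    docs = vector_search(query_vec, top_k)  # Pinecone or Weaviate
    context = "\n\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)                 # GPT-4
```

Swap the variable names and the choice of vector store, and this is essentially what, per the author, dozens of the surveyed companies shipped.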
This doesn't mean the technology is bad. RAG is genuinely effective. But calling it "self-developed AI infrastructure" is as absurd as calling your WordPress site a "custom content-management architecture."
Let's do the math. This company's actual cost per query:
- OpenAI embedding model: $0.0001 per 1K tokens
- Pinecone query: $0.00004 per query
- GPT-4 generation: $0.03 per 1K tokens
- Total cost: about $0.002 per query
And the actual price paid by users: $0.50–$2.00 per query
That's a 250–1000x markup over the API cost!
I found that 12 companies had exactly the same code structure, and another 23 companies had a similarity of over 90%.
The only difference was the variable names and whether they used Pinecone or Weaviate.
- One company added Redis caching and marketed it as an "optimization engine"
- Another added retry logic and even trademarked it as a "smart fault-recovery system"
The economic situation of a typical startup running 1 million queries per month:
Cost:
- OpenAI embedding model: about $100
- Pinecone hosting: about $40
- GPT - 4 generation: about $30,000
- Total cost: about $30,140 per month
Revenue: $150,000–$500,000 per month
Gross profit margin: 80–94%
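Re-deriving those margins from the stated costs:

```python
# Monthly economics at 1M queries, using the figures above.
cost = 100 + 40 + 30_000   # embeddings + Pinecone + GPT-4 = $30,140
for revenue in (150_000, 500_000):
    margin = (revenue - cost) / revenue
    print(f"${revenue:,} revenue -> {margin:.0%} gross margin")
# → $150,000 revenue -> 80% gross margin
# → $500,000 revenue -> 94% gross margin
```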
This isn't a bad business. The gross profit margin is quite considerable.
But is it "self-developed AI"? No.
Model 3: The "We fine-tuned our own model" claim... and the reality
Fine-tuning sounds great and is indeed useful in some cases. But here's what I found:
Only 7% of the companies actually trained models from scratch, and they have my respect. Their infrastructure showed it:
- Training jobs running on AWS SageMaker or Google Vertex AI
- Trained model artifacts stored in S3 buckets
- Custom inference endpoints
- GPU instance monitoring
Most of the rest just used OpenAI's fine-tuning API, which essentially means paying OpenAI to store your prompts and examples in their system.
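For contrast, that kind of "fine-tuning" reduces to roughly one API call. This sketch targets OpenAI's public fine-tuning REST endpoint; the file ID is a placeholder for an uploaded JSONL of prompt/completion examples:

```python
import json

# What "we fine-tuned our own model" often amounts to: a single job-creation
# request against OpenAI's hosted fine-tuning API. "file-abc123" is a
# placeholder, not a real uploaded file.
def build_finetune_job(training_file: str, base_model: str = "gpt-3.5-turbo") -> str:
    payload = {"training_file": training_file, "model": base_model}
    # POST this JSON to https://api.openai.com/v1/fine_tuning/jobs with an
    # API key; OpenAI stores the examples and serves the tuned weights.
    return json.dumps(payload)

print(build_finetune_job("file-abc123"))
```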
How to identify "wrapper companies" in 30 seconds
If you want to check whether I'm right, you don't need to spend three weeks like I did. Here's a quick way to identify them:
Sign 1: Network traffic
Open DevTools (F12), switch to the Network tab, and then interact with their AI function. If you see these requests:
- api.openai.com
- api.anthropic.com
- api.cohere.ai