In less than 3 months, the valuation reached $4 billion. Fal.ai CEO: The more models we have, the more valuable we are.
On October 22, 2025, AI infrastructure company Fal.ai announced the completion of a new round of $250 million in financing. It is reported that Kleiner Perkins and Sequoia Capital led this round of investment, and the company's valuation exceeded $4 billion.
It has been less than three months since the previous Series C financing with a valuation of $1.5 billion.
This startup with fewer than 50 employees has not trained any self-developed large models, nor does it pursue the most powerful parameters.
It only does one thing: Make models callable and commercially available.
In a subsequent exclusive interview, Fal.ai co-founder and CEO Gorkem Yurtseven described the company's position this way:
Instead of competing in model capabilities, we enable any model to be used by developers. The more models there are, the more valuable our platform becomes.
Eighteen months ago, they were still working on data infrastructure tools, handling data cleaning and transformation for large companies.
Until Stable Diffusion became extremely popular, they noticed that the underlying logic had changed: in the past, it was difficult to train models; now, there are too many models, but few people can use them well. They abandoned their paid products, regarded models as raw materials, and turned inference into an assembly line.
(Source: TechCrunch: Fal.ai Completes New Round of Financing, Valuation Exceeds $4 Billion)
Today, the Fal platform hosts more than 600 models and serves over 2 million developers. Adobe, Canva, Shopify, and Perplexity have already adopted it as the infrastructure for generative media.
This company doesn't talk about "AGI alignment" or court attention with paper announcements.
But it has grasped a key issue: When there is an explosion of models, who will undertake their implementation?
The answer lies in this interview with Gorkem Yurtseven.
Section 1 | Not Making Models, but Building a "Gas Station" for Models: What Did Fal Bet Right On?
How did Fal get started?
In 2022, they were still working on a tool for data teams, mainly helping enterprises clean data and manage data pipelines. They had customers, revenue, and the support of investors.
But Fal founder Gorkem Yurtseven said: We decided to abandon all existing customers and focus entirely on model inference.
Behind this decision was a clear judgment: What is truly booming is not model training, but model invocation.
Fal made this pivot just as Stable Diffusion was exploding in popularity; OpenAI's ChatGPT, DALL·E 2, and Sora followed in quick succession.
In the past, if you wanted to use AI, you had to train your own model and have your own big data. Now, things are different. The models are already trained, and as long as you can access them, you can create products.
Gorkem recalled that they asked themselves a key question at that time:
Which direction can enable us to reach $1 million in revenue the fastest? Which can get us to $10 million the fastest?
Their answer: Instead of creating a new model, build a platform that allows any model to go live directly.
The market signals were obvious:
- Whenever a new model is released, many people want to use it;
- But these people either lack GPUs or don't know how to deploy models;
- Model authors also don't know how to create APIs or handle operations.
What Fal does is to make the transition from the laboratory to the product extremely fast for models.
The first model they supported was Stable Diffusion. Later came Flux. Before long, they were able to integrate almost every mainstream open-source multimodal model on the day of its release.
Gorkem described it like this:
"Everyone wants to do similar things over and over again. So, we optimize the most common use cases to make it easy for people to get started."
Today, Fal has become one of the first platforms for multimodal model releases. As long as a model can generate images, videos, or audio, it can be invoked, tested, and integrated into enterprise products on Fal.
Not building engines, but building "gas stations."
This judgment was the first step in their three rounds of financing and the doubling of their valuation.
Section 2 | Without GPU Advantages, How to Achieve Speed Advantages?
What first made Fal well-known was not financing news but two hats it handed out in Silicon Valley.
One was printed with "GPU Rich," and the other with "GPU Poor." As a result, the "GPU Poor" hat was snatched up first.
This was a joke and also a true reflection. Gorkem said: At that time, we couldn't get GPUs, so we started to optimize the speed of each model invocation like crazy. We don't have a lot of resources, but we use them efficiently and adjust them quickly.
He said that their core goal was only one:
Make each model invocation as fast, stable, and inexpensive as possible.
This was not simply about cutting costs; it was about treating the entire model-serving process as a production line and tuning it end to end.
Fal first optimized the Stable Diffusion model. The engineering team had only two people: one was good at writing low-level code, the other was fluent in GPU instructions. Every day they did one thing: find where time could be saved and where speed could be gained.
Gorkem said: We don't study how to make models smarter but how to make them run faster.
They delivered: the time to generate the same image dropped from tens of seconds to just a few, and the inference time of their video models became the lowest in the industry.
How did they do it?
- Is the model's cold start slow? Then cache it in advance;
- Is GPU allocation chaotic? Then schedule it intelligently according to model popularity;
- Do multiple models compete for resources? Then build 28 nodes to separate the traffic.
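The three fixes above can be sketched as a toy popularity-based scheduler. This is a minimal illustration under stated assumptions, not Fal's actual system; the class name, the fixed number of warm slots, and the least-popular eviction policy are all invented for the example:

```python
from collections import Counter

class ModelScheduler:
    """Toy sketch: keep the most-requested models pre-loaded ("warm")
    so they skip the cold start, and evict the least popular model
    when the warm slots are full. Purely illustrative."""

    def __init__(self, warm_slots: int):
        self.warm_slots = warm_slots   # how many models fit on GPUs at once
        self.requests = Counter()      # per-model request counts (popularity)
        self.warm = set()              # models currently pre-loaded

    def handle_request(self, model: str) -> str:
        self.requests[model] += 1
        if model in self.warm:
            return "warm"              # served immediately, no cold start
        # Cold path: load the model, evicting the least popular warm one if full
        if len(self.warm) >= self.warm_slots:
            coldest = min(self.warm, key=lambda m: self.requests[m])
            self.warm.discard(coldest)
        self.warm.add(model)
        return "cold"
```

The point of the sketch is the policy, not the mechanics: popular models pay the cold-start cost once, and scarce GPU capacity follows demand instead of being allocated statically.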
Fal doesn't simply pile up hardware but manages models as production processes and allocates them efficiently like an assembly line.
Gorkem emphasized that the key to all this was insisting on deploying models themselves and controlling the APIs, rather than letting customers upload their own code: We don't build an open-ended platform; we package the most common functions into a set of general interfaces that are plug-and-play and fast to respond.
Fal's product logic is clear: It's not about letting you build freely but allowing you to call directly.
Their model platform is like a control room, with more than 600 image, video, and audio models on standby at any time. After a customer makes a request, the system will provide results at the most appropriate node in the shortest time.
Thanks to this, they later served millions of developers and supported large enterprises such as Adobe and Shopify.
Section 3 | Technology Is Not the Barrier, the Entry Point Is
Fal's early customers were mostly individual developers. They had no budget, no team, and even no GPUs.
But these people were willing to spend money the moment their products went live.
Gorkem recalled:
We saw some users spending thousands or even tens of thousands of dollars on the platform every day. At that time, we knew this was not a toy.
These customers were not testing models; they wanted to ship products. Even when the models were open-source, they didn't want to spend time deploying them themselves.
They only wanted an API that they could call directly, get stable outputs from, and rely on without any glitches.
This was Fal's entry point: instead of providing the most powerful models, package the most commonly used ones into one-click tools.
- When using Stable Diffusion to generate images, Fal provides a pre-packaged inference interface;
- To use the Flux model, you don't need to download weights or configure an environment; you only need to call an API;
- The platform automatically allocates GPUs and pre-loads models so that results come back immediately after a request.
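From the developer's side, that workflow collapses into a single call. The sketch below shows the shape of such a one-call client; the endpoint URL, payload fields, and default model name are all invented for illustration and are not Fal's real API:

```python
import json
from urllib import request as urlrequest

def generate_image(prompt: str, model: str = "stable-diffusion",
                   endpoint: str = "https://api.example.com/v1/run",
                   send=None) -> dict:
    """Sketch of a one-call inference client: the caller supplies only a
    prompt; hosting, GPU allocation, and model pre-loading all happen
    server-side. Endpoint and payload shape are hypothetical."""
    payload = json.dumps({"model": model, "input": {"prompt": prompt}}).encode()
    if send is None:  # default path: a plain HTTP POST
        req = urlrequest.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
        with urlrequest.urlopen(req) as resp:
            return json.load(resp)
    return send(endpoint, payload)  # injectable transport, e.g. for tests
```

The design choice the article describes is visible here: the surface area exposed to the customer is tiny (a prompt in, a result out), and everything operationally hard lives behind the interface.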
Gorkem clearly stated:
"We don't build a hosting service that everyone can customize but a set of inference platforms that respond quickly, can be implemented, and are commercially available."
We don't give too much freedom but handle the most complex parts for you.
Customers don't care who trained the model. They only care whether the results can be returned on time, whether the price is controllable, and whether there will be problems after integration.
And once integrated, customers rarely bother to switch.
Fal doesn't bind users with a single model but with the overall experience of the platform: quick launch, unified interfaces, automatic model updates, and stability and reliability.
Just like AWS's S3, the key is not the most complex technology but making it easy and reassuring for developers to use.
Fal turns models into services and the "entry-point experience" of models into its own product. Every time a new model is released, Fal's first reaction is not to read the paper but to figure out how to integrate it into the platform and launch it within three hours.
Gorkem was very specific:
"We will start a Slack group call, and seven or eight people will tune the model online together. Some will adjust the code, some will share their screens, and some will test the interfaces. We've even live-streamed this process."
This is Fal's unique rhythm.
It's not about making the models more complex but about who can make them usable the fastest.
Section 4 | As Multimodality Becomes More Fragmented, the Platform Becomes More Valuable
In the past few years, the main theme in the AI field has been: Whoever trains models controls the future.
But in 2025, the game of multimodal models has changed.
Gorkem Yurtseven's judgment is: Now, three or five models are released every week, and developers can't keep up.
This wave of explosion is not driven by a single super - model but by the emergence of a large number of open - source, diverse, and application - specific models.
- Some are specialized in generating images;
- Some only deal with 4-second videos;
- Some focus on 3D motion capture;
- Others are for voice synthesis and music composition...
They are all useful, but they differ in format, invocation method, and speed; even models with similar functions differ from architecture down to interfaces. As a result:
- Research institutes create good things, but no one can use them;
- Developers want to use new models but don't know where to find them or how to deploy them;
- Enterprises want to buy services but lack a reliable platform to connect with.
And Fal has seized this opportunity.
Gorkem's thinking is clear:
The more models there are, the more dispersed and fragmented they become, the more valuable we are. Because only a platform can integrate, optimize, and make them usable.
What Fal does can be understood as follows:
Help developers connect to 600 models at once without switching interfaces;
Help enterprises bypass compatibility issues and directly integrate with a unified solution;
Help model authors publish and launch their models on the platform without building their own servers.
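The "unified entry point" idea can be shown in miniature: many models behind one call signature, routed by name. Everything here, from the registry keys to the handlers, is a made-up illustration of the pattern, not Fal's implementation:

```python
# Toy sketch of a unified model entry point: every model, whatever its
# modality or backend, is invoked through the same call shape, and the
# platform routes internally. Registry contents are invented examples.
HANDLERS = {
    "image/stable-diffusion": lambda inp: f"png for {inp['prompt']!r}",
    "video/toy-clip":         lambda inp: f"mp4 for {inp['prompt']!r}",
    "audio/toy-tts":          lambda inp: f"wav for {inp['text']!r}",
}

def run(model: str, inp: dict) -> str:
    """One call signature for all models; dispatch happens by name."""
    try:
        return HANDLERS[model](inp)
    except KeyError:
        raise ValueError(f"unknown model: {model}") from None
```

Adding a 601st model means adding one registry entry; no caller has to change, which is exactly why the platform gains value as the model count grows.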
Just as browsers unified website entry points and e-commerce platforms unified product entry points in the past, Fal is now unifying model entry points.
More importantly, this is a long - term trend.
Gorkem pointed out: "There won't be only one official version of a model. As long as it can be modified and replicated, there will be countless optimized versions for different scenarios."
This means:
- The "sole authority" status of large models is weakening;
- Open-source communities, startups, and research institutes will all release their own versions;
- The market will become a diverse mix of coexisting versions.
At this time, infrastructure is more valuable than the models themselves.
This is the key to Fal's skyrocketing valuation within three months. When video applications such as Sora became wildly popular, Fal, as underlying infrastructure, naturally became one of the biggest beneficiaries.
Section 5 | 45 People, $100 Million: How Does a Small Team Achieve High Efficiency?
Fal's team size has always been surprising.
In 2025, their annual revenue exceeded $100 million, and their customers included large enterprises such as Adobe, Shopify, and Perplexity.
But the entire company has fewer than 50 people.
Gorkem was very straightforward: We don't have engineering managers. Everyone writes code, including the leaders. There are no hierarchies, no status reports, and no standing meetings. When a problem comes up, three or four people form a small group and solve it directly online.
It seems loose, but the efficiency is extremely high.
At Fal, the team doesn't pursue a detailed plan but focuses on model popularity and is always ready to launch quickly.
Gorkem described them as follows: As long as a new model has potential, the whole team gets involved, like being on standby for battle. The key to this mechanism is not the number of people but the focused direction.
He said:
"We have only one core goal: revenue growth. As long as it helps with this, other things can wait."
In the early days, Fal didn't even have a sales team. Only the founder negotiated with customers personally. Many enterprise users were naturally converted from the platform.
For example:
- Developers use the platform's API to build applications and spend thousands of dollars a month;
- The system monitors this activity and automatically alerts the sales team;
- Sales then steps in with annual contracts, discounts, and customized support.
This is not the traditional sales process of SaaS companies but relies on the natural growth of the platform itself.
They have a simple internal indicator: Customers with a daily expenditure of more than $300 automatically enter the sales conversion pool.
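The indicator itself is a one-line rule. The sketch below illustrates it; the $300 threshold is the figure quoted in the article, while the function name and data shape are assumptions made for the example:

```python
def conversion_pool(daily_spend: dict, threshold: float = 300.0) -> list:
    """Illustrative version of the spend-triggered handoff described above:
    customers whose daily spend reaches the threshold are flagged for
    sales outreach. Function shape and names are hypothetical."""
    return sorted(c for c, spend in daily_spend.items() if spend >= threshold)
```

A rule this simple is the whole mechanism: no outbound prospecting, just watching which users the platform has already converted into heavy spenders.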
This allows them to cover a large number of customers with very few salespeople. To date, the sales and customer support teams only have about a dozen people.
Fal's high efficiency is not only a business strategy but also a cultural gene.
They are extremely cautious about recruitment.
"We don't recruit people just because we have money, nor do we start projects just for rapid expansion."
Gorkem admitted that many of their early recruits were former colleagues, friends, and developers active in the community. Some worked in large Silicon Valley companies, and some were in Turkey, but they were all willing to get their hands dirty and deeply tune models.
The models are ready to be launched before they become popular;
Customers are already spending on the platform before they approach;
The product is at the forefront before the team expands.
It's not built with a large budget but by spending every bit of manpower, technology, and resources on the most critical areas.
Conclusion | Models Are Just Raw Materials, the Platform Is the Outlet
Fal.ai has not trained its own large models, has no open-source weights, and has not published any technical papers.