The Secrets of the Semantic Layer at Big Tech Companies: How Uber, Netflix, and Airbnb Manage Metrics and How You Can Apply These Methods in Your Own Company

王建峰 2026-03-20 18:01

In the past few years, everyone in the data field has been talking about the semantic layer.

Business intelligence vendors sell it as a convenient metric model. Modern data architectures call it the metrics layer. AI teams claim that they can't build analytical agents without it. However, if you take a closer look at the architectures of major technology companies (Uber, Netflix, Airbnb, LinkedIn, Spotify), you'll find that what they build is quite different from what the term "semantic layer" usually implies.

For them, it's not just a layer of metrics within BI tools. It's an independent piece of infrastructure within the data platform: one that manages the definition, calculation, data quality, and access control of business metrics, as well as how those metrics are used in BI, machine learning, products, and even AI systems.

Interestingly, many companies have partially revealed their architectural information in blogs, research papers, and architecture presentations. If you piece together these scattered pieces of information, a rather surprising picture will emerge. This article will attempt to do just that.

We'll collect publicly available evidence from the data engineering projects of large technology companies and reconstruct the real architecture of the semantic layer. We'll study how the metrics platforms at Uber and LinkedIn work, why Netflix built the Metrics Repo, how Airbnb designed Minerva, why Spotify placed an API in front of the data warehouse, and what role the semantic layer is starting to play in AI systems.

The final result will be similar to a map: how the semantic layer actually works in large technology companies and which principles can be applied to more typical organizations. Perhaps the most interesting conclusion will be unexpected: in large technology companies, the semantic layer is not a BI function at all, but one of the key architectural layers of modern data platforms.

1. Semantic Layer Architecture in Large Enterprises

1.1 Uber

Indicator Platform Architecture

Uber built a centralized platform called uMetric, which it publicly describes as a unified metrics platform covering the entire lifecycle of a metric: definition, discovery, planning, calculation, quality, and consumption.

In effect, it is both a semantic layer and a metrics platform.

In addition, Uber clearly states that the platform extends indicators to machine learning features, which means it is no longer just an analytical dictionary but a bridge between analysis and machine learning.

In 2025, Uber also introduced its conversational data agent Finch. It runs on a carefully curated single-table data mart and a semantic layer built on metadata. Finch uses metadata, column aliases, and values stored in OpenSearch to enable LLMs to generate more precise WHERE filters and significantly reduce errors.

  • Insights

At Uber, the semantic layer has actually become the control plane for machines, not just for analysts.

The most valuable evidence here is that their AI agents don't rely on the idea that "LLMs will infer the schema on their own." Instead, they rely on carefully managed data marts, metadata aliases, and controlled access rights.

In other words, enterprise-level AI truly built on data doesn't rely on the generation of raw SQL statements but on pre-built semantic contexts.
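The idea of resolving filters from curated metadata rather than letting a model guess them can be sketched as follows. This is a minimal illustration, not Uber's implementation: the alias table, the column names, and the `build_where` helper are all hypothetical, and a plain dictionary stands in for the metadata that Finch keeps in OpenSearch.

```python
from __future__ import annotations

# Hypothetical alias metadata: maps phrases a user might type to the
# (column, canonical value) pair stored in the curated data mart.
ALIASES = {
    "us": ("country_code", "US"),
    "united states": ("country_code", "US"),
    "rides": ("line_of_business", "MOBILITY"),
    "eats": ("line_of_business", "DELIVERY"),
}


def build_where(terms: list[str]) -> tuple[str, list[str]]:
    """Resolve user terms to a parameterized WHERE clause.

    Unknown terms raise instead of being guessed, so a filter value
    that is absent from the metadata is never invented.
    """
    conditions, params = [], []
    for term in terms:
        key = term.strip().lower()
        if key not in ALIASES:
            raise KeyError(f"no alias registered for {term!r}")
        column, value = ALIASES[key]
        conditions.append(f"{column} = ?")
        params.append(value)
    if not conditions:
        return "", []
    return "WHERE " + " AND ".join(conditions), params


clause, params = build_where(["United States", "Eats"])
print(clause, params)
```

The design choice worth noting is that the language model only has to pick terms; the mapping from a term to a concrete column and value comes from managed metadata.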

  • Core Concept of the System

The main idea of the system is to eliminate the differences between indicators calculated by different teams.

  • Simplified Architecture

[Event Stream] → [Data Pipeline] → [Indicator Definition] → [Indicator Calculation Engine] → [Quality Verification] → [Indicator API] → [Dashboard/Machine Learning/Application]

  • Key Insights

Uber clearly states that its metrics system is used not only for analysis but also as a source of machine learning features.

In practice, this means the semantic layer doubles as a feature layer for machine learning.

1.2 Netflix

Metrics Repo — Indicators as Code

Netflix built a system called Metrics Repo, which is a framework for centralized indicator definition.

When describing its experimentation platform, Netflix explains that the Metrics Repo is an internal Python framework in which users define programmatically generated SQL queries and metric definitions. The system then manages these definitions centrally.

In a recent overview of its analytics project, Netflix emphasizes that the creation and use of internal indicators "are usually much more complex than they should be." In other words, even in a mature company like Netflix, the problem of inconsistent indicator definitions has not completely disappeared.

In addition, there is another important signal. In another article about cloud efficiency, Netflix describes an analytical data layer that provides time-series efficiency analysis for financial project use cases.

  • Insights

Netflix's descriptions point to something that is rarely stated explicitly:

In large companies, the semantic layer is usually not a single, universal system. Instead, it consists of domain-specific indicator repositories and analytical layers for specific use cases — such as experiments, efficiency analysis, creative analysis, etc.

In other words, the real architecture is closer to federated semantic governance rather than the idea of "one semantic layer to rule them all."

This is not a direct quote — it's a conclusion based on Netflix's descriptions of its various indicator frameworks and domain-specific analytical layers.

  • Core Idea

Indicators are defined programmatically rather than within BI tools.

Therefore, indicator calculation is moved out of the ETL pipeline and closer to analysts.
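A minimal sketch of what "metrics as code" can look like in Python is shown below. It is not the Metrics Repo API, which is internal to Netflix; the `Metric` class, its fields, and the table and column names are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric defined in code instead of inside a BI tool (hypothetical)."""

    name: str
    table: str
    expression: str  # SQL aggregate, e.g. "SUM(watch_seconds)"
    filters: tuple[str, ...] = ()

    def to_sql(self, group_by: str) -> str:
        """Generate the query for this metric, grouped by one column."""
        where = f" WHERE {' AND '.join(self.filters)}" if self.filters else ""
        return (
            f"SELECT {group_by}, {self.expression} AS {self.name} "
            f"FROM {self.table}{where} GROUP BY {group_by}"
        )


# Illustrative definition; table and column names are made up.
streaming_hours = Metric(
    name="streaming_hours",
    table="playback_events",
    expression="SUM(watch_seconds) / 3600.0",
    filters=("is_test_account = 0",),
)

print(streaming_hours.to_sql(group_by="experiment_cell"))
```

Because the definition is an ordinary Python object, it can be reviewed, versioned, and reused by any tool that needs the same query.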

  • Simplified Architecture

[Raw Data] → [Data Warehouse] → [Metrics Repo (Code Definition)] → [Experimental Platform] → [Statistical Engine] → [Dashboard/Decision System]

  • Key Insights

The Metrics Repo is used not only for business intelligence but mainly for A/B testing, product experiments, and causal inference.

Netflix's research paper on its experimentation platform confirms this. In other words, Netflix's semantic layer is part of its experimentation platform.

1.3 LinkedIn

Unified Indicator Platform

LinkedIn built the Unified Metrics Platform (UMP). The main problem it aims to solve is that different teams calculate the same indicators in different ways.

To solve this problem, LinkedIn centralized three things: metric definition, calculation, and serving.

  • Simplified Architecture

[Raw Events] → [Kafka] → [Batch Processing + Stream Processing] → [Indicator Calculation] → [Indicator Storage] → [Indicator API] → [Dashboard/Service]

  • Key Insights

LinkedIn turns the semantic layer into a real service: not a SQL model, but a metrics API.
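The contract of such a service can be sketched in a few lines: consumers ask for a metric by name and date range and never see the SQL behind it. The store, the metric name, and the `get_metric` function below are hypothetical and say nothing about how UMP is actually implemented.

```python
from __future__ import annotations

from datetime import date

# Hypothetical precomputed store: (metric name, day) -> value.
_STORE = {
    ("weekly_active_members", date(2024, 1, 1)): 120.0,
    ("weekly_active_members", date(2024, 1, 2)): 125.0,
    ("weekly_active_members", date(2024, 1, 3)): 131.0,
}


def get_metric(name: str, start: date, end: date) -> list[tuple[date, float]]:
    """Return the daily series for one metric over an inclusive date range."""
    known = {metric for metric, _ in _STORE}
    if name not in known:
        raise KeyError(f"unknown metric: {name}")
    return sorted(
        (day, value)
        for (metric, day), value in _STORE.items()
        if metric == name and start <= day <= end
    )


series = get_metric("weekly_active_members", date(2024, 1, 2), date(2024, 1, 3))
print(series)
```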

1.4 Spotify

Semantic Layer Inside the Experimentation Platform

Spotify built its own experimentation platform. Its architecture is roughly as follows:

[Product Events] → [Data Lake] → [Indicator Definition] → [Experimental Engine] → [Statistical Analysis] → [Decision Dashboard]

  • Core Principle

Indicators must be reproducible. In other words, each experiment must be based on the same indicator definition.
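One way to enforce this principle, shown here only as a sketch, is to pin each experiment to a fingerprint of the metric definition it was run with. The field names and the `fingerprint` and `is_reproducible` helpers are assumptions for illustration, not a description of Spotify's platform.

```python
import hashlib
import json


def fingerprint(definition: dict) -> str:
    """Stable hash of a metric definition; key order does not matter."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical metric definition.
definition = {"name": "minutes_played", "agg": "sum", "column": "ms_played"}

# An experiment records the fingerprint when it starts.
experiment = {"id": "exp-001", "metric_fingerprint": fingerprint(definition)}


def is_reproducible(experiment: dict, current_definition: dict) -> bool:
    """True if the metric still matches what the experiment was run with."""
    return experiment["metric_fingerprint"] == fingerprint(current_definition)


print(is_reproducible(experiment, definition))
print(is_reproducible(experiment, {**definition, "agg": "avg"}))
```

If someone later changes the aggregation, the second check fails, which signals that old results can no longer be compared with new ones.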

1.5 Airbnb

Minerva — A Semantic Layer for the Entire Company

Airbnb developed a system called Minerva.

Airbnb clearly states that Minerva plays a core role in its new data warehouse architecture. It is responsible for ingesting fact tables and dimension tables, denormalizing the data, and providing it to downstream applications through an API.

They also revealed the scale of the system: more than 12,000 metrics, more than 4,000 dimensions, and more than 200 data producers from different company functions.

Indicator and dimension definitions are stored in a centralized GitHub repository and go through code review, static verification, and test runs.
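A static check of this kind could run on every pull request. The sketch below assumes definitions are loaded as dictionaries; the required fields and the `validate` function are hypothetical and are not Minerva's actual schema or tooling.

```python
from __future__ import annotations

# Hypothetical set of mandatory metadata fields.
REQUIRED_FIELDS = ("name", "owner", "description", "source_table", "expression")


def validate(definitions: list[dict]) -> list[str]:
    """Return human-readable errors; an empty list means the check passes."""
    errors = []
    seen = set()
    for index, definition in enumerate(definitions):
        label = definition.get("name") or f"<definition #{index}>"
        for field in REQUIRED_FIELDS:
            if not definition.get(field):
                errors.append(f"{label}: missing required field '{field}'")
        name = definition.get("name")
        if name:
            if name in seen:
                errors.append(f"{label}: duplicate metric name")
            seen.add(name)
    return errors


good = {
    "name": "bookings",
    "owner": "growth",
    "description": "Completed bookings",
    "source_table": "fct_bookings",
    "expression": "COUNT(*)",
}
incomplete_duplicate = {"name": "bookings", "owner": "growth"}

for error in validate([good, incomplete_duplicate]):
    print(error)
```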

The system supports:

Defining quality checks, backfilling, version control

Cost attribution, GDPR selective deletion, access control

Automatic deprecation policies, usage-based retention

Airbnb summarized its goal very clearly: "Define once, use everywhere."

  • Insights

The real "secret" doesn't lie in the formula. Airbnb's semantic layer is neither a user interface feature nor a business intelligence feature — it's an engineering discipline.

Indicators are treated as code. Metadata is mandatory. There is a review process. Intermediate calculation results can be reused. Deprecation and lifecycle management are formalized.

In other words, Minerva not only solves the problem of "how to calculate KPIs" but also the problem of "how to prevent business meanings from being scattered among hundreds of teams."

Airbnb clearly explains that simply standardizing tables is not enough. Standardization must be done at the indicator level because users use indicators, dimensions, and reports, not tables.

Minerva manages indicators, dimensions, and KPI calculations.

  • Core Idea

Define once, use everywhere

  • Simplified Architecture

[Data Warehouse] → [Semantic Layer (Minerva)] → [Indicator Calculation] → [Indicator API] → [Analysis Tools]

Airbnb also points out that it has extended its data quality score to Minerva indicators and dimensions.

This is a crucial signal: unless an indicator has a trust signal, it is not considered a complete object.

  • Insights

A real enterprise semantic layer almost always consists of three components:

Definition of meaning

Calculation mechanism

Trust/quality signal

Without the third component, it's just a formula dictionary, not an enterprise-level semantic layer. Airbnb's Minerva + data quality score and the independent quality pillar in Uber's uMetric platform clearly support this conclusion.
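The three components can be made concrete with a small data structure. In the sketch below, the `GovernedMetric` class, the 0 to 100 score scale, and the threshold of 80 are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernedMetric:
    """A metric that carries meaning, calculation, and a trust signal."""

    name: str
    description: str      # 1. definition of meaning
    sql: str              # 2. calculation mechanism
    quality_score: float  # 3. trust signal, assumed to be on a 0-100 scale


def servable(metric: GovernedMetric, min_score: float = 80.0) -> bool:
    """Only expose metrics whose trust signal clears the threshold."""
    return metric.quality_score >= min_score


# Illustrative metrics; names, SQL, and scores are made up.
trusted = GovernedMetric(
    name="nights_booked",
    description="Nights in confirmed reservations",
    sql="SELECT SUM(nights) FROM fct_reservations WHERE status = 'confirmed'",
    quality_score=92.0,
)
untrusted = GovernedMetric(
    name="legacy_conversion",
    description="Old conversion rate without an owner",
    sql="SELECT AVG(converted) FROM tmp_sessions",
    quality_score=41.0,
)

print([m.name for m in (trusted, untrusted) if servable(m)])
```

A metric without the third field would still compute, but consumers would have no basis for deciding whether to rely on it.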

1.6 Pinterest

In a recent article about text-to-SQL, Pinterest explains that before parsing queries, they enrich the context in the following ways:

Table and column descriptions

Standardized terms

Metric definitions

Data quality considerations

Recommended date ranges

They also explain that without this context, LLMs can only see the raw tables and columns, thus losing the business meaning of the data.

Pinterest also points out that this context information is automatically maintained in the following ways:

AI-generated documentation

Glossary propagation based on joins

Semantic matching based on search

  • Insights

This provides strong evidence for a new trend. In the AI era, the semantic layer is no longer just an expression like this: Revenue = SUM(x)

It also includes:

Synonyms for fields

Data quality considerations

Acceptable date ranges

Valid join paths

These are exactly the elements often missing in traditional BI semantic layer products — although they are crucial for text-to-SQL systems and agent-driven analysis.
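To make this concrete, the sketch below assembles such context into the text that would accompany a question sent to a model. The metadata layout, the table and column names, and the `build_context` function are hypothetical; Pinterest's actual pipeline is not public at this level of detail.

```python
# Hypothetical metadata record for one table.
TABLE_METADATA = {
    "table": "fct_orders",
    "description": "One row per completed order.",
    "columns": {
        "gmv_usd": {
            "description": "Gross merchandise value in USD",
            "synonyms": ["revenue", "sales"],
        },
        "order_date": {
            "description": "UTC date the order completed",
            "synonyms": [],
        },
    },
    "quality_notes": ["Rows before 2021-01-01 are incomplete."],
    "recommended_date_range": "last 90 days",
    "join_paths": ["fct_orders.user_id = dim_users.user_id"],
}


def build_context(meta: dict) -> str:
    """Render table metadata as plain-text context for a text-to-SQL prompt."""
    lines = [f"Table {meta['table']}: {meta['description']}"]
    for column, info in meta["columns"].items():
        synonyms = ", ".join(info["synonyms"]) or "none"
        lines.append(f"- {column}: {info['description']} (synonyms: {synonyms})")
    lines += [f"Quality note: {note}" for note in meta["quality_notes"]]
    lines.append(f"Recommended date range: {meta['recommended_date_range']}")
    lines += [f"Valid join: {path}" for path in meta["join_paths"]]
    return "\n".join(lines)


print(build_context(TABLE_METADATA))
```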

2. Semantic Layer Matrix of Large Technology Companies

3. The Real Situation

When these practices are combined, they form a unified architecture for the semantic layer in large technology companies.

[Data Source] → [Data Warehouse/Lakehouse] → [Transformation Layer] → [Indicator Definition (Git)] → [Indicator Calculation Engine] → [Indicator Catalog] → [Indicator API] → [BI / ML / Application / AI]

This represents a complete enterprise-level semantic layer architecture.
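The flow from definition to API can be illustrated end to end with a toy version. This is a sketch under heavy simplification: an in-memory SQLite database stands in for the warehouse, a dictionary stands in for the Git-managed definitions, and `query_metric`, the table, and the metric names are all hypothetical.

```python
import sqlite3

# Metric definitions; in practice these would live in a reviewed Git repo.
DEFINITIONS = {
    "revenue": {"table": "orders", "expression": "SUM(amount)"},
    "order_count": {"table": "orders", "expression": "COUNT(*)"},
}

# Dimensions callers may group by; anything else is rejected.
ALLOWED_DIMENSIONS = {"country"}


def query_metric(conn: sqlite3.Connection, metric: str, dimension: str) -> dict:
    """Metric API: callers pass names; SQL is generated from the definition."""
    if metric not in DEFINITIONS or dimension not in ALLOWED_DIMENSIONS:
        raise KeyError(f"unknown metric or dimension: {metric}, {dimension}")
    spec = DEFINITIONS[metric]
    sql = (
        f"SELECT {dimension}, {spec['expression']} FROM {spec['table']} "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(sql).fetchall())


# Toy warehouse with three orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("DE", 10.0), ("DE", 5.0), ("US", 20.0)],
)

print(query_metric(conn, "revenue", "country"))
print(query_metric(conn, "order_count", "country"))
```

Every consumer, whether a dashboard, a model, or an agent, goes through the same function, so the definition of "revenue" exists in exactly one place.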

Actually, it's not easy to replicate this architecture within an ordinary company.

Most organizations already have data warehouses, transformation tools, and BI dashboards.

But they usually lack a semantic modeling layer that connects business meanings with the underlying data structure.

This is where tools like DataForge come in. Instead of embedding indicator logic in BI tools or SQL pipelines, DataForge allows teams to design a centralized semantic model that includes facts, dimensions, and business indicators — effectively serving as the architectural layer described in this article.

In other words, it helps implement the same principles used by companies like Uber, Airbnb, and LinkedIn —