What Is a Semantic Layer? The Complete Guide for Modern BI Teams
Max Musing, Founder and CEO of Basedash · February 28, 2026
Every company eventually runs into the same problem: two people ask the same question about the business and get different numbers. Marketing says revenue was $4.2M last quarter. Finance says $3.8M. Both are technically correct — they’re just using different definitions, different filters, and different data sources.
A semantic layer is the fix. It’s a logical abstraction that sits between your raw data and the tools people use to query it, defining what “revenue” (or any other metric) actually means in one canonical place. Every dashboard, report, and AI-generated answer pulls from the same definitions, so the numbers always match.
This guide covers what a semantic layer actually is, why it matters more now than ever (especially with AI-powered BI), how it compares to alternatives like data marts and LookML, and how to evaluate whether your team needs one.
A semantic layer translates business concepts into database logic. When someone asks “what was revenue last quarter?”, there’s a chain of decisions hidden in that question: which table has revenue data, whether refunds are subtracted, whether the number is recognized or booked, which date field defines “last quarter,” and whether certain transaction types are excluded.
Without a semantic layer, those decisions get made independently by whoever builds each dashboard or writes each query. The marketing dashboard might exclude refunds. The finance report might include them. Neither is wrong — they just disagree, and nobody realizes it until the board meeting.
A semantic layer centralizes these decisions. You define “revenue” once — with its exact SQL logic, filters, and business rules — and every tool that queries the data uses that definition. Change it once, and every downstream report updates automatically.
Metrics: Named calculations with defined logic. “Monthly recurring revenue” isn’t just SUM(amount) — it’s a specific aggregation with specific filters, applied to a specific time grain, from a specific source. The semantic layer captures all of this.
Dimensions: The attributes you use to slice metrics. “Region,” “customer segment,” “product line” — these seem straightforward, but even dimensions need governance. Is “region” the billing address or the shipping address? Is “enterprise” defined by employee count or contract value? The semantic layer makes these decisions explicit.
Relationships: How tables connect to each other. When someone asks for “revenue by customer segment,” the semantic layer knows which join path gets from the transactions table to the customer attributes table. It prevents the ambiguous joins that produce duplicated or missing rows.
Access controls: Who can see what. Row-level security policies, column-level restrictions, and team-based permissions can be defined in the semantic layer so that access governance is applied consistently regardless of which tool or interface is used to query the data.
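The first three of these components can be sketched in a few lines of code. This is a minimal illustration of what a semantic layer stores, not any particular tool's API — the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str      # business-facing name, e.g. "region"
    column: str    # the physical column it maps to
    table: str     # the table that column lives on

@dataclass
class Metric:
    name: str      # canonical business name
    sql: str       # aggregation expression
    table: str     # source table
    filters: list = field(default_factory=list)  # business rules made explicit

# "Net revenue" defined once, with its exclusions written down
# instead of living in someone's head.
net_revenue = Metric(
    name="net_revenue",
    sql="SUM(amount)",
    table="transactions",
    filters=["status = 'completed'", "type != 'refund'"],
)

# The "region" ambiguity resolved explicitly: billing, not shipping.
region = Dimension(name="region", column="billing_region", table="customers")
```

The value is not the code itself but the fact that the refund exclusion and the billing-vs-shipping decision now exist in one inspectable, version-controllable place.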
The rise of AI-powered BI tools has made the semantic layer more important, not less. When a user asks an AI assistant “how is churn trending?”, the LLM needs to translate that into a correct SQL query. Without a semantic layer, the AI has to guess which table holds churn data, how churn is calculated, and what time period to use.
AI models can generate syntactically correct SQL, but they can’t infer business logic from a raw schema. A column called amount could be revenue, cost, or profit. A table called events could hold product analytics or calendar entries. The semantic layer gives the AI the context it needs to write queries that are not just valid SQL, but actually correct for your business.
This is why the best AI-native BI platforms (Basedash, for example) let data teams define business terms, metric calculations, and table relationships centrally. When someone asks a question in natural language, the AI translates it using those governed definitions rather than guessing from column names.
As more non-technical users get access to BI tools, the risk of inconsistent metrics increases. An analyst who writes SQL knows (or should know) the business rules behind each metric. A product manager using a drag-and-drop dashboard builder might not. A semantic layer protects against this by ensuring that everyone, regardless of technical skill level, gets the same governed numbers.
Most companies don’t use a single BI tool. There’s a dashboarding platform, a SQL editor, a notebook environment, embedded analytics in the product, a Slack bot, and now an AI assistant. Without a semantic layer, each tool implements its own version of each metric. The semantic layer provides a single source of truth that all tools can reference.
Data marts were the original solution to the “everyone needs consistent data” problem. You pre-aggregate data into purpose-built tables (a marketing mart, a finance mart, a product mart) with pre-computed metrics and clean dimensions. Analysts query the marts instead of the raw tables.
This works, but it’s rigid. Every new question that doesn’t fit the pre-built mart requires a data engineering ticket. Want to see revenue broken down by a dimension that isn’t in the mart? Wait for the next sprint. Want to combine marketing and finance data in a way nobody anticipated? Good luck.
A semantic layer is more flexible because it defines metrics logically rather than physically. The definitions exist as code or configuration, and the underlying queries are generated dynamically when someone asks a question. You don’t pre-compute every possible combination — you define the building blocks and let the query engine assemble them on demand.
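The "assemble on demand" idea can be shown in a toy query generator. Real semantic layers also resolve join paths and dialect differences; this sketch assumes the dimension lives on the metric's source table, and the column names are illustrative:

```python
def build_query(metric_sql, metric_table, dimension_column, filters):
    """Assemble SQL at query time from governed building blocks."""
    where = f"WHERE {' AND '.join(filters)} " if filters else ""
    return (
        f"SELECT {dimension_column}, {metric_sql} AS value "
        f"FROM {metric_table} "
        f"{where}"
        f"GROUP BY {dimension_column}"
    )

# Any metric x dimension combination is one call away --
# no pre-computed mart table required for this breakdown.
sql = build_query(
    metric_sql="SUM(amount)",
    metric_table="transactions",
    dimension_column="customer_segment",
    filters=["type != 'refund'"],
)
```

Contrast this with a data mart, where the `customer_segment` breakdown would only exist if someone had anticipated it and built a pipeline for it.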
| | Data marts | Semantic layer |
|---|---|---|
| How metrics are defined | Pre-computed in physical tables | Defined as logic, computed at query time |
| Flexibility | Limited to pre-built aggregations | Any combination of metrics and dimensions |
| Time to new metric | Requires data engineering work | Configuration or code change |
| Storage cost | Higher (duplicated, pre-aggregated data) | Lower (queries run against source data) |
| Query performance | Fast (pre-computed) | Depends on query engine and caching |
| Maintenance | ETL pipelines for each mart | Centralized definitions |
In practice, many teams use both: data marts for high-volume, performance-critical queries, and a semantic layer for flexibility and governance on top of the warehouse.
LookML is Looker’s proprietary modeling language, and it’s effectively a semantic layer — one of the earliest and most influential implementations. You define dimensions, measures, and relationships in .lkml files, and Looker generates SQL based on those definitions.
The distinction matters because LookML locks you into the Looker ecosystem. Your metric definitions live in LookML syntax, are version-controlled in a LookML project, and are only usable by Looker. If you switch BI tools (or want to use multiple tools), those definitions don’t come with you.
Modern semantic layer tools aim to be tool-agnostic. Platforms like dbt’s semantic layer (using MetricFlow), Cube, and AtScale define metrics in a way that multiple downstream tools can consume. The idea is that your metric definitions should be infrastructure, not a feature of one particular BI vendor.
That said, LookML is battle-tested and deeply capable. If your organization is committed to Looker as its primary BI platform, LookML’s semantic layer is excellent. The risk is lock-in, not quality.
There’s an emerging debate about where the semantic layer should live: inside the BI tool, or in the data warehouse.
Some BI platforms include their own semantic layer. You define metrics, dimensions, and relationships within the platform’s configuration or UI, and the platform uses those definitions when generating queries.
Basedash, for example, lets data teams define business terms, table relationships, and metric calculations directly in the platform. These definitions govern every AI-generated query and dashboard, so when any user asks a question in natural language, the answer is grounded in the team’s canonical metric definitions. This approach has the advantage of being tightly integrated with the AI query engine — the semantic context is available at every step of query generation, not just at the modeling layer.
Looker (LookML), ThoughtSpot (TML), and Holistics (AML) similarly have built-in semantic layers with varying degrees of sophistication.
Advantages: Tight integration with the BI tool’s features (especially AI). Lower setup complexity. Faster time to value for teams that standardize on one platform.
Disadvantages: Vendor lock-in. If you use multiple BI tools, the semantic layer doesn’t extend to all of them.
The alternative is to define your semantic layer in or adjacent to your data warehouse, using tools like dbt (with MetricFlow), Cube, or AtScale. These tools sit between the warehouse and any number of downstream BI platforms, exposing governed metrics via APIs that any tool can query.
Advantages: Tool-agnostic. You can swap or add BI tools without redefining metrics. Works well in complex data stacks with multiple consumers.
Disadvantages: Additional infrastructure to deploy and maintain. Can add latency if the semantic layer becomes a bottleneck. Requires a more mature data team to set up and govern.
For most teams, the right answer depends on data maturity and tool complexity: teams standardizing on a single platform get faster value from a built-in semantic layer, while teams running multiple BI tools with a mature data function are better served by a warehouse-level layer that every tool can consume.
The semantic layer is arguably the most important component for making AI-powered analytics trustworthy. Here’s why.
When a user asks an AI-powered BI tool “what was our churn rate last month, broken down by plan type?”, the system needs to:

1. Resolve “churn rate” and “plan type” to specific tables, columns, and calculation logic.
2. Generate SQL that implements that logic correctly.
3. Execute the query and present the result.
Without a semantic layer, step 1 is unreliable. The AI might pick the wrong table, use the wrong calculation, or misinterpret a column name. With a semantic layer, the AI has a map: “churn rate” resolves to a specific metric with a specific formula. “Plan type” resolves to a specific dimension on a specific table. The ambiguity is removed before a single line of SQL is written.
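That resolution step can be sketched as a lookup against a governed registry. The registry contents below are hypothetical, and the churn formula is just an example definition — the point is that the term either resolves to an explicit definition or fails loudly, rather than letting the model guess from column names:

```python
# Governed definitions the AI consults before writing any SQL.
SEMANTIC_REGISTRY = {
    "churn rate": {
        "sql": "COUNT(*) FILTER (WHERE churned) * 1.0 / COUNT(*)",
        "table": "subscriptions",
    },
    "plan type": {"column": "plan_type", "table": "subscriptions"},
}

def resolve(phrase):
    """Map a business term to its governed definition, or refuse."""
    try:
        return SEMANTIC_REGISTRY[phrase.lower()]
    except KeyError:
        # Refusing is safer than improvising a definition.
        raise ValueError(f"{phrase!r} is not a governed term")

metric = resolve("Churn rate")
```

An ungoverned term like “lifetime value” would raise an error here — which, as discussed below, is useful feedback for the data team rather than a silent wrong answer for the user.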
One of the biggest barriers to AI adoption in analytics is trust. Decision-makers need to know that the AI-generated answer is correct. A semantic layer provides auditability: every AI-generated query can be traced back to the governed metric definitions. If the number looks wrong, you can inspect the metric definition and the generated SQL to understand exactly what was calculated.
This is fundamentally different from an AI that generates SQL by guessing from a raw schema. With a semantic layer, the AI operates within defined boundaries. Without one, it’s improvising.
The best implementations create a virtuous cycle: data teams define metrics in the semantic layer, users ask questions via AI, and the AI’s queries are constrained to governed definitions. When users ask questions that the semantic layer can’t answer (because a metric isn’t defined, or a dimension is missing), that feedback signals to the data team what to add next. The semantic layer grows to match how the organization actually uses its data.
Don’t try to model your entire data warehouse into a semantic layer on day one. Start with the metrics that cause the most confusion or disagreement. Revenue, churn, active users, conversion rate — whatever metrics different teams define differently. Define those first, get alignment, and expand from there.
Metric definitions should be version-controlled, peer-reviewed, and tested. Whether you’re using dbt, LookML, or a BI platform’s built-in configuration, treat changes to metric definitions with the same rigor as changes to production code. A bad metric definition can be more damaging than a bad deploy — it silently produces wrong numbers that inform wrong decisions.
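In practice, "tested" can be as simple as pinning the business rules in assertions. The definition format and test names below are illustrative, not a real tool's API, but the same pattern applies whether definitions live in dbt YAML, LookML, or a platform's configuration:

```python
# A metric definition, as it might be loaded from version control.
NET_REVENUE = {
    "sql": "SUM(amount)",
    "filters": ["status = 'completed'", "type != 'refund'"],
}

def test_net_revenue_excludes_refunds():
    # A silent change to this filter would corrupt every downstream
    # report, so we pin it explicitly.
    assert "type != 'refund'" in NET_REVENUE["filters"]

def test_net_revenue_aggregation():
    # Guard against someone "simplifying" the metric to a COUNT.
    assert NET_REVENUE["sql"] == "SUM(amount)"

test_net_revenue_excludes_refunds()
test_net_revenue_aggregation()
```

Run in CI, tests like these turn a metric change from a silent edit into a reviewed, deliberate decision.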
A semantic layer is only useful if people can find what they’re looking for. Establish clear naming conventions for metrics and dimensions. “Revenue” is ambiguous. “Net revenue (recognized, excl. refunds)” is specific. The semantic layer should make it easy to browse available metrics and understand what each one means without reading the SQL.
Who can create new metrics? Who can modify existing ones? Who approves changes? Without governance, a semantic layer devolves into the same inconsistency problem it was designed to solve, just in a different place. Define an approval workflow for metric changes, especially for metrics that feed executive dashboards or external reporting.
Track which metrics are queried most, which are never used, and which generate the most follow-up questions. This data tells you where to invest: heavily-used metrics deserve the most scrutiny and documentation, while unused metrics might be poorly named, redundant, or no longer relevant.
Basedash takes the built-in semantic layer approach, tightly integrating metric governance with its AI-native query engine. Data teams define business terms, metric calculations, table relationships, and glossary entries directly in the platform. These definitions serve as the foundation for every AI-generated query.
When a user asks a question in natural language, Basedash’s AI resolves business terms to governed metric definitions before generating SQL. The result is that non-technical users can ask complex business questions and get answers that are consistent with how the data team has defined those metrics.
This approach works particularly well for teams that want governed, trustworthy analytics without building and maintaining a separate semantic layer infrastructure. The metric definitions, the AI engine, and the visualization layer are all part of the same system, which reduces the integration complexity and ensures that governance is applied end-to-end.
For teams using Basedash with a data warehouse like Snowflake, BigQuery, or PostgreSQL, the semantic layer definitions connect directly to the warehouse tables. There’s no ETL step or data duplication. Queries run against the live data, governed by the semantic definitions.
Basedash also supports 750+ data source connectors through its built-in Fivetran integration, so teams that don’t yet have a centralized warehouse can pull from SaaS tools (Stripe, HubSpot, Salesforce, and others) into a managed warehouse with governed metric definitions from day one.
Not every team needs a formal semantic layer. If your organization is small (under ~20 people using data), has a single data person who maintains all dashboards, and uses a single BI tool, the overhead of a semantic layer may not be justified. The data person’s institutional knowledge effectively serves as the semantic layer.
But this approach breaks down as soon as any of these conditions change: you hire a second data person, add a second BI tool, start embedding analytics in your product, or adopt an AI-powered analytics platform. At that point, the lack of a semantic layer becomes a source of inconsistency, and retrofitting one is harder than building it from the start.
If you’re evaluating BI tools today, prioritize platforms that either include a built-in semantic layer or integrate cleanly with an external one. The cost of inconsistent metrics compounds over time, and a semantic layer is the most effective way to prevent it.
Max Musing is the founder and CEO of Basedash, an AI-native business intelligence platform designed to help teams explore analytics and build dashboards without writing SQL. His work focuses on applying large language models to structured data systems, improving query reliability, and building governed analytics workflows for production environments.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.