How to model data for BI: star schemas, wide tables, and how much you actually need
Max Musing
Max MusingFounder and CEO of Basedash
· June 23, 2026

Max Musing
Max MusingFounder and CEO of Basedash
· June 23, 2026

Most BI dashboards do not break because of the chart library or the tool. They break because the data underneath them is shaped wrong: the grain is ambiguous, a join fans out and doubles a revenue number, or a “total” depends on which table you started from. Data modeling for BI is the work of shaping tables so that questions have one correct answer and a dashboard is hard to read wrong.
The short version: get the grain of each table right, keep your fact tables and your descriptive tables separate enough to join cleanly, and do only as much modeling as the question requires. A two-person startup querying a production database does not need the same model as a 200-person company with a warehouse. This guide explains the patterns (star schema, one big table, wide tables), how to choose between them, and a practical model for how much modeling you actually need.
It is written for founders, analysts, and data leads who are about to build dashboards and want the underlying tables to be trustworthy, not theoretically perfect.
For BI, data modeling is mostly about three decisions:
Modeling for BI is not the same as modeling for an application. An app database (often third normal form) is optimized for writes and consistency: many small, normalized tables, foreign keys everywhere. Analytics is optimized for reads: fewer, wider tables that are cheap to scan and easy to join for reporting. The two goals pull in opposite directions, which is why you usually model an analytics layer rather than charting directly off raw application tables.
This is the question most guides skip. The honest answer is that modeling effort should scale with data volume, number of sources, and how many people build their own queries. Here is a practical three-tier model.
Tier 1: query the source directly. A small team, one production database, a handful of dashboards. You do not need a warehouse or a star schema. Build a few SQL views to hide joins and name things clearly, and query them. The cost of over-modeling here is real: weeks spent building a dimensional model nobody needs. If you are at this stage, see how to build a SQL dashboard.
Tier 2: a light reporting layer. Multiple sources (production DB, Stripe, a CRM), more dashboards, and a few non-technical people who want to slice data. Centralize into a warehouse and build a thin set of cleaned, well-named tables: a few fact tables, a few shared dimensions (customers, dates, products). You do not need full Kimball formality, but you do need consistent grain and one canonical customer table.
Tier 3: a governed dimensional model. Many sources, many builders, and metrics that must reconcile across teams (finance, board reporting, investor numbers). This is where a real star schema, a date dimension, conformed dimensions, and a semantic layer earn their cost. The signals that you have outgrown the lighter tiers are the same signals that tell you to add a data warehouse.
The mistake teams make in both directions: startups build Tier 3 models for Tier 1 problems, and scaling teams keep charting off raw tables long after they have Tier 3 complexity. Match the model to the stage.
Two patterns dominate analytics modeling. A star schema keeps one central fact table (orders, events, payments) surrounded by dimension tables (customers, products, dates) joined by keys. One big table (OBT, sometimes called a wide table) pre-joins everything into a single denormalized table where each row already carries its descriptive columns.
Neither is universally correct. They trade flexibility against simplicity.
| Attribute | Star schema | One big table (OBT) |
|---|---|---|
| Structure | Fact table + separate dimension tables | Single pre-joined wide table |
| Storage | Lower (no repeated attributes) | Higher (attributes repeat per row) |
| Query writing | Requires joins | No joins, simple GROUP BY |
| Non-technical friendliness | Lower (must understand joins) | Higher (everything in one place) |
| Reuse across dashboards | High (dimensions shared) | Low (table is purpose-built) |
| Risk of fan-out errors | Higher if joins are wrong | Lower (joins done once, correctly) |
| Schema changes | Localized to one dimension | Requires rebuilding the wide table |
| Best fit | Many metrics sharing dimensions, governed reporting | A specific dashboard or self-serve audience |
A reasonable default for most teams: model a small star schema as the source of truth, then build one or two wide tables on top of it for the dashboards non-technical people touch. The star keeps things correct and reusable; the wide table makes self-serve querying safe because the joins are already done. Columnar warehouses (Snowflake, BigQuery, Redshift, ClickHouse) make wide tables cheap to scan, which is why OBT has become more popular than it was in the on-prem era.
Grain is the definition of what one row means. Decide it explicitly for every fact table, write it down, and never mix grains in the same table.
The classic failure looks like this. You have an orders table at one-row-per-order, and an order_items table at one-row-per-line. You join them to get product names, then sum order_total. Because each order has several line items, the join repeats the order row once per item, and your revenue is now inflated by the number of items per order. This is “fan-out,” and it is the most common source of wrong numbers on dashboards.
Rules that prevent most grain bugs:
fct_orders is one row per order; fct_order_items is one row per line.If you only fix one thing about your modeling, make every fact table’s grain unambiguous.
A dedicated date table (one row per calendar day, with columns for week, month, quarter, fiscal period, weekend flag, holiday flag) is the highest-leverage piece of modeling for almost any BI setup. It lets every dashboard agree on what “last quarter” means, supports fiscal calendars that do not match the Gregorian one, and makes period-over-period comparisons trivial. It is cheap to build once and reused by every fact table. Most teams that struggle with inconsistent date filtering never built one.
The same logical model can be implemented in several places, and the right choice depends on your stack:
A good rule: keep the shape (joins, grain, cleaning) upstream in views or dbt, and keep only presentation-level calculations in the BI tool. This is the layering described in the modern BI stack for lean teams.
Dimensions change: a customer moves to a new plan, a sales rep changes territory, a product gets renamed. How you handle that history matters for accuracy.
Most early-stage teams over-build here, adding Type 2 history to dimensions nobody analyzes historically. Start with Type 1 everywhere, and add Type 2 only to the specific attributes where “what was true then” changes a reported number. Slowly changing dimensions are a real technique, not a default setting.
Before you build dashboards on a new model, confirm:
Basedash is built for teams at Tier 1 and Tier 2: it connects directly to a production database or warehouse, so you can build dashboards on SQL views and well-named tables without standing up a full dimensional model first. Its AI can write and explain queries against your schema, which is useful when you are still shaping a reporting layer and want non-technical teammates to ask follow-up questions safely. For teams that have grown into a governed dimensional model in dbt or a warehouse, Basedash queries that model directly rather than asking you to redefine it. The point is to model only as much as your stage requires, and let the tool meet the model where it is.
Do I need a star schema to build dashboards? No. A star schema pays off when many metrics share dimensions and numbers must reconcile across teams. For a single database and a handful of dashboards, clean SQL views are enough. Add a star schema when reuse and governance justify it.
Star schema or one big table for analytics? Use a star schema as the reusable source of truth, and build wide one-big-tables on top for the dashboards non-technical people query. Columnar warehouses make wide tables cheap, so OBT is a reasonable default for purpose-built, self-serve dashboards.
What is the grain of a fact table? The grain is what one row represents, for example one order, one order line, or one customer per day. Defining it explicitly and summing measures only at that grain prevents the fan-out joins that inflate totals.
Should I model data in the warehouse or in the BI tool? Keep joins, grain, and cleaning upstream in SQL views or dbt so they are reusable and testable. Keep only last-mile presentation calculations in the BI tool. Core business logic in a dashboard cannot be reused and tends to drift.
What is a slowly changing dimension, and do I need one? It is a way to track how a descriptive attribute changes over time. Use Type 1 (overwrite) by default, and apply Type 2 (keep history) only to the specific attributes where a reported number depends on past values, such as plan tier in historical revenue.
How do I avoid double-counting revenue in a dashboard? Double-counting usually comes from joining a coarser table (orders) to a finer one (order lines) and then summing at the wrong grain. Aggregate the finer table up to the grain you are summing at first, or sum each measure only in the table where it naturally lives.
Written by

Founder and CEO of Basedash
Max Musing is the founder and CEO of Basedash, an AI-native business intelligence platform designed to help teams explore analytics and build dashboards without writing SQL. His work focuses on applying large language models to structured data systems, improving query reliability, and building governed analytics workflows for production environments.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.