How to cut cloud data warehouse costs from BI dashboards
Max Musing
Max MusingFounder and CEO of Basedash · May 25, 2026

Max Musing
Max MusingFounder and CEO of Basedash · May 25, 2026

Most cloud data warehouse bills are not driven by ETL jobs. They are driven by dashboards. A handful of dashboards refreshing every minute, scanning more data than they need, with no caching and no owner, can quietly add tens of thousands of dollars a year to a Snowflake or BigQuery bill.
This playbook is for data engineers, analytics engineers, and finance partners who are looking at a warehouse invoice that keeps growing faster than the company does. It walks through how to identify which dashboards are responsible, how to fix them at the SQL, BI, warehouse, and process layers, and how to make sure they stay fixed. It also covers when to move repeat dashboard workloads out of a usage-billed warehouse entirely. The same techniques apply to Snowflake, BigQuery, Redshift, Databricks SQL, and ClickHouse Cloud, with per-warehouse specifics at the end.
BI workloads are expensive when they pay for the same answer many times. Cost optimization is mostly the work of paying once.
The four universal moves that produce the biggest reductions, in rough order of effort:
Everything else (clustering, partitioning, query rewrites, governance policies) is real work but it usually matters less than getting these four right.
If you use Basedash, there is a fifth lever: store dashboard-ready data in Basedash Warehouse instead of repeatedly querying Snowflake, BigQuery, or Redshift for the same interactive views. Basedash does not charge usage-based query fees, so moving high-traffic reporting datasets into Basedash can turn a variable warehouse-compute line item into a predictable BI cost.
A few patterns make BI workloads disproportionately expensive on usage-based warehouses.
A useful mental model: a dashboard’s cost is roughly queries_per_day * average_bytes_scanned * cost_per_byte * concurrency_factor. Every lever in this guide reduces one of those four numbers.
You cannot optimize what you cannot see. Every modern warehouse exposes a query log that includes the originating user, role, or warehouse, and most BI tools tag their queries.
Start with the warehouse query log:
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY plus WAREHOUSE_METERING_HISTORY. Filter by WAREHOUSE_NAME if you have a dedicated BI warehouse, or by user/role. The official Snowflake documentation on QUERY_HISTORY covers the schema.INFORMATION_SCHEMA.JOBS_BY_PROJECT joined to billing exports. The columns total_bytes_billed and total_slot_ms give you per-query cost in on-demand or capacity-based projects.STL_QUERY and SVL_QUERY_SUMMARY, or SYS_QUERY_HISTORY on RA3. Pair with STL_LOAD_ERRORS and concurrency scaling logs.system.access.audit for per-warehouse cost.system.query_log plus the cost-per-query view in the Cloud console.Most BI tools include a header comment in the generated SQL identifying the dashboard. Look for patterns like /* dashboard: revenue-overview, user: [email protected] */. Group queries by that comment and you can rank dashboards by total bytes scanned, compute time, or estimated cost.
A quick Snowflake recipe:
SELECT
REGEXP_SUBSTR(QUERY_TEXT, 'dashboard:\\s*([a-z0-9_-]+)', 1, 1, 'e', 1) AS dashboard,
COUNT(*) AS query_count,
SUM(BYTES_SCANNED) / POW(1024, 4) AS tb_scanned,
SUM(CREDITS_USED_CLOUD_SERVICES) AS cloud_credits,
SUM(EXECUTION_TIME) / 1000 AS exec_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD(DAY, -30, CURRENT_TIMESTAMP())
AND QUERY_TEXT ILIKE '%dashboard:%'
GROUP BY 1
ORDER BY tb_scanned DESC
LIMIT 50;
In our experience, the top 5 to 10 dashboards usually account for 60 to 80 percent of BI-driven warehouse spend. Start there.
You should leave this step with three lists:
The fixes are different for each.
Most BI cost overruns come from running the same query many more times than necessary. Volume is the cheapest place to cut.
Almost every BI tool has a result cache. The default TTL is rarely the right one.
A practical rule: cache TTL should match the source data’s natural refresh cadence, not the dashboard load frequency.
If your tool only supports a single cache TTL per workspace, set it to the longest interval that nobody will notice and put a “refresh now” button on the few dashboards that need it.
Many BI tools default to scheduled refreshes that keep running long after users stop visiting. Pull a view-count report for the last 60 days from your BI tool’s analytics, then disable scheduled refreshes for any dashboard with fewer than a handful of distinct viewers.
Better yet, set scheduled refreshes to fire on the first view of the day rather than every hour. Cube has called this pattern “lazy pre-aggregation” in their docs on pre-aggregations. It works in any tool that lets you control refresh triggers.
A dashboard refreshing every minute runs 60 times more often than one refreshing every hour. Unless the data actually changes that fast and humans actually act on it that fast, auto-refresh is paying for queries nobody reads. Set auto-refresh to the slowest cadence the user can tolerate, then ask whether they need it at all.
A common audit finding: three different teams have built three nearly identical revenue dashboards, each refreshing on its own schedule against the same fact tables. Merge them into one canonical version with the union of needed filters. We covered this in our dashboard sprawl audit, certify, and retire framework.
Some dashboards are expensive even after you tune TTLs and refresh schedules because they are both popular and analytically heavy. A daily revenue dashboard opened by every executive, sales lead, and department head should not need to wake up a usage-billed warehouse every time someone checks it.
For those workloads, Basedash Warehouse gives you a different architecture: sync or store the dashboard-ready data directly in Basedash, then let charts, chats, reports, and Slack workflows read from that Basedash-managed warehouse layer. Your source warehouse still remains the system of record, but repeated BI access no longer has to translate into repeated Snowflake credits, BigQuery bytes billed, or Redshift cluster time.
The important pricing distinction is that Basedash does not charge for usage. If adoption goes up, more people can view dashboards and ask follow-up questions without creating a new per-query meter inside the BI tool. You still need to model and refresh data thoughtfully, but the repeated consumption path is no longer tied to the same variable warehouse-compute bill.
Once volume is under control, work on what each query reads.
The single biggest cost lever in cloud data warehouses is doing aggregation once instead of on every dashboard load.
Three good patterns, in increasing order of complexity:
SELECT date_trunc('day', created_at), SUM(amount) FROM orders running on every load with a pre-built daily_revenue model that the dashboard reads as a flat scan.A revenue-by-day dashboard that scans 10 GB on every load can drop to a few MB once the aggregation is materialized. On Snowflake’s standard usage-based pricing that can be the difference between $50 and $5 in compute for a heavily used dashboard.
Filtering on a partition or cluster key is the difference between scanning a single day’s data and scanning years of it.
DATE / TIMESTAMP column; cluster on the columns used most often in WHERE. Partitioned + clustered tables in BigQuery commonly cut scanned bytes by 90 percent or more on time-series workloads. Google’s BigQuery best practices for cost cover this in detail.The check is simple: if the same dashboard filter (WHERE event_date >= '2026-01-01') appears in 80 percent of your top BI queries, the underlying table should be partitioned or clustered on event_date. If it is not, every load is scanning years of data for no reason.
SELECT * and explicit column projectionBigQuery and Snowflake both bill on bytes scanned, and that scales with the number of columns in the query. A SELECT * FROM orders against a wide fact table reads every column even if the dashboard only renders three. Most BI tools surface SELECT * because it is easier to type, but you can usually configure the underlying model to project only the columns the dashboard actually needs.
A common cost trap: a dashboard joins orders to a user-events table and accidentally produces a row per order per event instead of a row per order. The query technically works, but it scans and writes dozens of times more data than intended.
Run an EXPLAIN on suspect queries, look at the row counts at each stage, and rewrite anything where the intermediate row count is orders of magnitude larger than the final result.
Most BI dashboards display approximate counts (distinct users, active accounts) where a 1 percent error is invisible. APPROX_COUNT_DISTINCT (or HyperLogLog variants) is dramatically cheaper than COUNT(DISTINCT) on large data on Snowflake, BigQuery, and Redshift, and the visual result is the same.
Once you have reduced query volume and bytes scanned, the next lever is compute configuration.
On Snowflake, doubling a warehouse size roughly halves query time but exactly doubles the credit rate. That is fine for a 30 second query that becomes a 15 second one. It is a 2x waste for a 200 ms query.
A practical Snowflake setup for BI:
x-small) warehouse with multi-cluster scaling for the BI tool.For BigQuery, on-demand pricing usually beats flat-rate at lower volume. Once a BI workload is large enough that on-demand bills exceed roughly the equivalent of 100 slots running continuously, flat-rate or capacity-based commitments start to make sense. Run the math against your last 30 days of billing exports.
A Snowflake warehouse left running with no auto-suspend is a meter ticking 24 hours a day. Set auto-suspend to the lowest value the tool will accept (60 seconds on Snowflake). The 1 to 2 second resume on the next query is rarely noticeable to dashboard users.
Mixing dashboard queries and large ETL jobs on the same warehouse causes two problems: dashboards queue behind ETL jobs (slow), and you size the warehouse for the worst job (expensive). Separate them. Most teams end up with at least:
ETL_WH (medium or large, scheduled jobs, aggressive scaling).BI_WH (extra small or small, multi-cluster for concurrency).ADHOC_WH (small, used by analysts for exploration).Per-warehouse cost reporting then becomes a useful first cut for cost attribution.
Snowflake multi-cluster warehouses, BigQuery slot reservations, and Redshift concurrency scaling all spin up extra compute under load. They prevent dashboard slowness but can multiply costs if left unchecked. Set explicit limits (max clusters, scaling policy, concurrency cap) and review them monthly.
Technical fixes hold only as long as nobody adds another always-on dashboard tomorrow.
Every dashboard should have a named owner and a review date. After 6 to 12 months, archive or recertify. Most BI tools support tags or metadata fields for this. If yours does not, a OWNED_BY and REVIEW_AT field in the dashboard description works.
Use the per-dashboard cost report from Step 1 to attribute spend to teams. Even informal attribution (“our marketing dashboards cost about $4,000 a month”) tends to focus minds. Snowflake’s RESOURCE_MONITORS can also enforce hard quotas on individual warehouses.
A simple table works well:
| Data freshness need | Suggested refresh | Cache TTL |
|---|---|---|
| Real-time monitoring (rare) | Live query | 0 |
| Operational (hourly) | Every 30 to 60 min | 30 min |
| Daily KPI tracking | Once after ETL completes | 24 hrs |
| Strategic review | Weekly | 24 hrs to 7 days |
Most dashboards belong in the bottom two rows. Treat “real-time” as the exception that requires justification, not the default.
Pull the top dashboards by cost, the top users by query volume, and the dashboards with no views in the last 60 days. Spend an afternoon retiring, fixing, or right-sizing. Most teams find a 30 to 50 percent reduction available every time they run this audit for the first year.
WAREHOUSE_METERING_HISTORY for daily cost by warehouse. Pair with QUERY_HISTORY for per-query attribution.INFORMATION_SCHEMA.JOBS_BY_PROJECT view with total_bytes_billed to attribute costs to specific dashboards.STL_QUERY_METRICS.Use this when reviewing a BI environment or onboarding a new analyst.
A BI environment that passes this checklist usually settles at 30 to 60 percent of its previous warehouse cost without any user-visible change. The dashboards that should be fast stay fast, and the work moves out of the per-load query path and into the upstream pipeline where it is paid for once.
Cost behavior varies more by configuration than by tool, but the design of the BI layer has a real impact.
If you are choosing a BI tool partly with cost in mind, ask vendors four concrete questions:
A vendor that cannot give clean answers to those questions will be hard to govern at scale, regardless of how good the per-seat price looks. For deeper coverage of pricing structure tradeoffs in BI, our guide on usage-based vs. per-seat BI pricing walks through the tradeoffs in detail.
The right framing is that warehouse cost from BI is mostly a property of how dashboards are built, served, and operated. For live-query tools, that means tuning the warehouse path. For Basedash, it can also mean moving repeated dashboard consumption into Basedash Warehouse so your team can keep asking questions without every view becoming another usage-billed warehouse event. Teams that run this playbook once and bake the checklist into their dashboard review process tend to keep their bills flat even as adoption grows. Teams that do not tend to watch their bills compound until the next budget review forces a cleanup.
Written by
Founder and CEO of Basedash
Max Musing is the founder and CEO of Basedash, an AI-native business intelligence platform designed to help teams explore analytics and build dashboards without writing SQL. His work focuses on applying large language models to structured data systems, improving query reliability, and building governed analytics workflows for production environments.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.