How to give AI agents safe access to your business data
Max Musing · Founder and CEO of Basedash · May 12, 2026


The safest way to give an AI agent access to your business data is to put a governed layer between the agent and the database. That usually means a read-only role, masked columns, optional row-level security for sensitive or customer-scoped data, and an MCP server, BI tool, or semantic layer that enforces policies and writes every query to an audit log. Direct database credentials, broad service accounts, and “just give the agent admin and we’ll watch it” are the patterns that fail in production.
This guide is for engineers, data leads, and founders who are wiring up ChatGPT, Claude, Cursor agents, Slack bots, or custom LLM applications to real business data and want a practical setup that does not leak PII, blow up the warehouse bill, or quietly hand a model the keys to production. It walks through the access patterns that actually work, the controls that matter, a decision matrix for picking one, and the mistakes that show up most often once an agent is loose in your stack.
A human analyst with database access is governed by job context, training, and, if needed, a manager. An LLM agent is governed by a prompt and whatever tools it has. Three things change once you introduce an agent.
The query surface area expands. A human asks five questions an hour. An agent driven by another agent or a chat UI can fire hundreds of queries in minutes, including queries no human would write (full table scans on events, joins across every schema, repeated retries after a malformed SQL response).
Trust boundaries get fuzzy. When a customer-facing AI feature queries your database to answer a support question, your authorization model is no longer “is this employee allowed to see this row.” It is “is this end user, on behalf of whom an agent is asking, allowed to see this row.” Most data stacks were not designed for that.
Prompt injection is a real attack surface. If the agent reads anything an outsider can write to, like a support ticket, a CRM note, a webhook payload, or a scraped page, a hostile string can rewrite the agent’s instructions and convince it to run queries you did not authorize. (OWASP lists prompt injection as the top risk for LLM applications, and Simon Willison has written about it extensively since 2022.) The fix is not better prompting. The fix is constraints the agent cannot remove, enforced by the layer underneath it.
These three changes (query volume, fuzzy trust boundaries, and prompt injection) are why "just hand the agent a connection string" breaks down quickly.
There are five patterns for connecting an AI agent to business data. Most production setups use a combination.
Create a dedicated database role, grant it SELECT only, and point the agent at a curated schema of views that already have sensitive columns dropped or masked. The agent never sees the raw tables.
This is the simplest pattern that survives a security review. It works in PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, and ClickHouse with minor syntactic differences.
A typical setup in PostgreSQL looks like this:
```sql
CREATE ROLE ai_agent LOGIN PASSWORD '...';

GRANT USAGE ON SCHEMA agent_views TO ai_agent;
GRANT SELECT ON ALL TABLES IN SCHEMA agent_views TO ai_agent;
ALTER DEFAULT PRIVILEGES IN SCHEMA agent_views
  GRANT SELECT ON TABLES TO ai_agent;

CREATE VIEW agent_views.customers AS
SELECT
  id,
  created_at,
  plan,
  region,
  -- email and billing details intentionally omitted
  CASE WHEN deleted_at IS NULL THEN 'active' ELSE 'churned' END AS status
FROM app.customers;
```
Use this when the agent only needs to read aggregates and the curated surface is small. Avoid this when the agent needs to query the long tail of operational tables, because the curated schema becomes a maintenance burden.
The Model Context Protocol (MCP) is a standard published by Anthropic in 2024 for connecting LLM clients to external tools and data sources. (See the MCP spec.) An MCP server exposes a typed set of tools, such as query_orders, get_customer_summary, and list_metrics, and the agent can only call those tools. It cannot run arbitrary SQL unless you expose a tool that allows it.
This is now the default integration path for Claude, ChatGPT, Cursor, and several other clients. A growing number of BI and data tools ship MCP servers, and Basedash includes one as a built-in feature so an LLM client can query governed metrics, dashboards, and tables without holding warehouse credentials directly. Because Basedash already understands users, roles, dashboards, saved questions, and connected data sources, it can act as the security layer the agent talks to instead of exposing your warehouse directly.
Use an MCP server when you want a typed, narrow contract between the agent and the data, especially when multiple agents (Claude desktop, an internal Slack bot, a Cursor extension) need the same set of capabilities. For sensitive data, pair the MCP server with a permissions layer, masked data, and RLS where it fits. Basedash gives you that governed layer out of the box and can optionally sit on top of database-level RLS when you want both layers.
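The "typed, narrow contract" idea does not depend on any particular SDK. As a rough illustration (plain Python, a stand-in for an MCP server rather than the real protocol), the server is essentially a registry of named tools, and anything outside the registry is unreachable:

```python
# Minimal stand-in for an MCP-style tool registry. Illustrative only:
# the real MCP protocol adds typed schemas, transport, and auth on top.
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_customer_summary(customer_id: int) -> dict:
    # A real server would run a governed, parameterized query here.
    return {"id": customer_id, "plan": "pro", "status": "active"}

def dispatch(name: str, **kwargs: Any) -> Any:
    """The agent can only reach registered tools; arbitrary SQL has no entry point."""
    if name not in TOOLS:
        raise PermissionError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

The containment comes from the dispatcher, not the prompt: a request for a tool that was never registered fails before any query runs.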
For more on which BI and data tools support MCP, see BI tools that let you use an MCP server.
If you already have a BI tool with permissions, dashboards, and an API, you can route the agent’s questions through it. The agent calls the BI tool, the BI tool runs the query against the warehouse using its existing connection, and the BI tool enforces user permissions and returns the result.
This pattern works well when the agent's questions map to metrics and saved queries the BI tool already defines, when humans and the agent should see the same numbers, and when you want to reuse the BI tool's existing permissions instead of building new ones.
It struggles when the agent needs ad-hoc exploration that does not match any saved query, because most BI APIs are not designed for unbounded SQL.
A semantic layer (Cube, dbt Semantic Layer, Looker’s modeling layer, an internal one) exposes business metrics as a typed API: revenue by region for last 30 days, not raw SQL. The agent picks dimensions and measures from a defined set; the semantic layer compiles that to SQL and runs it.
This is the cleanest way to keep an agent from inventing its own definition of “revenue.” It also tends to be the most expensive to build and maintain, so it pays off mostly when you have multiple agents and humans depending on the same metrics. We covered the broader tradeoffs in Where to define business metrics: SQL views, dbt, semantic layers, or BI calculations.
For higher-stakes use cases, like a customer support agent that needs to look up an account, a billing assistant, or a churn-risk classifier, the right pattern is often a small set of hand-written, parameterized queries exposed as tools. The agent calls get_account(account_id) and gets back a strict, typed payload.
This is the most restrictive pattern, and the most predictable. It is not “AI agents querying your data” so much as “AI agents calling your existing API, which happens to query data.” Use it for any workflow that touches customer-facing surfaces or regulated data.
| Pattern | Flexibility for the agent | Setup cost | Best for | Main risk |
|---|---|---|---|---|
| Read-only role + masked views | Medium | Low | Internal exploration over a small, curated schema | Curated views drift from real schema; agent finds gaps |
| MCP server | Medium-high | Medium | Multi-client setups (Claude, Cursor, Slack bot) needing parity | False sense of safety if there is no permissions layer behind it |
| BI tool API | Low-medium | Low if BI exists | Agents that should see the same numbers as humans | Limited ad-hoc query power; rate limits on BI APIs |
| Semantic layer API | Low | High | Multiple agents and humans on shared metric definitions | High build cost; slower to add net-new metrics |
| Prepared queries / custom tool | Lowest | Medium | Customer-facing or regulated workflows | Coverage gaps push users back to less safe channels |
Most teams start with pattern 1 or 2 for internal work, and move to pattern 5 for anything customer-facing.
Whichever pattern you pick, the controls below are what actually keep an agent contained. The pattern is just where you enforce them.
Filter rows automatically based on who (or what) is asking. PostgreSQL has native RLS policies; Snowflake has row access policies; BigQuery uses row-level access policies.
RLS is recommended when row access depends on the agent, team, tenant, or end user. It usually keys on either the agent’s role (an analytics agent only sees aggregated, non-PII tables) or the end user the agent is acting on behalf of (a support agent only sees the customer it was asked about). The second case requires passing a user identifier through every query as a session variable or context. If your agent only sees a curated, non-sensitive dataset through a governed layer like Basedash, RLS may be optional rather than mandatory.
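RLS syntax varies by database, so here is the same principle sketched at the application layer with SQLite (table and rows are made up): the tenant filter is attached by the proxy, not written by the agent, so the agent cannot drop it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, tenant_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, 10, "refund request"), (2, 10, "bug report"), (3, 20, "other tenant")],
)

def scoped_tickets(tenant_id: int) -> list[tuple]:
    """The scope comes from the authenticated session, never from the
    agent's own SQL, so an injected prompt cannot widen the row set."""
    return conn.execute(
        "SELECT id, body FROM tickets WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
```

With native PostgreSQL RLS you would instead set a session variable for the end user and let a policy key on it; the invariant is the same, the filter lives below the agent.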
Some columns should never be in an LLM context window, including emails, billing details, hashed passwords, and support ticket bodies that contain PII. Use Snowflake masking policies, BigQuery dynamic data masking, PostgreSQL views that drop or hash sensitive columns, or a BI layer like Basedash that only exposes the fields an agent should use.
A common mistake here is masking columns in views but leaving the raw table accessible to the same role. Always grant the role access to the masked layer, not the source.
If the agent is going to write SQL, restrict what it can write. Most warehouses let you grant SELECT only and revoke everything else, but that still allows expensive scans. A proxy, BI layer, or MCP server can go further by rejecting queries that touch certain schemas, exceed a row estimate, or use disallowed functions.
For agents that should never write, double-enforce: read-only role at the database level, and read-only validation in the proxy layer. Defense in depth means both.
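A naive sketch of the proxy-side check. It is deliberately simple, keyword-based validation is a second line of defense behind the database role's missing write grants, not a replacement for them:

```python
import re

# Statements the proxy refuses to forward, even though the read-only
# database role would reject them anyway. Defense in depth.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create|copy)\b",
    re.IGNORECASE,
)

def validate_read_only(sql: str) -> str:
    """Reject anything that is not a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not stripped.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("query contains a write or DDL keyword")
    return stripped
```

Note that the WITH prefix is allowed but still scanned, so a CTE smuggling a DELETE is caught by the keyword check.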
Agents misbehave in cheap, repetitive ways. A retry loop on a malformed query can issue hundreds of multi-gigabyte scans before anyone notices.
At the database level, set:
- Statement timeouts (SET statement_timeout = '30s' in PostgreSQL).
- Warehouse guardrails (Snowflake's STATEMENT_TIMEOUT_IN_SECONDS parameter and resource monitors).

At the agent layer, add tool-call rate limits and a budget per session.
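A per-session budget can be as simple as a counter the model cannot see or reset, enforced in the proxy (the specific limits below are arbitrary examples):

```python
import time

class SessionBudget:
    """Caps total tool calls per session and calls per minute. Lives in
    the proxy, outside the model's context, so a prompt cannot talk its
    way past it."""

    def __init__(self, max_calls: int = 200, per_minute: int = 30):
        self.max_calls = max_calls
        self.per_minute = per_minute
        self.total = 0
        self.window: list[float] = []

    def charge(self) -> None:
        """Call once before every tool call; raises when a limit is hit."""
        now = time.monotonic()
        self.window = [t for t in self.window if now - t < 60]
        if self.total >= self.max_calls:
            raise RuntimeError("session budget exhausted")
        if len(self.window) >= self.per_minute:
            raise RuntimeError("rate limit exceeded, retry later")
        self.total += 1
        self.window.append(now)
```

A retry loop then fails fast and cheaply instead of issuing hundreds of scans before anyone notices.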
Every query, every tool call, every prompt, and every returned payload should be logged with the agent identity, the human identity (if any) the agent is acting on behalf of, and a request ID that ties them together. Without this, you cannot answer “what did the agent see” after an incident.
Most warehouses have query history (information_schema.query_history in Snowflake, pg_stat_statements in PostgreSQL); the gap is usually the prompt and response side. Capture those at the proxy or MCP layer.
Treat any text the agent reads as potentially adversarial. Practical guardrails:
- Validate every tool result against a strict schema: if a tool is supposed to return {id, name}, reject anything else.
- Keep destructive tools and write access out of any agent that reads untrusted text.
- Scope data access to the requesting user so injected instructions cannot widen what a query can return.

The point is not to make injection impossible, which it is not today, but to make a successful injection useless because the agent has nothing dangerous to call.
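The schema check can be a few lines. This sketch accepts only the {id, name} shape and rejects everything else before it reaches the model's context:

```python
def validate_payload(payload: object) -> dict:
    """Accept only {id: int, name: str}; reject extra keys, wrong types,
    or anything that is not an object at all."""
    if not isinstance(payload, dict):
        raise ValueError("tool result must be an object")
    if set(payload) != {"id", "name"}:
        raise ValueError(f"unexpected keys: {sorted(payload)}")
    if not isinstance(payload["id"], int) or not isinstance(payload["name"], str):
        raise ValueError("field types do not match the schema")
    return payload
```

Rejecting extra keys matters as much as checking types: a compromised upstream source cannot smuggle an "instructions" field into the context window.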
Use this to choose where to start. It is not exhaustive, but it covers the cases most teams hit first.
| Use case | Recommended starting pattern | Notes |
|---|---|---|
| Internal Slack bot answering “what’s MRR this month?” | BI tool API or MCP server in front of governed metrics | Reuses existing definitions; humans and bot see the same numbers. |
| Cursor or Claude desktop querying your warehouse for analysis | MCP server + read-only role + masked views | Keeps credentials out of the IDE; lets multiple clients share the same governed surface. |
| AI feature inside your product that summarizes a user’s data | Prepared queries / custom tool layer | Customer-facing, so every query must scope to the requesting user. Pass user_id explicitly. |
| Internal “ask anything” data assistant | MCP server + semantic layer or Basedash permissions | Agent picks governed metrics and tables; add RLS in the DB when rows vary by user or tenant. |
| Support agent that looks up customer accounts | Prepared queries with strict input validation | Treat as a regulated workflow; log every lookup with the requesting agent and operator. |
| Quick research notebook for a single technical user | Read-only role + masked views | Lowest setup cost; fine if the user is trusted and the schema is small. |
If you find yourself wanting an agent to do all of these, do not build one God agent. Build several scoped agents, each with the smallest tool set that does its job. This is the principle of least privilege, the same one applied to human users, but it matters more when the user is a probabilistic text generator.
A short list of the failure modes that show up across most agent rollouts.
Reusing a developer’s personal database credentials. Easy to set up, impossible to audit, and a fired engineer’s leftover access is now powering a customer-facing feature. Always create a dedicated role per integration.
Putting permissions in the prompt. “You are only allowed to query the analytics schema” is a suggestion, not a control. Anything that lives in text the LLM sees can be talked around. Enforce in the database or proxy.
Granting SELECT on public and assuming “it’s just read-only.” Read-only does not mean cheap or safe. A SELECT * on a billion-row events table will both leak data and rack up bills. Pair read-only with statement timeouts, row caps, and column masking.
Logging prompts but not results, or vice versa. You need both to reconstruct an incident. The most useful unit of audit is (prompt, tool calls, results, final answer) for a single agent turn, tied to a request ID.
Trusting embedding-based content filters. Content filters on prompts and outputs are a useful layer but not a perimeter. They miss novel attacks and false-positive on legitimate ones. Use them; do not rely on them alone.
Skipping rate limits because “it’s just one agent.” One agent calling itself in a loop is functionally a botnet. Per-agent and per-session rate limits matter from day one.
Letting the same agent both read untrusted input and write to production. This is the single most dangerous pattern. If an agent reads support tickets, scraped pages, or webhook payloads, it should not also be the one calling delete_account or issue_refund without a human-in-the-loop step.
If you are wiring up your first agent today and want a defensible baseline, this is roughly what to do:
- Create a dedicated database role with SELECT only on a curated schema of views.
- Drop or mask sensitive columns in those views so PII never reaches the model.
- Set statement timeouts and per-session rate limits so a retry loop cannot run up the bill.
- Put an MCP server, BI tool, or thin proxy between the agent and the database, and enforce permissions there rather than in the prompt.
- Log every prompt, tool call, and result with a request ID.

This is enough to pass most security reviews and recover from most incidents without rebuilding the integration.
Basedash is built for exactly this middle layer between people, agents, and business data. It connects to PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, ClickHouse, and others, then lets you define the governed surface area an agent is allowed to use: dashboards, saved questions, approved tables, role-based permissions, and masked or hidden columns. Instead of giving Claude, ChatGPT, Cursor, or a Slack bot warehouse credentials, you can point them at the Basedash MCP server and let Basedash enforce what they can query.
That makes Basedash a practical security layer for agent access. It handles database connections, query execution, role-aware access, governed dashboards, and auditability in one place, while still letting you keep database-level controls such as RLS when you want them. For many internal analytics agents, Basedash can be the fastest way to get from “the model can answer business questions” to “the model can answer business questions without bypassing permissions, leaking raw credentials, or inventing its own metric definitions.”
It is not the only way to do this. The patterns in this guide also work with a hand-rolled proxy, a semantic layer, or another BI tool. The point is to put something between the agent and the database that you control, and to enforce permissions in that layer rather than in the prompt. Basedash packages that layer for teams that want agent access without rebuilding BI permissions, MCP tooling, and audit logs from scratch.
Can I just give ChatGPT my database connection string?
Technically yes, and people do, but it skips every control in this guide. ChatGPT (or any client) holding raw credentials means there is no governed permission layer, no useful per-query audit, and no clean way to rotate access without changing the underlying password. Use an MCP server, Basedash, a BI tool, or a thin proxy instead.
Do I still need row-level security if I use an MCP server?
Not always, but it is strongly recommended when the same table contains rows different users, teams, or tenants should not all see. The MCP server is a contract because it constrains which tools the agent can call. RLS is a database-level control because it constrains which rows any query can return. Basedash can sit between the MCP client and your database as a security layer, enforcing roles and permissions itself, and it can optionally work alongside database-level RLS when you need that extra boundary.
How should I think about prompt injection?
Assume injections will succeed sometimes. Design so that a successful injection is uninteresting: the agent has no destructive tools, no access to data outside the requesting user's scope, and every action is logged. The defenses that hold up are structural, not textual.
Should I fine-tune a model on my data instead of giving it live access?
No. Fine-tuning bakes information into the weights and creates a different set of risks (model leakage, IP, deletion under privacy laws). Giving an agent runtime access to data is a separate decision and usually the safer one because you can revoke it.
How do I scope a customer-facing agent to one user's data?
Pass the end user's identity all the way down to the data access layer, and use it in RLS policies, Basedash permissions, or parameterized queries. Do not rely on the agent to "remember" whose data it is allowed to see. The pattern is the same as multi-tenant authorization for any other service: the agent is just another caller.
Is write access riskier than read access?
Yes. Read-only patterns are forgiving; write patterns are not. For any agent that mutates state, prefer prepared, parameterized tool calls (pattern 5), require structured confirmations for high-impact actions, and never combine reading untrusted input with writing to production in the same agent without a human-in-the-loop step.
Written by
Max Musing is the founder and CEO of Basedash, an AI-native business intelligence platform designed to help teams explore analytics and build dashboards without writing SQL. His work focuses on applying large language models to structured data systems, improving query reliability, and building governed analytics workflows for production environments.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.