We spent the last few months talking to more than 50 data teams about agentic analytics.
Not the category slide version. The real version: what they actually wired up, what worked in a demo, what broke when someone asked a second question, and what they want to try next.
By agentic analytics, we mean an agent that answers a business question end to end. It reads the warehouse, the BI artifacts, the metric definitions, the docs, sometimes the codebase, then writes and runs the query. This is not a copilot suggesting SQL to an analyst, but something a non-technical person can ask directly and act on.
The first thing we learned is that “we are doing agentic analytics” means almost nothing by itself. The same words cover very different setups. Some teams point Claude Code at a warehouse. Some use the assistant built into their data platform. Some build a semantic layer. Some are trying to make that layer update itself from real usage.
So the useful question is not whether agentic analytics works. It is which one you are running, what it knows, who maintains that knowledge, and how you know the answers are still right.
The five stages we kept seeing
What separates them is not the model but how much business meaning the agent has, and where that meaning comes from. The five line up as a progression: each gives the agent more context than the last, and most teams are trying to move further along it.
Stage 1: bringing a general agent to the warehouse
The simplest version is a general-purpose agent with database access. Claude Code, Cursor, an internal Slack bot, a thin MCP wrapper over Snowflake or BigQuery. The agent can inspect schemas, write SQL, run it, and explain the result.
What works: it is fast, flexible, and genuinely useful for technical users. It can explore a schema, draft queries, debug errors, and help an analyst move faster. For a data person who already knows the tables, it can feel like a better workbench. Several builders were comfortable giving an agent direct SQL access when the dataset was clean, narrow, and already well understood.
What breaks: the agent does not know the business meaning behind the schema. It can infer from table names, column names, dbt docs, and maybe a few pasted instructions, but that is not the same as knowing which definition of “active” the company uses. One head of analytics said the agent would pick the wrong table or grab the wrong column because it did not understand how the warehouse was structured.
A senior data engineer at a finance SaaS described the basic failure mode exactly: “You ask it a question, it answers with total confidence, but you already know the answer is wrong.” A data analyst at a public-sector org gave the quieter version of the same gap: the agent is “good on generic metric definitions, but it stays shallow on our specific context.”
Where teams go next: they start giving the agent more context. Not because the model cannot write SQL, but because SQL is the easy part once the meaning is clear.
Stage 2: using the native assistant in your data tool
The second version looks similar to the business user, but it is operationally different. The agent is built into the warehouse or BI tool: Snowflake Cortex Analyst, Gemini in BigQuery or Looker, Power BI Copilot, Databricks Genie, Omni’s AI agent, and the rest of the emerging native assistant layer.
What works: the assistant sits closer to the platform’s permissions, metadata, and semantic layer. There is less glue code. The demo path is cleaner. If the company has already modeled the right metrics inside that tool, the assistant can feel much safer than raw database access.
The strongest examples were scoped, not universal. One product analyst testing Omni’s agent described the setup as going “topic by topic”: add one domain, add context, make sure the dbt descriptions are correct for the relevant tables, then ask questions until the team is satisfied. Another operations leader said 80 to 85% of the data points they needed were already structured in Omni; the truly ad hoc questions were the exception. That is the happy path for this stage: a bounded domain, clean dimensions, known metrics, and enough documentation inside the tool.
What breaks: it is only as good as the meaning already modeled inside that platform. A native assistant still has to know whether “active user” means paid invoice in the last 30 days, app open in the last 90 days, or the definition hidden in an old dashboard filter. One head of data analytics at a mobility platform built a semantic layer, exposed it through the warehouse’s assistant, and put it in front of a few stakeholders: “The first wow effect lasted one week. By the end of week one, nearly 100% churn. Too many hallucinations.”
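To make the “active user” ambiguity concrete, here is a minimal sketch in Python. The records, field names, and dates are invented for illustration; the two functions encode the paid-invoice and app-open definitions mentioned above, and they return different sets of users from the same data.

```python
from datetime import date, timedelta

# Hypothetical activity records; the field names are illustrative only.
USERS = [
    {"id": 1, "last_paid_invoice": date(2024, 5, 20), "last_app_open": date(2024, 5, 28)},
    {"id": 2, "last_paid_invoice": date(2024, 1, 10), "last_app_open": date(2024, 4, 15)},
    {"id": 3, "last_paid_invoice": None, "last_app_open": date(2024, 5, 30)},
    {"id": 4, "last_paid_invoice": date(2024, 5, 5), "last_app_open": date(2023, 12, 1)},
]


def active_by_paid_invoice(users, today, days=30):
    """Active = paid an invoice within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return {
        u["id"]
        for u in users
        if u["last_paid_invoice"] is not None and u["last_paid_invoice"] >= cutoff
    }


def active_by_app_open(users, today, days=90):
    """Active = opened the app within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return {
        u["id"]
        for u in users
        if u["last_app_open"] is not None and u["last_app_open"] >= cutoff
    }


today = date(2024, 6, 1)
paid = active_by_paid_invoice(USERS, today)
opened = active_by_app_open(USERS, today)
print(sorted(paid), sorted(opened))  # two defensible answers to the same question
```

Neither result is wrong on its own. The agent simply has no way to pick between them unless someone has written down which one the company means.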
Another head of data described the same boundary from the other side: the agent guidance assumed something like 20 definitions per explore; their real explore had 500. The assistant did not fail because it was native. It failed because the modeled surface was too broad.
Where teams go next: they either narrow the surface area to a better-modeled domain, or they start looking for context outside the native tool. The platform integration helps. It does not remove the need to define the business.
Stage 3: pointing at the context you already have
The team does not yet build a new context layer. It points the agent at the context the company already has.
This took several forms in the conversations. Sometimes it was a skill or system prompt that tells the agent which repo, dashboard, data catalog, or wiki to inspect. Sometimes it was RAG over Confluence, Notion, PDFs, or internal documentation. Sometimes it was baked into an existing agent platform like Dust, with synced context files, LookML, catalog entries, or a connector to a BI tool. Sometimes it was simpler: markdown generated from dbt docs, query history, and warehouse metadata.
The scrappiest version came from a product manager, not a data team. He had built a Notion page with business terms, table definitions, column notes, and metric definitions, then copy-pasted that context into Claude whenever he wanted SQL for Metabase. Later he wired Claude to the Notion knowledge base through MCP. Same problem, smaller scale.
What works: the agent gets materially better. Existing docs and BI artifacts often carry the missing clues: how a question is used, why a metric exists, which table is the trusted one, which dashboard people actually cite. One data team described an exploratory project to “combine everything we have on Confluence with our dbt lineage” and the repos behind event pipelines, so a bot could answer in Slack. This is usually the first stage where teams feel they are getting beyond raw text-to-SQL.
What breaks: retrieval becomes the problem. The issue is no longer only whether the context exists. It is whether the agent can find the right slice, ignore the stale one, and explain why it used one source instead of another. One context engineer who had pushed this far put the lesson simply: “Where you store the context doesn’t really matter. What matters is the retrieval strategy.”
The pattern showed up in the negative cases too. One team tried a RAG platform by dumping thousands, then tens of thousands, of documents into it. The result was not productive. Another team found YAML alone too thin: it described tables, but not the difference between similar metrics or the way questions were used in practice.
Where teams go next: they stop treating context as a pile of documents and start trying to structure it.
Stage 4: building a curated context layer
Now the team authors a real layer above the tables. The business meaning the schema cannot hold. Which dashboard is trusted. Which tables are the entry points. Whether a familiar metric name means the finance definition, the operating definition, or the old reporting definition everyone still quotes.
What works: the agent finally has something it can rely on. One data governance lead described the shape of the problem: “Like any data warehouse, we have thousands and thousands of tables. In reality, I think 20 of them answer 90% of the questions the business can ask.” A curated layer gives the agent that map. It tells the agent which 20 matter, which definitions are certified, and when a question should not touch the long tail.
It also captures the small rules that make a number right. One group chief data officer gave the example of a table with eight date columns. “Which is the right date column for which operation?” is not a text-to-SQL problem. It is a business-context problem. The same applies to amount columns, tax treatment, time windows, and default filters. The agent stops guessing because someone owns the rule.
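As a rough sketch of what such a layer might hold, the Python below models a metric definition with its entry table, its date column, its default filter, an owner, and a certification flag. Every name here (the classes, the tables, the columns, the owners) is hypothetical; real teams described this in semantic-layer tooling, YAML, or markdown rather than in this shape.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    table: str            # the trusted entry-point table
    date_column: str      # which of the many date columns applies
    default_filter: str   # the filter that makes the number right
    owner: str            # who decides when the definition is contested
    certified: bool = False


@dataclass
class ContextLayer:
    metrics: dict = field(default_factory=dict)

    def add(self, metric):
        self.metrics[metric.name] = metric

    def resolve(self, name):
        """Return a definition only if it exists and is certified."""
        metric = self.metrics.get(name)
        if metric is None or not metric.certified:
            return None
        return metric

    def entry_tables(self):
        """The short list of tables the agent should start from."""
        return sorted({m.table for m in self.metrics.values() if m.certified})


layer = ContextLayer()
layer.add(MetricDefinition(
    "net_revenue", "finance.invoices", "invoice_posted_date",
    "status = 'paid'", owner="finance", certified=True,
))
layer.add(MetricDefinition(
    "bookings", "sales.orders", "order_created_date",
    "is_test = FALSE", owner="sales-ops",  # not yet certified
))

rule = layer.resolve("net_revenue")
print(rule.date_column, layer.entry_tables())
```

The design choice that matters is that `resolve` returns nothing for an uncertified or unknown metric, so the agent has an explicit signal to stop guessing.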
The teams that made this look easy usually had unusually clean foundations already. One head of analytics at a scale-up described a strict naming convention, one semantic view per dbt table, and an agent plugged into Looker through MCP. It worked because the context was already disciplined. The agent inherited that discipline.
What breaks: building the layer once is not the end of the work. It becomes something someone has to keep true. A group chief data officer named that as the oldest problem in the field: “The hard part isn’t having the tool. It’s getting people to fill it in and keep it up to date. That’s the problem of the last 30 years on data.”
A marketplace team we met on a demo call had already done the first half: a one-time semantic-layer build. The open question was everything after it. As their head of analytics put it: “We did a one-shot pass on what exists, without really addressing maintenance: how we’ll keep it up over time.”
Some definitions are still contested. The layer can record a decision, expose its owner, and make the answer consistent. It cannot make Finance, Marketing, and Product agree by itself.
Where teams go next: they try to turn usage into maintenance. If people correct the agent, clarify a concept, or ask a question the layer cannot answer, that should not disappear into a chat transcript.
Stage 5: keeping the context alive
This is the frontier. The context updates from real use: corrections from conversations become reviewed changes to the shared layer. The team can replay questions, see what broke, and evaluate whether a change made answers better or worse.
Almost nobody is fully here. The teams closest to it are still candid about the gap. One team had automated the collection end to end, with DAGs pulling queries, YAML, and Notion on a schedule: “There’s not a lot of maintenance, but there’s also zero evaluation. So if things drift, we have no idea.”
That distinction matters. Automated collection is not the same as a feedback loop. A catalog rebuilt daily from dbt, lineage, table metadata, and BI metrics is useful. But if business-owner context is still manual, and if nobody knows which answers got better or worse, the system is still closer to stage 4 than stage 5.
The first real loops we saw were mostly manual. In one pilot, the data referent read through the queries, figured out what the agent had misunderstood, edited the markdown context, or went back to the user to ask what they meant. Useful, but not yet scalable.
More advanced teams were trying to automate the triage. One built a skill that pulled every agent query from logs and ran an LLM-as-judge pass so the team could proactively improve both the context and the tool. Another described the target state more directly: after a conversation, the agent reads its trace, writes a feedback note, clusters those notes by impact, and opens reviewed PRs against the context layer.
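The clustering step in that target state can be sketched in a few lines of Python. This assumes feedback notes have already been written and tagged with a concept, which is the hard part and is not shown; the note fields, the sample data, and the ranking rule (distinct users first, then mentions) are our own illustration, not any team's implementation.

```python
from collections import defaultdict

# Hypothetical feedback notes, one per conversation where the agent struggled.
NOTES = [
    {"concept": "active user", "user": "ops-1", "note": "used app opens, wanted paid invoices"},
    {"concept": "active user", "user": "fin-2", "note": "30 vs 90 day window unclear"},
    {"concept": "order date", "user": "fin-2", "note": "picked created_at, not shipped_at"},
    {"concept": "churn", "user": "pm-3", "note": "logo churn vs usage churn"},
    {"concept": "active user", "user": "ops-1", "note": "same confusion again"},
    {"concept": "churn", "user": "pm-4", "note": "no definition found"},
]


def triage(notes):
    """Group notes by concept and rank by how many people were affected."""
    by_concept = defaultdict(list)
    for n in notes:
        by_concept[n["concept"]].append(n)
    queue = [
        {
            "concept": concept,
            "mentions": len(items),
            "users": len({n["user"] for n in items}),
            "status": "needs_review",  # nothing becomes truth without a reviewer
        }
        for concept, items in by_concept.items()
    ]
    return sorted(queue, key=lambda p: (-p["users"], -p["mentions"], p["concept"]))


queue = triage(NOTES)
for item in queue:
    print(item)
```

Every item leaves the function as `needs_review`. The sketch ranks the gaps; it deliberately does not apply any change to the context.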
What works: real user questions are the best source of missing context. They reveal the concepts nobody documented, the definitions people disagree on, the tables everyone avoids, the dashboards that encode tribal knowledge.
What breaks: without review and evals, the loop becomes a new way to create drift. The agent can collect assumptions forever. That does not mean any of them should become truth.
Where teams go next: reviewed changes, version history, replayable questions, eval sets, and a queue that tells the data team which context gaps actually matter.
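A replayable eval set can be very small and still useful. The sketch below assumes a list of stored questions with expected answers and compares two runs, one before and one after a context change. The questions, numbers, and the two dictionaries standing in for the agent are made up for illustration; in practice the answer function would call the real agent.

```python
EVAL_SET = [
    {"question": "How many active users last month?", "expected": 1200},
    {"question": "What was net revenue in Q1?", "expected": 450000},
    {"question": "How many orders shipped last week?", "expected": 310},
]


def replay(eval_set, answer_fn):
    """Run each stored question through answer_fn and record pass/fail."""
    return {
        case["question"]: answer_fn(case["question"]) == case["expected"]
        for case in eval_set
    }


def compare(before, after):
    """List questions that got worse and questions that got better."""
    regressions = sorted(q for q in before if before[q] and not after[q])
    fixes = sorted(q for q in before if not before[q] and after[q])
    return {"regressions": regressions, "fixes": fixes}


# Stand-ins for the agent before and after a change to the context layer.
answers_before = {
    "How many active users last month?": 1500,
    "What was net revenue in Q1?": 450000,
    "How many orders shipped last week?": 310,
}
answers_after = {
    "How many active users last month?": 1200,
    "What was net revenue in Q1?": 450000,
    "How many orders shipped last week?": 295,
}

before = replay(EVAL_SET, answers_before.get)
after = replay(EVAL_SET, answers_after.get)
report = compare(before, after)
print(report)
```

In this made-up run the change fixes the active-user question and breaks the shipped-orders one, which is exactly the kind of drift that goes unnoticed with zero evaluation.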
The most immediate blockers to production
Trust in the numbers
People can forgive a generic chatbot that is confidently wrong. They do not forgive a wrong internal number. A few wrong answers from an experimental analytics agent can destroy user trust, cut adoption, and fail the pilot. Dashboards, board decks, and finance packs are read as the current truth of the business, and an analytics agent inherits that bar.
This is what caps deployment today. The teams getting useful answers are usually the ones still checking the agent by hand. A senior data analyst who tested one said they “couldn’t just roll it out and let people use it,” because every answer needed “interrogation and auditing and checking” against reality. A director of data analytics at a gaming company put it from the other side: they cope today only because they can verify what the agent does, and the real work now is making it reliable enough for anyone, so that check is no longer needed.
That is why evals are on the most advanced teams’ roadmap: without a way to see when the agent is right, when it is wrong, when it answers a question it cannot actually answer, and whether last week’s change helped, there is no systematic way to keep it from burning that critical user trust.
Data access rights
Most teams cannot give an agent real per-user access to the warehouse and be sure it only sees what the person asking is allowed to see. One context engineer called impersonation an open problem across warehouses and BI tools.
So teams hold the agent back. Some never connect it to the data at all: it writes the SQL, and a person pastes it into the trusted environment to run. Others let it run the query itself, but keep it inside tooling they already trust. “Our agents aren’t connected to Snowflake,” a data and AI lead at a SaaS company told us. The agent reaches production data through an MCP built on their existing production tools, not a direct connection to the warehouse.
Giving the agent its own scoped read-only account on a slice of the warehouse is about as far as anyone goes.
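To show what “scoped and read-only” means mechanically, here is a small sketch using SQLite from the Python standard library as a stand-in for a warehouse. In a real warehouse this would be a role with grants, not application code. The table names and the allowlist are hypothetical; `set_authorizer` is SQLite's hook for approving or denying each operation a statement wants to perform.

```python
import sqlite3

ALLOWED_TABLES = {"orders"}  # hypothetical slice the agent may read


def read_only_scope(action, arg1, arg2, db_name, source):
    """Allow SELECTs and function calls; allow reads only on allowed tables."""
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY  # writes, DDL, and other tables are refused


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE salaries (employee TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 20.0), (2, 35.5)")
conn.execute("INSERT INTO salaries VALUES ('a', 1000.0)")
conn.commit()

# From here on, every statement is checked against the scope above.
conn.set_authorizer(read_only_scope)


def run_as_agent(sql):
    """Return rows, or None if the scope refuses the statement."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError:
        return None


print(run_as_agent("SELECT SUM(amount) FROM orders"))
print(run_as_agent("SELECT * FROM salaries"))
print(run_as_agent("DELETE FROM orders"))
```

Note what this does not solve: the scope belongs to the agent, not to the person asking. Per-user impersonation, the open problem named above, would need the allowlist to change with each caller.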
The stakes are real, and not specific to analytics agents. Lock it down and the agent is useless. Leave it open and the blast radius grows. As one head of analytics put it, if a CRM already lets a lot of people delete records, an agent just lets them delete faster. A lead data engineer at a consumer fintech went further: they killed an agent built by an outside vendor and rebuilt it in-house because the data ran through someone else’s system. Even though the vendor never kept the queries or the data, the access alone was “a big, big no.”
Until that is solved, access caps deployment as hard as trust does. The safe option keeps a technical person in the loop to run the query, which is exactly the bottleneck an analytics agent is supposed to remove.
Where everyone is heading
Almost nobody we talked to thinks they are done. The teams running a general agent want the context that stages 3 and 4 add. The teams that built a layer want it to stay true without a person babysitting it. The direction is the same everywhere: more business meaning, better maintained, so the agent can be trusted.
Two wants come up almost everywhere, and they are worth separating from the rest.
First, teams want to start from the context they already have: warehouse schemas, dbt models, dashboards, BI questions, Notion, internal wikis, code, query history. They do not want a six-month modeling project before the first useful answer. As one data analyst put it, the raw material has to be there somewhere already: “The context, the documentation, the schemas, the logical diagrams, the relationships between tables, objects, how it all works, which field maps to what - all that has to already exist. It’s not going to be created from nothing.”
Second, they want the agent to know when it lacks context instead of guessing. If “active user” is undefined, it should say so. If two metrics look similar, it should ask which one to use. If the data cannot support the question, it should stop.
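Those three behaviors map onto a simple decision step that runs before any SQL is written. The Python below is a sketch under our own assumptions: the definitions dictionary, the metric and table names, and the action labels are all hypothetical, and a real agent would resolve terms from a question rather than receive them directly.

```python
# Hypothetical definitions: each business term maps to candidate metrics.
DEFINITIONS = {
    "net revenue": [{"metric": "finance.net_revenue", "table": "finance.invoices"}],
    "churn": [
        {"metric": "finance.logo_churn", "table": "finance.subscriptions"},
        {"metric": "product.usage_churn", "table": "product.events"},
    ],
    "nps": [{"metric": "cx.nps", "table": "cx.survey_responses"}],
}

# Tables the agent can actually query in this hypothetical setup.
AVAILABLE_TABLES = {"finance.invoices", "finance.subscriptions", "product.events"}


def decide(term, definitions, available_tables):
    """Pick one of: answer, ask which metric, say undefined, or stop."""
    candidates = definitions.get(term, [])
    if not candidates:
        return {"action": "say_undefined", "term": term}
    if len(candidates) > 1:
        return {"action": "ask_which", "options": sorted(c["metric"] for c in candidates)}
    only = candidates[0]
    if only["table"] not in available_tables:
        return {"action": "stop", "missing_table": only["table"]}
    return {"action": "answer", "metric": only["metric"]}


for term in ["net revenue", "churn", "active user", "nps"]:
    print(term, "->", decide(term, DEFINITIONS, AVAILABLE_TABLES))
```

The point of the sketch is the ordering: the check for missing or ambiguous meaning comes first, and query generation only happens on the one branch where the meaning is settled.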
The genuinely new part, the stage almost nobody has reached, is the loop that keeps context true on its own. Corrections from real conversations become reviewed, governed changes to the shared layer. Evals catch the drift before a stakeholder does. Today that work is manual where it happens at all, and most teams have neither the loop nor the evals. That is the open frontier, not a better chatbot.
All of it has to sit inside a security model that can survive production review.
That is where most teams are now. They have spent years building the warehouse, the BI layer, the docs, the dashboards. One person called it a castle with a tiny door: the data team and BI are the door. An agent widens it.
Widening the door is exciting, and it also means more edge cases and more ways for hidden context to quietly become production infrastructure. The hard part is no longer getting an agent to write a query. It is whether a company can put one in front of the business and live with the consequences.
If your setup looks different, we would genuinely like to compare notes.