The hard part of self-maintaining context for analytics agents

Context is now table stakes for analytics agents. The new problem is keeping it true as the business keeps moving.

Matthieu Blandineau June 11, 2026

A head of analytics at a large marketplace described the gap plainly: “We’ve done a one-shot on what exists. We haven’t really tackled how we’ll maintain it over time.”

The one-shot gives the agent a useful first version. From then on, every renamed model, new question, corrected answer, and changed business definition becomes a maintenance problem. Some of those changes leave a clean trace. Others do not. And even when a system finds the change, it still has to decide where it belongs, who can approve it, and what else it will affect.

That is what “self-maintaining context” actually has to solve. I know because we are trying to build it right now.

The changes a system can see

The first maintenance loops are appearing already.

On the technical side, a system notices that a dbt model changed or a column was renamed or moved. It updates the relevant context, or proposes an edit, so the next agent does not rely on stale technical metadata.

On the business side, a question comes in and the agent cannot find an agreed metric for it. That failed question becomes a signal. The system drafts the missing definition and opens a change for the data team to review.

And, more realistically in many teams today, someone flags a bad answer. They paste the conversation into Slack, an analyst investigates, and the fix is pushed back into the context manually or with Claude helping along the way.

These cases are not equally difficult, but they follow the same basic loop. Something leaves evidence. The system catches it and proposes an edit.

That is the right place to start. A schema diff is observable. So is a failed query, a renamed model, or a metric that three dashboards calculate in roughly the same way but nobody has defined centrally.

These are already maintenance loops. They are also the tractable ones, because the change leaves something relatively tidy to detect.
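
To make the tractable case concrete, here is a minimal sketch of a schema diff that turns into proposed edits. It assumes column snapshots are available as plain `{column: type}` dictionaries. `ProposedEdit` and `diff_schema` are hypothetical names for illustration, not part of dbt or any catalog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedEdit:
    """A context edit for a human to review (hypothetical structure)."""
    target: str    # fully qualified column, e.g. "orders.amount"
    action: str    # "deprecate", "add", or "update_type"
    evidence: str  # what the system observed


def diff_schema(model: str, old: dict[str, str], new: dict[str, str]) -> list[ProposedEdit]:
    """Compare two {column: type} snapshots of one model and propose edits."""
    edits = []
    # Columns that disappeared: context describing them is now stale.
    for col in sorted(old.keys() - new.keys()):
        edits.append(ProposedEdit(f"{model}.{col}", "deprecate",
                                  f"column removed (was {old[col]})"))
    # Columns that appeared: context is missing for them.
    for col in sorted(new.keys() - old.keys()):
        edits.append(ProposedEdit(f"{model}.{col}", "add",
                                  f"new column of type {new[col]}"))
    # Columns present in both snapshots whose type changed.
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            edits.append(ProposedEdit(f"{model}.{col}", "update_type",
                                      f"type changed from {old[col]} to {new[col]}"))
    return edits


# Illustrative snapshots; the model and column names are made up.
old = {"order_id": "int", "amount": "float", "region": "str"}
new = {"order_id": "int", "amount": "decimal", "sales_region": "str"}
edits = diff_schema("orders", old, new)
for e in edits:
    print(e.action, e.target, "-", e.evidence)
```

Note that a rename shows up as one removal and one addition. Deciding that `region` and `sales_region` are the same column is left to the reviewer in this sketch.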

The changes that leave no diff

Suppose “qualified pipeline” meant one thing when a company sold directly. Then the company launches a partner channel. RevOps starts including partner-sourced opportunities, but only after a partner manager accepts them. The old metric still runs. The dashboards still load. The agent still answers.

It just answers with last year’s meaning, and nothing throws an error.

There may be evidence that the definition moved, but it lives somewhere else. A sales leader corrects a number during a forecast review. Someone clarifies the term in a Slack thread. Analysts quietly add a filter whenever they use the official metric. None of those events looks like a schema change.

Now the system has to infer that a definition may have changed. It also has to distinguish a real company decision from one person’s preference, a temporary workaround, or an exception that only applies to one region. Finding the sentence is not enough. The useful output is a proposed change with the evidence that justifies it.
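
A deliberately crude sketch of that distinction follows. It assumes each piece of evidence can be reduced to an author, a region, and a source. The `Signal` type, the labels, and the two-step rule are all hypothetical, and the sketch ignores the temporary-workaround case, which would need timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One piece of evidence that a definition may have moved (hypothetical)."""
    author: str
    region: str
    source: str  # e.g. "slack", "forecast_review", "query_filter"


def classify_drift(signals: list[Signal]) -> str:
    """Label a group of signals about the same term with a coarse guess."""
    if not signals:
        return "no_evidence"
    authors = {s.author for s in signals}
    regions = {s.region for s in signals}
    if len(authors) == 1:
        # Only one person behaves this way: likely a personal preference.
        return "individual_preference"
    if len(regions) == 1:
        # Several people, but all in one region: likely a local exception.
        return "regional_exception"
    # Several people across regions: worth proposing as a company-level change.
    return "candidate_company_change"


signals = [
    Signal("sales_leader", "emea", "forecast_review"),
    Signal("analyst_a", "emea", "query_filter"),
    Signal("analyst_b", "na", "slack"),
]
print(classify_drift(signals))
```

The label is only a starting point. The signals themselves are the evidence that should travel with the proposed change.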

Business drift is much harder than technical synchronization. The system is no longer comparing two versions of code. It is reconstructing a decision the company may never have recorded as a decision.

Finding the change does not tell you where it belongs

Suppose the system gets that far. It correctly notices that “qualified pipeline” now has two interpretations and identifies the one RevOps uses. It still has not finished.

Where should that fact live? On the metric itself, in the sales domain, or as a channel-specific exception? Does it replace the old definition, or should the definition split in two?

The useful rule is simple: one fact, in one canonical home, at the most specific level where it applies. Put a narrow rule too high and the agent carries it into unrelated questions. Copy it into four places and the fifth version arrives a few months later.
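
One way to read "the most specific level where it applies" is as the deepest scope shared by every place the fact holds. The sketch below assumes context is organized as a hierarchy of scopes such as domain, metric, and channel. That hierarchy and the `canonical_home` helper are illustrative assumptions.

```python
def canonical_home(scopes: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Return the deepest scope that contains every scope where the fact applies.

    Each scope is a path from general to specific, for example
    ("sales", "qualified_pipeline", "partner"). An empty result means
    the fact only fits at the company-wide root.
    """
    if not scopes:
        raise ValueError("a fact must apply somewhere")
    home = []
    # Walk the paths level by level and stop at the first disagreement.
    for parts in zip(*scopes):
        if len(set(parts)) != 1:
            break
        home.append(parts[0])
    return tuple(home)


# A rule that only concerns partner-sourced opportunities stays narrow.
partner_only = canonical_home([("sales", "qualified_pipeline", "partner")])

# A rule that holds for both channels belongs on the metric itself.
both_channels = canonical_home([
    ("sales", "qualified_pipeline", "partner"),
    ("sales", "qualified_pipeline", "direct"),
])
print(partner_only, both_channels)
```

Storing the fact once at the returned scope, and nowhere else, is what keeps the fifth version from appearing later.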

So maintenance cannot simply mean “add the new truth.” It has to propose whether to replace, split, move, or deprecate what is already there. The system needs enough structure to keep context coherent, without turning every change into an ontology redesign.

An owner field is not an approval loop

Then someone has to approve the change.

Data catalogs already have much of the vocabulary for this. They have owners, stewards, domains, certifications, and workflows. So the missing piece is not another owner field.

The problem is that the catalog usually sits outside the moment when the meaning changes. Someone corrects a number during a forecast review, applies the new logic in a query, and moves on. Updating the catalog means opening another tool, finding the right asset, rewriting the definition, and working out who should approve it. Unsurprisingly, the catalog stays beautifully governed and slightly wrong.

Self-maintaining context changes the starting point. Real usage produces a proposed edit. The system brings the evidence, identifies where the definition should live, shows what depends on it, and routes one concrete decision to someone allowed to make it. A narrow wording clarification may need one domain owner. A change that alters finance and sales reporting may need both teams to agree.
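
The routing step can be sketched as a lookup from affected domains to the people who own them. This assumes each affected asset is tagged with a single domain and each domain has one registered owner. The names and the `required_approvers` helper are hypothetical.

```python
def required_approvers(affected_assets: dict[str, str], owners: dict[str, str]) -> set[str]:
    """Return the owner of every domain touched by a proposed change.

    affected_assets maps asset name -> domain.
    owners maps domain -> the person allowed to approve changes there.
    """
    domains = set(affected_assets.values())
    missing = domains - owners.keys()
    if missing:
        # Routing to nobody is worse than failing loudly.
        raise LookupError(f"no owner registered for: {sorted(missing)}")
    return {owners[d] for d in domains}


# Hypothetical owners and assets.
owners = {"sales": "revops_lead", "finance": "finance_controller"}

wording_fix = {"metric:qualified_pipeline": "sales"}
reporting_change = {
    "metric:qualified_pipeline": "sales",
    "report:bookings_forecast": "finance",
}
print(required_approvers(wording_fix, owners))
print(required_approvers(reporting_change, owners))
```

A narrow clarification resolves to one approver, while a change that reaches into finance reporting resolves to two.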

The person who spots the drift, the person who owns the business definition, and the person who can change the underlying model are often three different people. The proposed edit has to travel between them with its reasoning intact.

It also needs a blast radius. Which dashboards, agent answers, or recurring analyses depended on the previous definition? Should historical answers keep the old version? Did the change actually fix the question that surfaced the problem? Without that feedback, the system is writing changes into context without learning whether they improved anything.
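
Computing the first part of that blast radius is a graph traversal, assuming the system already knows which assets depend on which. The sketch below uses a breadth-first search over a hypothetical `downstream` mapping. The asset names are made up.

```python
from collections import deque


def blast_radius(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Return every asset that depends, directly or indirectly, on `changed`.

    downstream maps an asset to the assets that consume it.
    Results are listed in breadth-first order, nearest dependents first.
    """
    seen = {changed}
    queue = deque([changed])
    affected = []
    while queue:
        node = queue.popleft()
        for dep in downstream.get(node, []):
            if dep not in seen:  # avoid revisiting shared dependents
                seen.add(dep)
                affected.append(dep)
                queue.append(dep)
    return affected


downstream = {
    "metric:qualified_pipeline": ["dashboard:forecast", "analysis:weekly_pipeline"],
    "dashboard:forecast": ["agent_answer:q3_forecast"],
    "analysis:weekly_pipeline": ["agent_answer:q3_forecast"],
}
affected = blast_radius("metric:qualified_pipeline", downstream)
print(affected)
```

The traversal only answers "what depended on this". Whether historical answers keep the old version, and whether the fix worked, still need their own feedback.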

This may end up looking less like a replacement for the data catalog than a catalog that does most of its own maintenance. For the part that cannot be automated, the product challenge is simpler to state and harder to build: make the decision timely, specific, and easy enough that the right person actually makes it.

Maintenance is where the harder problems begin

Teams are making real progress with analytics agents. They are connecting them to dbt, semantic layers, catalogs, and whatever business context already exists. When something is missing, they generate it. When an answer is wrong, they capture the correction and push a fix back. These are useful first loops, and everyone building them is learning quickly.

But automating the update does not close the context problem. It opens the rest of it.

Once changes start flowing continuously, the system has to decide whether a correction represents a new company rule or a local exception. It has to place that rule without duplicating what already exists. It has to find the person allowed to approve it, show them enough evidence to decide, and track what the change will affect.

Each step depends on the previous one being right. Detect the wrong signal and you propose the wrong change. Put the right change in the wrong place and the agent may misuse it. Route it to the wrong owner and you have automated the production of confidently governed nonsense.

The first generation of agentic analytics is teaching agents where to find context. The next will have to keep that context true while the business keeps moving.

That is a much harder system to build. We are only beginning to see its shape.
