How we are learning to build intelligence layers above business systems
Applied intelligence does not start with agents or frameworks. It starts with understanding existing systems, finding the interpretation gap, building the smallest useful layer, and learning from how people actually use it.
I started looking at agentic AI as a technology question. Which frameworks matter? How should agents be orchestrated? What changes when models can call tools? Those questions still matter, but they no longer feel like the first questions.
The more I work in this space, the more I find that the useful question is not about the technology. It is about the system already in place: where it does its job, where it stops short, and where someone is still doing the interpretation by hand.
Most organisations that are seriously considering applied intelligence are not starting from nothing. They have systems, dashboards, reports, integrations, workflows, and some combination of tools and people that have been holding things together for years. The opportunity is rarely to replace all of that. The opportunity is to understand where it stops before the interpretation happens.
This is the practice we are developing at Konstant Variables. Not a finished methodology but a working process that gets refined with each engagement. This post describes how we are thinking about it now.
Discovery
Discovery is not about documenting requirements. It is about understanding the real system, which is almost never the same as the documented one.
Process diagrams tend to show the clean version. They do not show the WhatsApp message that moved a deadline, the spreadsheet a coordinator trusts more than the official dashboard, or the operations manager who holds the context that makes a particular client make sense. That informal layer is not a problem to be eliminated. It is often where the real business logic lives. But it is fragile, it moves with people, and it means the same synthesis is being done by hand, every week, by whoever happens to be paying attention.
The useful discovery questions are: where do people leave the formal system? Where is context living in email threads and call notes rather than in any structured record? Where do reports show what happened without saying what to do next? Where does a decision depend on one experienced person being available? And the question that often gives the most signal: what happens when that person is not there?
Planning
Not every gap needs AI. This is something worth being clear about, because discovery often surfaces multiple things that look like opportunities.
Some gaps are data quality problems. The system has the information, but it is stale, inconsistent, or incomplete. Putting an intelligence layer above bad data produces confident-sounding nonsense. Some gaps are process ownership problems, where nobody is sure who should act on the output, so even a correct interpretation does not change anything. And some gaps are simpler than they look: a deterministic query, a well-designed report, or a clear workflow would do the job without any model.
The AI opportunity is usually narrower: where the gap is specifically about interpretation, synthesis, or decision preparation. Where there is too much context for a human to hold at once, spread across systems that do not talk to each other, updated at a frequency that makes manual review impractical. Planning means being honest about that boundary before committing to a build.
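To make that boundary concrete, consider a question like "which fees are overdue?". A minimal sketch, with hypothetical record shapes, of why this kind of gap needs no model at all:

```python
from datetime import date

# Hypothetical records; a real system would read these from a database.
students = [
    {"name": "Student A", "fee_due": date(2025, 1, 10), "fee_paid": False},
    {"name": "Student B", "fee_due": date(2025, 2, 1), "fee_paid": True},
]

# A deterministic rule answers the question completely. No model, no
# interpretation, nothing probabilistic to evaluate or monitor.
overdue = [s for s in students if not s["fee_paid"] and s["fee_due"] < date.today()]

for s in overdue:
    print(f"Overdue: {s['name']} (due {s['fee_due']})")
```

If a rule like this can answer the question, the honest plan is a report, not an intelligence layer.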
Design
The design question is: what is the smallest useful intelligence layer?
Not the most ambitious one. Not the one that covers the whole problem. The one that proves the interpretation is valuable. That might be a single briefing for a single recurring meeting, one exception pattern made visible and explained, or one weekly summary that replaces an hour of manual reading. The scope should be narrow enough that whether it is working is clearly observable.
An online academy is a useful example. By the time it has run for a year or two, it usually has quite a bit of data: student profiles, batches, attendance records, assessments, fee status, instructor schedules, curriculum progress. The data is there. The interpretation is not.
Every week, someone still has to ask: which students are quietly disengaging? Which batch is losing momentum? Where is follow-up overdue? Each answer may exist somewhere in the system. The synthesis rarely does. A well-designed intelligence layer here might produce a weekly briefing containing a clear view of where attention is needed and why, built from data that already exists and delivered in time to be useful. That is a narrow scope, and whether it works can be verified.
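A sketch helps show how thin this layer can be. The following is the deterministic half of such a briefing; the field names and thresholds are assumptions, not a schema, and the model's only job is to turn the assembled context into the "where attention is needed and why" narrative:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical shapes; an actual academy's schema will differ.
@dataclass
class Student:
    name: str
    batch: str
    attended_last_4_weeks: int
    held_last_4_weeks: int
    last_assessment_score: float | None
    last_contacted: date | None

def attention_flags(s: Student, today: date) -> list[str]:
    """Deterministic signals that feed the briefing. The thresholds are
    illustrative starting points, expected to be tuned through use."""
    flags = []
    if s.held_last_4_weeks and s.attended_last_4_weeks / s.held_last_4_weeks < 0.6:
        flags.append("attendance below 60% over four weeks")
    if s.last_assessment_score is not None and s.last_assessment_score < 50:
        flags.append(f"last assessment scored {s.last_assessment_score:.0f}")
    if s.last_contacted is None or today - s.last_contacted > timedelta(days=21):
        flags.append("no follow-up recorded in three weeks")
    return flags

def briefing_context(students: list[Student], today: date) -> str:
    """Assemble the structured context a model would summarise into the
    weekly briefing, one line per student who needs attention."""
    lines = []
    for s in students:
        flags = attention_flags(s, today)
        if flags:
            lines.append(f"{s.name} ({s.batch}): " + "; ".join(flags))
    return "\n".join(lines) or "No students currently flagged."
```

The design choice worth noticing is that the model never computes the signals; it interprets them. That keeps every claim in the briefing traceable back to data.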
Execution
What matters most in execution is building it like software, not like a demo.
A demo is fast to build and often impressive. Production is different. The inputs need to be defined — what data is being read, from where, at what frequency. The outputs need to be structured so they can be evaluated. The system needs permissions that reflect what it is actually allowed to do. Logs need to exist. Fallback behaviour needs to be designed for when data is missing or inconsistent. Prompts need to be versioned. Evaluation examples need to be collected early, not retrofitted later.
Human review should sit at the right points: not as an afterthought, not everywhere, but where the cost of a wrong interpretation is high. These are ordinary software engineering decisions. The vocabulary of the AI layer is new. The discipline around it is not.
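None of this needs exotic tooling. Here is a minimal sketch of the output side alone, assuming the model is asked to return a JSON list of briefing items; the field names are illustrative. The point is that the structure is defined, the prompt is versioned, failures are logged, and the fallback is explicit:

```python
import json
import logging

logger = logging.getLogger("briefing")

PROMPT_VERSION = "weekly-briefing-v3"  # versioned with the code, not edited in place

REQUIRED_FIELDS = {"student", "batch", "reason", "suggested_action"}

def parse_briefing(raw: str) -> list[dict]:
    """Validate the model's output against the structure we expect.
    Malformed output is logged and dropped, never passed on silently."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        logger.error("unparseable output from prompt %s", PROMPT_VERSION)
        return []  # fallback: an empty briefing beats a wrong one
    if not isinstance(items, list):
        logger.error("expected a JSON list, got %s", type(items).__name__)
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict) or REQUIRED_FIELDS - item.keys():
            logger.warning("malformed briefing item: %r", item)
            continue
        valid.append(item)
    return valid
```

Every rejected item is a future evaluation example, which is one more reason to start collecting them from day one.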
Delivery
Building the right thing and delivering it to the wrong place is a common failure mode.
A briefing that arrives in a system no one checks becomes another report. A recommendation that lands without context for why it matters gets ignored. Delivery means understanding where work actually happens, what the operational rhythm looks like, and how the output needs to fit into that rhythm to be used.
This is less visible than the build. It is also where a lot of applied intelligence work quietly fails.
Evaluation
The useful question after delivery is not whether the system is running. It is whether behaviour changed.
Did follow-up happen earlier? Did managers notice risk sooner? Did the weekly review improve? Did people spend less time assembling context and more time acting on it? Did decisions become clearer? These questions take time to answer. They require talking to the people using the system, not just checking whether the output was produced. The output being produced is the floor. Behaviour changing is the actual measure.
Optimization
The system should get better as the organisation learns what signals matter.
That might mean changing which data is included, refining how exceptions are described, adjusting when a briefing is delivered, or adding a category of risk that became visible through use. This is not magic autonomy. It is the same disciplined iteration that any production system requires, informed by what real users noticed, what they acted on, and what they did not.
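One way to keep that iteration disciplined is to treat the knobs as configuration. A hypothetical sketch, with invented values and rationale comments, of what learning which signals matter can look like in code:

```python
from dataclasses import dataclass

# Hypothetical configuration for the academy briefing; every value and comment
# here is illustrative. The point is that each lesson from real use becomes a
# reviewable, versioned change rather than a quiet prompt edit.
@dataclass(frozen=True)
class BriefingConfig:
    version: str = "v4"
    delivery: str = "Monday 07:00"      # e.g. moved earlier to land before the weekly review
    attendance_threshold: float = 0.6   # e.g. tightened after early disengagement was missed
    include_fee_status: bool = True     # e.g. added once fee risk proved actionable
    risk_categories: tuple[str, ...] = (
        "disengaging student",
        "batch losing momentum",
        "overdue follow-up",
    )
```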
This series is our attempt to think through that work in public. The next posts will take each stage separately (discovery, planning, design, execution, delivery, evaluation, and optimization) and make the thinking more concrete as we learn.