Estimated reading time: 15 minutes
Key Takeaways:
- The term "AI harness" refers to everything you build around a language model: context, tools, orchestration and feedback loops.
- For SMEs, the choice of model (Claude, GPT, Gemini) has become surprisingly unimportant; the harness around it almost entirely determines whether an AI implementation delivers value.
- An AI system in production consists of four moving parts: the model, the context it receives, the tools it is allowed to use, and the orchestration between tasks.
- A harness becomes important as soon as a task recurs, has multiple steps, or is used by multiple people; for one-off tasks a regular ChatGPT session is enough.
- The competitive advantage is not in choosing the best model, but in organizing the context, tools and feedback around it: an area where a well-organized SME can beat a larger competitor.
Table of Contents
- Key Takeaways
- Table of Contents
- Introduction: why model choice is the wrong discussion
- What is an AI harness?
- Why the harness matters more than the model
- The four parts, made concrete
- When do you not need a harness?
- How to start practically
- Competitive advantage lies in execution
- Final thoughts
- Frequently Asked Questions
Introduction: why model choice is the wrong discussion
Almost every SME conversation about AI I have starts with a model choice. "We want to do something with Claude." "We bought a ChatGPT license." "What do you think, Gemini or GPT-4?"
That is a logical starting point: models are what the news is about. But it is also the wrong starting point. The question of which model you use has become surprisingly unimportant for most business applications. What really matters is the harness around it.
In this article I explain what an AI harness is, why it determines whether your AI project delivers value or quietly dies, and how you as an SME can think about it practically.
What is an AI harness?
The term comes from the world of AI engineering and was put on the map in early February 2026 by Mitchell Hashimoto, co-founder of HashiCorp and the creator of Terraform. In a blog post about his own experience with AI agents, he introduced what he called "engineering the harness": every time an agent makes a mistake, you do not fix the mistake itself, but the environment around it, so that the same mistake can no longer occur (mitchellh.com). A few days later, OpenAI published an extensive report on an internal experiment in which a team of three engineers used this approach to build a software product of about a million lines of code, without a single line of manually written code (openai.com). The term spread through the entire field within weeks.
A harness is, literally, the tack that connects a horse to the cart. Without a harness, a horse is still a horse, but it does not pull a cart. The metaphor works well: you can have the most powerful model in the world, but without the structure around it, you do not have a working system.
Concretely, an AI system in production consists of roughly four moving parts:
- The model. Claude, GPT, Gemini, an open-source variant such as Llama or Qwen. The thing that produces text.
- The context. What the model receives per task: documents, previous conversations, customer data, company policy, processes, tone of voice.
- The tools. What the model can do besides talking: send email, query a database, read a calendar, create an invoice, generate a PDF.
- The orchestration. How tasks are broken up and passed on. Which agent does what. Who controls whom. What happens when something goes wrong.
The harness is everything except the model. It is the infrastructure that ensures the model works in your business, for your customers, with your data, in your way. Birgitta Böckeler of Thoughtworks summarizes this in the short formula now circulating across the industry: Agent = Model + Harness (martinfowler.com).
A handy analogy: the model is the engine. The harness is the car around it. A Ferrari engine without a gearbox, steering wheel, brakes, dashboard and seats is not a car. It is an exciting metal block in your garage. And yet half the SME world talks about AI as if you buy the engine and you are done.
Why the harness matters more than the model
Three reasons.
One: models are quickly becoming a commodity. The gap between the best versions of Claude, GPT and Gemini is getting smaller on most tasks, not larger. On specific benchmarks, the leader changes month by month. For the kind of work SMEs do (summarizing, writing, classifying, simple analyses) all major models are already more than good enough. You can choose Claude today and switch to GPT in six months without loss of quality, provided your harness is prepared for it.
Two: the model knows nothing about your business. A freshly started language model knows everything about the French Revolution and nothing about your customer service policy, your supplier prices, or that Marcel from the warehouse takes Thursdays off. What the model produces is at best as good as the context it receives. Bad context, bad output. That applies regardless of which model you use.
Three: one answer is not a workflow. An AI that writes a polite email is nice. An AI that recognizes an incoming complaint, looks up the right customer file, checks whether it has already been responded to, drafts an appropriate reply, has that reply checked by a colleague if it is a sensitive case, and sends it after approval: that is something valuable. The difference is almost entirely harness.
This principle is now broadly endorsed in the engineering world. Ryan Lopopolo of OpenAI describes how his team did not ask "how do we prompt better?" with every error from an agent, but "which capability, context or structure is missing?" (openai.com). The common thread in the literature is clear: a disciplined harness on a weaker model beats an undisciplined approach on a stronger model, every time.
In other words: model choice perhaps influences 10% of the outcome. The harness influences the other 90%.
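The first reason above, that you can switch models as long as the harness is prepared for it, can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: the two stub models, the `Harness` class and the `house_rules` field are hypothetical names invented for this example, and a real setup would wrap a vendor SDK call behind the same function shape.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function from prompt text to reply text.
# Real vendor SDK calls would be wrapped to fit this shape.
Model = Callable[[str], str]


def stub_model_a(prompt: str) -> str:
    """Stand-in for one vendor's model; echoes the last prompt line."""
    return "[model-a] " + prompt.splitlines()[-1]


def stub_model_b(prompt: str) -> str:
    """Stand-in for a second vendor's model with the same interface."""
    return "[model-b] " + prompt.splitlines()[-1]


@dataclass
class Harness:
    """Everything except the model: here only a fixed block of house rules."""
    model: Model
    house_rules: str

    def run(self, task: str) -> str:
        # The harness builds the prompt; the model only completes it.
        prompt = f"{self.house_rules}\n{task}"
        return self.model(prompt)


harness = Harness(model=stub_model_a, house_rules="Always answer politely.")
first = harness.run("Summarize the complaint.")

# Swapping the model is a one-line change; the harness stays untouched.
harness.model = stub_model_b
second = harness.run("Summarize the complaint.")

print(first)
print(second)
```

The point of the design is that nothing outside the `model` field knows which vendor is in use, so the model becomes a setting rather than an architectural decision.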
The four parts, made concrete
Context: what does the AI actually know about you?
The biggest gap I see in SMEs is between what an AI tool should know and what it actually knows when you ask it a question. Suppose you say to ChatGPT: "Write a quote for client X." Then ChatGPT needs to know somewhere:
- Who client X is and what the history is
- Which products or services you offer and at what price
- Which discount, if any, applies
- What your quotes normally look like
- Whether there are any open points from earlier conversations
In most SME implementations, there is nothing here at all. The AI starts every task from scratch. The company hopes that the employee types all of this into a prompt themselves, which means that the employee is actually doing the work and the AI is only formatting the text.
A serious harness solves this. We use, among other things, structured Obsidian vaults, ClickUp data, email history via Gmail integrations, and customer-call transcriptions via Fireflies: all accessible to the relevant AI agents. Not all data at once, because that creates noise. The OpenAI team calls this principle "give Codex a map, not a thousand-page manual": context is a scarce resource, and giving everything at once means in practice that the model receives nothing targeted (openai.com).
Concretely: when I ask an agent to write a customer proposal, it independently retrieves the customer file, looks at our previous proposals, rereads the meeting transcripts, and produces a draft that is roughly 80% of the way there. Not because the model is special (it is the same Claude model anyone can use), but because the context is there.
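A rough sketch of the "map, not a manual" idea, in Python: each task type lists the few sources it needs, and only those are fetched. The source names, the `CONTEXT_MAP` table and the `build_context` function are hypothetical and stand in for real integrations (a notes vault, a project tool, email, transcripts); the lambdas return placeholder strings instead of real data.

```python
from typing import Callable

# Assumption: every context source is a function that takes a client name
# and returns text. In a real harness these would call actual integrations.
SOURCES: dict[str, Callable[[str], str]] = {
    "customer_file": lambda client: f"Customer file for {client}",
    "previous_proposals": lambda client: f"Previous proposals for {client}",
    "meeting_transcripts": lambda client: f"Meeting transcripts for {client}",
    "supplier_prices": lambda client: "Supplier price list",
}

# The "map": which sources are relevant for which kind of task.
CONTEXT_MAP: dict[str, list[str]] = {
    "proposal": ["customer_file", "previous_proposals", "meeting_transcripts"],
    "price_check": ["customer_file", "supplier_prices"],
}


def build_context(task_type: str, client: str) -> str:
    """Fetch only the sources listed for this task type, in map order."""
    names = CONTEXT_MAP[task_type]
    return "\n".join(SOURCES[name](client) for name in names)


context = build_context("proposal", "client X")
print(context)
```

A proposal task never sees the supplier price list here, which is the intended effect: less noise, more targeted context.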
Tools: what can the AI actually do?
A language model without tools can produce text. Period. It cannot send email, create an invoice, read a calendar, create a customer in your CRM. For that you need tools: standardized connections between the model and the outside world.
In SMEs I see two extremes. On one hand, companies that add no tools at all and have their employees copy and paste between ChatGPT and the rest of their software. That works for occasional tasks, but it is not a system. On the other hand, companies that want to tie everything together, resulting in spaghetti that no one can untangle.
The middle road is targeted tool choices per use case. For our customer service agent, for example, three tools are relevant: read email, retrieve customer file, save a draft response. Not send: a human does that, after review. For our quote agent, other tools are relevant: read ClickUp, retrieve previous proposals, create a new Google Doc. Each agent gets only the tools it needs, no more.
That is a deliberate design choice. The more tools an agent has, the greater the chance of wrong choices and hard-to-debug problems. Limitation is a feature.
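One way to enforce "only the tools it needs" is an explicit allowlist per agent. The sketch below is an assumption about how this could look, not a description of a specific framework: the tool names mirror the examples in the text, the tools themselves are stubs, and `call_tool` is a hypothetical gatekeeper function.

```python
from typing import Callable


def _stub(name: str) -> Callable[[str], str]:
    """Create a placeholder tool that only reports what it was called with."""
    def tool(argument: str) -> str:
        return f"{name}({argument})"
    return tool


# All tools that exist in the system (stubs for illustration).
TOOLS: dict[str, Callable[[str], str]] = {
    name: _stub(name)
    for name in [
        "read_email", "get_customer_file", "save_draft", "send_email",
        "read_clickup", "get_previous_proposals", "create_google_doc",
    ]
}

# Per-agent allowlist. Note that "send_email" exists but is granted to no one:
# sending stays with a human after review.
AGENT_TOOLS: dict[str, set[str]] = {
    "customer_service": {"read_email", "get_customer_file", "save_draft"},
    "quotes": {"read_clickup", "get_previous_proposals", "create_google_doc"},
}


def call_tool(agent: str, tool: str, argument: str) -> str:
    """Run a tool only if it is on the calling agent's allowlist."""
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} may not use {tool}")
    return TOOLS[tool](argument)


result = call_tool("customer_service", "save_draft", "reply to order 1042")
print(result)
```

Because the check sits in one place, widening or narrowing what an agent may do is a change to a table, not to the agent's prompt.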
Orchestration: how do the pieces work together?
As soon as you automate more than one task, you get an orchestration question. Which agent does what? Who passes what on? What happens if step three fails? Who checks the end result?
This is where I see many SME projects get stuck. They start with "we want one AI assistant that can do everything." That sounds nice, but in practice you get a generalist that does everything at a mediocre level. It works much better to build specialized agents (one for incoming mail, one for quotes, one for planning) and let them collaborate through an orchestration layer. Anthropic has explicitly tested this principle: a setup with a separate planner, generator and evaluator performed considerably better than a single agent on the same task (milvus.io).
A concrete example from our work. We have a client in online retail where incoming customer questions go through three different agents before there is a draft response. The first classifies what type of question it is (complaint, return, product question, other). The second retrieves the right data based on that classification (order history for returns, product specs for product questions). The third writes the draft. Only after that does a human come in for approval.
Three agents, each good at one thing, with clear handoff between them. Much more reliable than one mega-prompt where you stuff everything in and hope it goes well. And much easier to maintain: if classification goes wrong, you know exactly where to look.
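The three-step handoff described above can be sketched as a small pipeline. Everything here is illustrative: the keyword classifier stands in for a model call, the lookup table stands in for real data retrieval, and the names `classify`, `retrieve`, `write_draft` and `handle` are invented for this example. The fallback data source for complaints and other questions is an assumption, since the text only names the sources for returns and product questions.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    category: str
    data: str
    text: str
    approved: bool = False  # stays False until a human signs off


def classify(question: str) -> str:
    """Agent 1: decide the question type (keyword stub for a model call)."""
    q = question.lower()
    if "complain" in q:
        return "complaint"
    if "return" in q:
        return "return"
    if "spec" in q or "size" in q:
        return "product_question"
    return "other"


def retrieve(category: str, customer_id: str) -> str:
    """Agent 2: fetch only the data that fits the classification."""
    sources = {
        "return": "order history",
        "product_question": "product specs",
    }
    # Assumption: other categories fall back to the general customer file.
    source = sources.get(category, "customer file")
    return f"{source} for {customer_id}"


def write_draft(question: str, category: str, data: str) -> str:
    """Agent 3: write the draft reply (template stub for a model call)."""
    return f"Draft reply to a {category} question, based on {data}."


def handle(question: str, customer_id: str) -> Draft:
    """Orchestration: run the three agents in order and stop before sending."""
    category = classify(question)
    data = retrieve(category, customer_id)
    text = write_draft(question, category, data)
    return Draft(category=category, data=data, text=text)


draft = handle("I want to return my order", "C-17")
print(draft)
```

If a reply uses the wrong data, the intermediate `category` field shows immediately whether classification or retrieval was at fault, which is the maintainability benefit of splitting the steps.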
Feedback and memory: does the system actually learn?
This is the part that is skipped most often, and the one that later makes the difference between an AI system that gets good and one that stays mediocre. It is also exactly the point from which Hashimoto started: every agent error is a reason to adjust the environment so that the error can no longer happen (mitchellh.com).
A typical AI implementation works like this: employee asks question, AI answers, employee corrects the answer manually, customer gets answer. The correction disappears into the wind. Next time the AI makes exactly the same mistake.
A harness with feedback loops does it differently. The corrections are stored somewhere, as examples, as rules, as context that goes into future prompts. The system gets better while you use it, not only when someone adjusts the prompt.
For us these are structured memory systems that track per agent and per client what works and what does not. No magic: just well-maintained example files, with version control, that are inserted at the right moment. But the difference between a system that has this and one that does not is huge after three months.
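As an illustration of such a feedback loop, the sketch below stores human corrections per agent and turns the most recent ones into a block of text that can be added to future prompts. It is deliberately simplified and not our actual setup: `CorrectionMemory` is a hypothetical class, and it keeps everything in memory, where a real system would persist the examples to version-controlled files.

```python
from collections import defaultdict


class CorrectionMemory:
    """Keeps (draft, corrected) pairs per agent and formats them for prompts."""

    def __init__(self) -> None:
        self._examples: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def record(self, agent: str, draft: str, corrected: str) -> None:
        # Only store real corrections; an unchanged draft teaches nothing new.
        if draft != corrected:
            self._examples[agent].append((draft, corrected))

    def as_prompt_section(self, agent: str, limit: int = 3) -> str:
        """Return the latest corrections as text, or '' if there are none."""
        recent = self._examples[agent][-limit:]
        if not recent:
            return ""
        lines = ["Earlier corrections to learn from:"]
        for draft, corrected in recent:
            lines.append(f"- Instead of {draft!r}, write {corrected!r}")
        return "\n".join(lines)


memory = CorrectionMemory()
memory.record("customer_service", "Hi there", "Dear Ms. Jansen")
memory.record("customer_service", "Kind regards", "Kind regards")  # no change

section = memory.as_prompt_section("customer_service")
print(section)
```

The `limit` parameter reflects the same scarcity argument as with context in general: a handful of recent, relevant corrections usually helps more than the full history.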
When do you not need a harness?
For completeness: not every AI application requires a harness. For one-off tasks (translating an email, shortening a text, brainstorming a name) opening ChatGPT and typing is fine. There the harness is the user's own brain, and that is enough.
The harness becomes important as soon as one or more of these things are true:
- The task recurs
- The result must be consistent over time or across employees
- There are multiple steps or multiple data sources
- Multiple people use the system
- Errors have costs (customer, money, reputation)
In those cases an ad-hoc setup quickly crumbles. Someone forgets a step, someone else uses a different prompt, no one knows anymore why the output of the past month looked different.
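The checklist above is simple enough to write down as a small helper, which can be useful when you want to triage a list of AI ideas consistently. The field names and the `needs_harness` function are hypothetical; the rule itself (one or more criteria true) comes straight from the list.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskProfile:
    """One flag per criterion from the checklist."""
    recurring: bool = False
    needs_consistency: bool = False
    multi_step_or_multi_source: bool = False
    multiple_users: bool = False
    errors_are_costly: bool = False


def reasons(profile: TaskProfile) -> list[str]:
    """List the criteria that apply, in checklist order."""
    return [f.name for f in fields(profile) if getattr(profile, f.name)]


def needs_harness(profile: TaskProfile) -> bool:
    """A harness becomes important as soon as at least one criterion applies."""
    return len(reasons(profile)) > 0


one_off = TaskProfile()
complaint_handling = TaskProfile(recurring=True, multiple_users=True)

print(needs_harness(one_off))
print(reasons(complaint_handling))
```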
How to start practically
Not by building a harness from scratch tomorrow. But by asking the four questions for every AI idea that comes up in your business:
- Which model is actually needed here? Often a less relevant question than you think. Choose a good model and park the discussion.
- What context does this need? Which data should the AI be able to consult? Where is that data now? How does it get there?
- Which tools are needed? What should the AI be able to do besides answering? Which of those can it do autonomously, and which require human approval?
- What does the orchestration look like? Is this one task or a chain? Who checks the result? What happens with errors?
If you ask those four questions for every AI project, you naturally move from "we are buying ChatGPT licenses" to something more substantial. Not necessarily larger in technology, often smaller and more targeted, but more substantial in effect.
Competitive advantage lies in execution
Models themselves are available to everyone. The difference is in how you deploy them. My prediction: in a few years the model will be even less important than now. The major players are converging to comparable performance on common tasks, prices are dropping, and most business software builds in model choice as a configuration setting that you can change month by month.
What remains as a competitive advantage is the harness. Companies that have organized their context well, that have built targeted tools for their specific workflows, that have closed feedback loops, that have grown memory systems: those will have noticeably better AI systems than companies that still keep clinging to model choice.
For SMEs, this is good news. It is not a contest of who has the largest compute budget. It is a contest of who best understands how their own work fits together. That is a game in which a well-organized SME can beat a much larger competitor, provided it stops staring at the engine and starts building the car.
Final thoughts
An AI harness is not a technical feat. It is simply how you deploy this technology responsibly and effectively if you want to move beyond the experimentation stage. Context, tools, orchestration and feedback are means, not ends in themselves. What counts is that the system does what it should do, works reliably, and fits within your way of working.
For SMEs, the opportunity lies there: not the largest implementation, but the smartest. Start small, learn what works in your context, and expand step by step.
In future articles I will work this out further with concrete examples from our work: how we set up our own harness, which mistakes we made along the way, and what we learned by treating ourselves as the first customer.
Frequently Asked Questions
What is an AI harness?
An AI harness is everything you build around a language model to make it usable for a specific business application: the context the model receives, the tools it is allowed to use, the orchestration between tasks, and the feedback loops that ensure the system learns from mistakes.
Why is the harness more important than the model?
Because modern language models have become interchangeably good for most business tasks. The difference between an AI implementation that works and one that stays mediocre lies almost entirely in how you have organized context, tools and feedback, not in which model you have chosen.
Do I always need a harness?
No. For occasional, one-off tasks (translating an email, brainstorming, shortening a text) a direct ChatGPT or Claude session suffices. A harness becomes important as soon as a task recurs, has multiple steps, or is used by multiple people.
Where should I start as an SME?
With one concrete process that costs a lot of time or causes visible frustration. Build a working harness around it (context in order, the right tools, clear orchestration) and scale from there. This way you keep control of costs and quickly know whether the approach works for your situation.
How does this relate to AI regulations like the EU AI Act?
A well-designed harness helps maintain control over what an AI system can and cannot do: think of which data it sees, which actions it can perform autonomously, and which steps require human approval. That kind of control is exactly what compliance frameworks like the EU AI Act ask of companies, especially for higher-risk applications.