Estimated reading time: 11 minutes

Key Takeaways:

  • In the previous article we explained why the AI harness is becoming more important than the choice of model. In this follow-up we show how we set up that harness at And AI ourselves.
  • At And AI we treat ourselves as our first customer. Only once we have experienced for ourselves where AI runs into friction in production do we build the same principles into solutions for clients.
  • Our own harness consists of three context layers, eighteen specialized agents, two agent teams, three main pipelines, and a feedback system with three levels.
  • The biggest lesson: an AI system does not learn by itself. Corrections must be actively captured, enforced, and re-used at the right moment.
  • Presentations, exports, and skills turned out to be far more treacherous than expected. Validation and iteration in particular cost more time than you estimate up front.
  • For SMEs the lesson is not that you should copy our setup. The lesson is that every serious AI project must answer four questions: context, tools, orchestration, and feedback.

Table of Contents

  1. From theory to practice
  2. Ourselves as the first customer
  3. The four parts of our AI harness
  4. Three challenges we did not see coming
  5. What this means for SMEs
  6. How to start practically
  7. Frequently Asked Questions

From theory to practice

In the previous article we described why the AI harness is becoming more important than the choice of model.

The short version: for many SME tasks the large language models are now good enough. The difference is no longer mainly in Claude, GPT, or Gemini. The difference is in what you organize around them: the context the model receives, the tools it is allowed to use, the way tasks are divided, and the feedback that makes the system better.

That was the theory.

This article is the practical follow-up.

How did we set up our own AI harness? What works well? Where did we get stuck? And which lessons are relevant for SMEs that want to use AI not as a standalone prompt, but as a serious part of their operations?

So this is not a polished showcase. It is an honest field report.

Ourselves as the first customer

At And AI we deliberately treat ourselves as our first customer.

We build AI systems first for our own workflows, use them daily, run into their limits, and only then refine them further for clients.

That sounds logical, but in the AI market it is not always the case. Many agencies build AI solutions for clients while internally still working mostly with standalone ChatGPT prompts. We chose the opposite route: find out first, in our own work, where it chafes.

The advantage is concrete. You discover earlier why a system that works on paper falls apart in practice: missing context, overly broad tool access, poor validation, faulty exports, or corrections that are never stored anywhere.

The most important lesson: an AI system in production is never finished. It needs to be monitored, has to be able to learn, and needs continuous adjustment.

The four parts of our AI harness

A working AI system is not just a model. The model is important, but the difference usually lies in what surrounds it.

For us, the harness consists of four parts: context, tools, orchestration, and feedback.

Context: three vaults, one deliberately without AI

We use three separate Obsidian vaults.

The management vault contains strategic decisions, project statuses, customer files, and meeting summaries.

The development vault contains technical documentation, architecture choices, and session logs of our build work.

The business vault contains HR data, financial agreements, contract values, and confidential information.

That last vault deliberately has no AI access. Not because that is technically impossible, but because we believe some information should not have to flow through a language model, even if it could be done safely on paper. Some lines you want to draw before something goes wrong, not after.

In addition, we use context from Fireflies, ClickUp, Gmail, and GitHub. Not all at once, but only where it is relevant for the task.

A meeting agent does not need to see financial agreements. A development agent does not need to read sales history. A sales agent does not need access to technical session logs if that adds nothing to its task.
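To make this concrete, per-agent scoping can be expressed as an explicit allowlist of context sources. This is a minimal sketch under stated assumptions: the agent names, the source labels, and the `CONTEXT_SOURCES` mapping are hypothetical illustrations, not a description of our production setup or of any product API. The business vault is excluded for every agent, so a configuration mistake cannot grant access to it.

```python
# Hypothetical context sources; labels mirror the vaults and integrations above.
ALL_SOURCES = {
    "management_vault",
    "development_vault",
    "business_vault",
    "fireflies",
    "clickup",
    "gmail",
    "github",
}

# Sources that no agent may ever read, regardless of its configuration.
NEVER_FOR_AI = {"business_vault"}

# Hypothetical per-agent allowlists: each agent sees only what its task needs.
CONTEXT_SOURCES = {
    "meeting_agent": {"fireflies", "clickup", "management_vault"},
    "development_agent": {"development_vault", "github"},
    "sales_agent": {"management_vault", "gmail"},
}


def allowed_sources(agent: str) -> set[str]:
    """Return the sources an agent may read (empty set for an unknown agent)."""
    requested = CONTEXT_SOURCES.get(agent, set())
    unknown = requested - ALL_SOURCES
    if unknown:
        raise ValueError(f"{agent} references unknown sources: {sorted(unknown)}")
    # The hard exclusion always wins over any allowlist entry.
    return requested - NEVER_FOR_AI


def can_read(agent: str, source: str) -> bool:
    return source in allowed_sources(agent)
```

Defaulting to an empty set means a new agent starts with no context at all until someone deliberately grants it access.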

The principle behind it is simple: give a map, not a manual.

An agent does not need to receive all information at once. It must first be able to see where the right information is, and then retrieve only what is needed. That prevents noise, speeds up the work, and makes the outcome more verifiable.
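The map-first idea can be sketched in two steps: the agent first receives only an index of titles and paths, and then loads the full text of the few documents it selects. In this minimal sketch the index format, the keyword-overlap selection, and the `load` callback are assumptions made for illustration; in a real setup the model itself would choose from the map rather than rely on keyword matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MapEntry:
    path: str   # where the document lives, e.g. inside a vault
    title: str  # short description the agent can scan cheaply


def select_from_map(task: str, vault_map: list[MapEntry], limit: int = 2) -> list[MapEntry]:
    """Step 1: pick the entries whose titles share the most words with the task."""
    task_words = set(task.lower().split())

    def overlap(entry: MapEntry) -> int:
        return len(task_words & set(entry.title.lower().split()))

    ranked = sorted(vault_map, key=overlap, reverse=True)
    return [entry for entry in ranked[:limit] if overlap(entry) > 0]


def build_context(task: str, vault_map: list[MapEntry], load: Callable[[str], str]) -> str:
    """Step 2: load only the selected documents, never the whole vault."""
    selected = select_from_map(task, vault_map)
    return "\n\n".join(f"# {entry.title}\n{load(entry.path)}" for entry in selected)
```

Because only the selected paths are ever passed to `load`, it is easy to log exactly which documents an answer was based on.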

Tools: deliberately limit per agent

We currently have eighteen specialized agents in production. Each agent has a deliberately limited set of tools.

Some agents may only read. Others may create tasks, change code, or prepare files. Not every agent gets access to everything.

That is not a limitation born of weakness, but a design choice.

The more tools an agent has, the greater the chance that it picks the wrong route. A broad agent looks handy, but is harder to control and harder to debug.

A good agent does not have to be able to do everything. It must be able to do exactly enough to perform its task reliably.

That makes the system more honest. You know what an agent can do, what it cannot do, and where human control remains necessary. That is much more useful than an agent that can "do everything" but where no one really understands what it is doing.
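One way to make these limits explicit is to route every tool call through a single dispatcher that checks a per-agent allowlist and denies by default. The sketch below uses hypothetical agent and tool names, and the tools are stubs; it is not the API of any particular agent framework.

```python
from typing import Any, Callable


# Hypothetical stub tools; real ones would call ClickUp, GitHub, the filesystem, etc.
def read_document(path: str) -> str:
    return f"<contents of {path}>"


def create_task(title: str) -> str:
    return f"task created: {title}"


def open_pull_request(branch: str) -> str:
    return f"PR opened from {branch}"


TOOLS: dict[str, Callable[..., Any]] = {
    "read_document": read_document,
    "create_task": create_task,
    "open_pull_request": open_pull_request,
}

# Each agent gets exactly enough tools for its task, and nothing more.
TOOL_ALLOWLIST: dict[str, frozenset[str]] = {
    "research_agent": frozenset({"read_document"}),
    "meeting_agent": frozenset({"read_document", "create_task"}),
    "dev_agent": frozenset({"read_document", "open_pull_request"}),
}


def call_tool(agent: str, tool: str, **kwargs: Any) -> Any:
    """Execute a tool only if the agent is allowed to use it (deny by default)."""
    if tool not in TOOL_ALLOWLIST.get(agent, frozenset()):
        raise PermissionError(f"{agent} may not use {tool}")
    return TOOLS[tool](**kwargs)
```

Keeping the allowlist in one place also answers the debugging question "could this agent have done that?" without reading any prompts.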

Orchestration: two teams, three pipelines

Our agents are divided across two teams.

AIKO DEV focuses on development, technical research, code reviews, and pull requests.

AIKO OPS focuses on sales, finance, intake, customer overview, and internal operations.

On top of that run three main pipelines.

The sales pipeline starts with a new lead and runs via prospect research, conversation preparation, meeting recording, transcript processing, and quote attachment.

The meeting pipeline starts with a Fireflies recording and runs via transcript extraction, ticket creation, and memory update.

The development pipeline starts with a specification and runs via epic creation, coding, security review, and pull request.
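Written down as data, the three pipelines are simply ordered lists of steps. The step names below paraphrase the descriptions above; the structure and the `next_step` helper are a hypothetical illustration, not the configuration format we actually run.

```python
from typing import Optional

PIPELINES: dict[str, list[str]] = {
    "sales": [
        "new_lead",
        "prospect_research",
        "conversation_preparation",
        "meeting_recording",
        "transcript_processing",
        "quote_attachment",
    ],
    "meeting": [
        "fireflies_recording",
        "transcript_extraction",
        "ticket_creation",
        "memory_update",
    ],
    "development": [
        "specification",
        "epic_creation",
        "coding",
        "security_review",
        "pull_request",
    ],
}


def next_step(pipeline: str, current: str) -> Optional[str]:
    """Return the step after `current`, or None when the pipeline is finished."""
    steps = PIPELINES[pipeline]
    index = steps.index(current)  # raises ValueError for an unknown step
    return steps[index + 1] if index + 1 < len(steps) else None
```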

An important choice: agents pass work to each other through the filesystem, not through the full conversation context.

That sounds technical, but the principle is straightforward. We do not want one agent to blindly carry over the reasoning of another agent. Each agent has to perform its own task based on the right input, not on the whole train of thought that came before.

This way the work stays more verifiable. And if something goes wrong, you can trace back where it went wrong.
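A filesystem handoff can be as simple as each step writing one structured file and the next step reading only that file. In this minimal sketch the JSON format, the file naming, and the helper names are assumptions; the point is that the downstream agent receives the result of the previous step rather than its reasoning, and that every intermediate file stays on disk for tracing.

```python
import json
import tempfile
from pathlib import Path


def hand_off(workdir: Path, step: str, payload: dict) -> Path:
    """Write a step's result to its own file; this file is the only thing passed on."""
    path = workdir / f"{step}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def receive(path: Path) -> dict:
    """The next agent starts from the file's contents, not from a conversation history."""
    return json.loads(path.read_text(encoding="utf-8"))


# Example: transcript extraction hands structured action items to ticket creation.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    extracted = hand_off(
        workdir,
        "transcript_extraction",
        {"action_items": ["Send revised quote", "Schedule follow-up call"]},
    )
    items = receive(extracted)["action_items"]
    tickets = [f"TICKET: {item}" for item in items]
    # The intermediate file is still available, so a bad ticket can be traced back.
    trace_exists = extracted.exists()
```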

Feedback: corrections need to go somewhere

The feedback system cost us the most time, but also delivers the most.

An AI system does not get better just because someone says once that something is wrong. That correction has to be captured somewhere and re-used at the right moment.

We work with three levels.

Tier 1 contains critical rules that are always read and live directly inside agent prompts.

Tier 2 contains active working rules that are available per session as background information.

Tier 3 is archive: useful historical context, but not active in the daily flow.
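The three tiers can be sketched as a single rule store from which the prompt is assembled: tier 1 is always included, tier 2 only when it matches the topics of the current session, and tier 3 never reaches the prompt. The `Rule` fields and the topic matching are assumptions for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    text: str
    tier: int  # 1 = critical, 2 = active working rule, 3 = archive
    topics: frozenset[str] = field(default_factory=frozenset)


def build_prompt(base_prompt: str, rules: list[Rule], session_topics: set[str]) -> str:
    critical = [r.text for r in rules if r.tier == 1]
    background = [r.text for r in rules if r.tier == 2 and r.topics & session_topics]
    # Tier 3 is archive: kept for reference, never injected into the prompt.
    parts = [base_prompt]
    if critical:
        parts.append("Critical rules (always apply):\n" + "\n".join(f"- {t}" for t in critical))
    if background:
        parts.append("Background for this session:\n" + "\n".join(f"- {t}" for t in background))
    return "\n\n".join(parts)
```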

In addition, we use three mechanisms.

After a session, corrections can be captured.

Manually corrected output can be compared with the original output to find patterns.

And critical rules are enforced via hooks and validation steps.
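The second mechanism, comparing manually corrected output with the original, can start with a word-level diff. This sketch uses Python's standard `difflib.SequenceMatcher` to list the replacements a reviewer made and counts in how many documents each one recurs. Treating a replacement that appears in two or more documents as a candidate rule is an assumption for this illustration, not a fixed threshold.

```python
from collections import Counter
from difflib import SequenceMatcher


def corrections(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every word-level replacement."""
    a, b = original.split(), corrected.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def recurring_patterns(
    pairs_per_doc: list[list[tuple[str, str]]], min_count: int = 2
) -> list[tuple[str, str]]:
    """Replacements seen in at least `min_count` documents are candidates for a rule."""
    counts = Counter(pair for pairs in pairs_per_doc for pair in set(pairs))
    return [pair for pair, n in counts.items() if n >= min_count]
```

A recurring pattern is only a candidate; someone still has to decide where the resulting rule should apply.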

The lesson is simple: feedback has to be designed. Otherwise it disappears.

Three challenges we did not see coming

The setup sounds logical in hindsight. But in practice we ran into three structural problems.

The self-learning system does not work by itself

The idea was simple: capture corrections as memory, and the system learns.

In practice it is not that automatic.

An example: we had recorded that we did not want to use em-dashes. With PDF generation that worked. In LinkedIn posts the em-dash still appeared. The rule existed, but was not enforced in every context.

Another example: customer figures were spread across multiple documents. One document was corrected, but the old figures remained in other places. The system therefore had no central awareness that the truth had changed.

The lesson: "remembering" is not enough. A rule has to be enforced where it matters.

For that there are roughly three options: build it into the agent prompt, enforce it via a hook, or accept that it does not happen reliably.

Since then, with every correction we explicitly ask the question: where should this rule apply?

Only in quotes?

Also in LinkedIn posts?

Only with PDF generation?

Or everywhere?

Without that choice, memory becomes a messy bin of good intentions.
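That question can be made explicit by giving every rule a scope and checking it in a hook that runs on each output. This minimal sketch assumes a hypothetical `ScopedRule` structure and hypothetical output-type labels; the em-dash rule from the example above is scoped to `everywhere`, so it is checked for LinkedIn posts as well as PDFs.

```python
import re
from dataclasses import dataclass

EVERYWHERE = "everywhere"


@dataclass(frozen=True)
class ScopedRule:
    name: str
    pattern: re.Pattern        # text that must NOT appear in the output
    scopes: frozenset[str]     # output types where the rule applies


RULES = [
    # U+2014 is the em-dash character.
    ScopedRule("no_em_dash", re.compile("\u2014"), frozenset({EVERYWHERE})),
    # Hypothetical rule that only matters for quotes.
    ScopedRule("no_draft_marker", re.compile(r"\bDRAFT\b"), frozenset({"quote"})),
]


def check_output(output_type: str, text: str) -> list[str]:
    """Hook: return the names of the rules this output violates."""
    return [
        rule.name
        for rule in RULES
        if (EVERYWHERE in rule.scopes or output_type in rule.scopes)
        and rule.pattern.search(text)
    ]
```

A non-empty result can block the output or send it back for a rewrite; the rule no longer depends on an agent remembering it.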

Presentations turned out to be treacherous

Making presentations with AI looks easy. Until you seriously try to automate it.

We ran into all kinds of small errors that become big in practice. A font that was not installed. A text layer that disappeared underneath a shape in the PDF export. The wrong logo on a dark background. A slide that looked fine in the preview, but was off in the export.

These are not spectacular AI errors. But that is exactly why they are dangerous. They look small, until you send a presentation to a customer and the cover text is not legible.

The lesson: a presentation is not one task, but a chain of steps.

Structure, copy, design, fonts, layers, export, contrast, logos, brand identity, and validation all have to be right.

That is why validation is not an extra step at the end, but part of the system.

An AI that builds a presentation should not just generate slides. It should also check whether the output is correct.
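One of those checks, whether text is legible against its background, can be automated with the contrast-ratio formula from WCAG 2.x. The sketch below implements that formula; the slide dictionaries are hypothetical, and applying the 4.5:1 minimum that WCAG sets for normal-size text to slides is our assumption.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def illegible_slides(slides: list[dict], minimum: float = 4.5) -> list[int]:
    """Return the indexes of slides whose text/background contrast is below `minimum`."""
    return [
        i
        for i, slide in enumerate(slides)
        if contrast_ratio(slide["text_color"], slide["background_color"]) < minimum
    ]
```

Black on white gives the maximum ratio of 21:1; a mid-grey text on a slightly lighter grey background falls far below 4.5:1 and would be flagged before export.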

Skills cost more time than you think

We have now built or tested dozens of skills. Some work well. Others have been rewritten. And a few turned out, in hindsight, not to be worth the effort.

Our own website was originally estimated at twelve hours. In the end it cost forty hours. Not because AI could not do anything, but because making it stable, testing it, correcting it, and restructuring it costs a lot of time.

That is a recurring pattern.

The question is not: can we build this as a skill?

Usually the answer is yes.

The better question is: is this skill worth the time, how many iterations does it need, and is there already a native solution that is faster or more reliable?

That is an important difference. AI makes building easier, but does not automatically make everything worth building.

Some automations deserve a robust skill. Others are better off remaining a prompt, checklist, or standard process.
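Part of the question "is this skill worth the time?" is plain arithmetic. The sketch below uses the overrun from our website (estimated at twelve hours, forty in the end, a factor of 40 / 12 ≈ 3.3) to adjust an estimate before calculating how many months a skill needs to pay for itself. The hours saved per month are a hypothetical input, and the formula ignores maintenance, which would lengthen the payback period.

```python
import math

# Observed on our website: estimated at 12 hours, 40 hours in the end.
OVERRUN_FACTOR = 40 / 12


def payback_months(
    estimated_hours: float,
    hours_saved_per_month: float,
    overrun: float = OVERRUN_FACTOR,
) -> float:
    """Months until time saved exceeds the realistic build time (maintenance ignored)."""
    if hours_saved_per_month <= 0:
        return math.inf
    realistic_hours = estimated_hours * overrun
    return realistic_hours / hours_saved_per_month
```

For example, a skill estimated at twelve hours that saves five hours a month takes about eight months to pay back once the overrun is included, instead of the 2.4 months the original estimate suggests.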

What this means for SMEs

The lesson for SMEs is not that every business should build its own AI harness from scratch. That is usually not realistic and also not necessary.

The lesson is that every serious AI project has to go through the same four questions.

Which context does the AI need?

Which tools may the AI use?

How are tasks divided and controlled?

How do corrections flow back?

If those questions are not answered, you do not have an AI implementation. You have a standalone prompt, a chatbot, or a demo.

That can be useful, but it is not yet a system that structurally takes work off your hands.

For SMEs, three lessons are directly applicable.

One: start internally.

Pick a process you understand well yourself and use that as your first test. You learn faster from that than from an external customer process.

Two: plan generously.

AI projects often need two to three times as much iteration as your first estimate. Not because the technology cannot do it, but because reliability takes time.

Three: build in feedback before extra functionality.

A small system that learns well from corrections is more valuable than a large system that keeps repeating the same mistakes.

How to start practically

Do not start with the question of which model you should choose. Start with the process.

Pick one process with a lot of repetition, where enough information is available, and where human control logically remains.

Think of project reports, meeting notes, quote preparation, internal knowledge questions, or document analysis.

Then walk through four questions.

Context.

Which information does the AI need to deliver good work? Is that information in one place, or does an employee have to deliver it again every time?

Tools.

What may the AI do besides giving answers? May it read documents, create tasks, draft emails, or update systems?

Orchestration.

Is this one broad agent, or multiple small steps with clear control points?

Feedback.

What happens when someone corrects the output? Is that correction captured, or does it disappear into the inbox?

If three of the four questions do not have a clear answer, you know where your first work lies.
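The four questions can also be used as a simple checklist. This minimal sketch assumes that an answer only counts as clear once someone has written it down; the field names and the example process are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProcessAssessment:
    process: str
    context: Optional[str] = None        # which information, and where it lives
    tools: Optional[str] = None          # what the AI may do besides answering
    orchestration: Optional[str] = None  # one broad agent or small controlled steps
    feedback: Optional[str] = None       # where corrections are captured

    def open_questions(self) -> list[str]:
        """Names of the four questions that still lack a written answer."""
        return [
            f.name
            for f in fields(self)
            if f.name != "process" and not (getattr(self, f.name) or "").strip()
        ]


assessment = ProcessAssessment(
    process="meeting notes",
    context="Fireflies transcripts plus the project overview",
)
print(assessment.open_questions())  # ['tools', 'orchestration', 'feedback']
```

In this example three of the four questions are still open, which shows where the first work lies.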

Competitive advantage is not in model choice. Not in yet another tool. It is in how well your organization learns from what goes wrong.

Frequently Asked Questions

How does this article connect to the previous article on the AI harness?

The previous article explained what an AI harness is and why it is becoming more important than the choice of model. This article shows how we apply that principle at And AI in our own workflows.

Why do you treat yourselves as the first customer?

Because you only really discover where AI runs into friction when you use it daily in your own work. As a result we do not build theoretical systems, but solutions that have to hold up in practice.

What was the biggest lesson from your own harness?

That an AI system does not learn by itself. Corrections must be actively captured, linked to the right process, and enforced at the right moment.

Does every SME have to build such an extensive harness itself?

No. For most SMEs that is not necessary. But every serious AI project has to think about context, tools, orchestration, and feedback.

What can an SME take from this?

Start with one concrete process, limit the scope, give the AI only the context and tools it needs, and from day one build in a way to let corrections flow back.

Where do you start practically?

Pick a recurring process with a lot of manual work and where human control logically remains. Think of project reports, meeting notes, quote preparation, or internal knowledge questions.