Two Kinds of Agents, Two Kinds of Harness

One of the more useful ideas to emerge in AI engineering this year is harness engineering. Every time an agent makes a mistake, you engineer the fix into the environment around it, so that specific mistake becomes structurally impossible to repeat. The model reasons. The harness makes that reasoning reliable.

This resonated with me because I have been feeling it for a while. The model is rarely the problem. When an agent fails, it is almost always because we did not give it the right context, the right tools, or the right guardrails. The better question is always: what did we fail to give it?

But as I have spent more time thinking about this, I have noticed that most of the conversation around harness engineering assumes a specific kind of agent. An agent that does work over time. It plans, executes, verifies, iterates. It has seconds or minutes between steps. It can pause, check its output, and course-correct.

There is another kind of agent that works very differently. An agent that is in a live conversation with a human being, responding in real time, with no room to pause or retry.

The harness both of these need looks quite different.

[Figure: the harness for an agent that does work compared with the harness for an agent that talks. Same model, different harness. Long-running workflow agents can pause and retry; live conversational agents need their harness to run alongside the conversation.]

Agents That Do Work

Some agents are designed to execute multi-step workflows. A coding agent that writes, tests, and iterates. A research operations agent that recruits participants, schedules sessions, and compiles findings. A data pipeline agent that ingests, transforms, and validates. These agents operate over minutes or hours, sometimes longer.

The harness here gives the agent a clear set of instructions and lets it execute step by step. Verification gates between steps let it check its own work before moving on. When something goes wrong, you build the guardrail into the environment. The instructions are layered: a map at the top level, more detailed instructions deeper down, and the agent navigates through them deliberately.

The key advantage is time. Between every step, the agent can pause. It can make multiple sequential calls. It can retrieve additional context. It can run a verification check and retry if the output is not right. If the agent drifts off course, it has room to catch itself or be caught by a downstream gate.
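To make the gate-and-retry idea concrete, here is a minimal sketch. It assumes each step can be expressed as a function paired with a check; the names `Step`, `run_workflow`, and `max_attempts` are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[int], str]      # takes the attempt number, returns the step's output
    verify: Callable[[str], bool]  # the gate: True means the output is acceptable


def run_workflow(steps: List[Step], max_attempts: int = 3) -> List[str]:
    """Run steps in order; each must pass its gate before the next one starts."""
    outputs = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            output = step.run(attempt)
            if step.verify(output):
                outputs.append(output)
                break  # gate passed, move on to the next step
        else:
            # Every attempt failed the gate: stop instead of passing bad output along.
            raise RuntimeError(f"step {step.name!r} failed its gate {max_attempts} times")
    return outputs


# Hypothetical steps: the first draft fails its gate and is retried.
steps = [
    Step("draft", run=lambda n: "ok" if n >= 2 else "bad", verify=lambda out: out == "ok"),
    Step("publish", run=lambda n: "published", verify=lambda out: bool(out)),
]
results = run_workflow(steps)
```

The retry is invisible from the outside: only output that passed its gate ends up in `results`. That is only possible because nobody is waiting on each intermediate attempt.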

Agents That Talk

Some agents are designed to hold a live conversation. A customer support agent on a voice call. An AI interviewer conducting a research session. A coaching assistant working through a problem with someone in real time. These agents operate turn by turn, with a human waiting on the other side.

Once the person stops speaking, the agent has maybe 800 milliseconds to place what was said in the context of everything that has happened in the conversation so far, decide what to say next, and start delivering the response in natural speech. There is no "let me check my notes." There is no retry loop. The moment passes.

I have spent a lot of time working on this kind of agent, and a few things have stood out about how the harness has to differ.

Context has to be staged before you need it

A long-running agent can pause mid-task to go look something up. A conversational agent cannot break the flow. Everything the agent might need (the conversation objectives, the participant's background, the cultural context for the language being spoken) has to be in the context window before the conversation begins.

For a long-running agent, you retrieve context on demand as the task unfolds. For a conversational agent, you pre-load and compact the context upfront, making hard decisions about what to include and what to leave out, because the window is finite and the conversation has not happened yet. You are predicting what context will matter, rather than fetching it when you know it matters.
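One way to picture the pre-loading decision is as a budgeted selection made before the session starts. The sketch below is an illustration under assumptions, not a description of any production system: `ContextItem`, `stage_context`, the priority scores, and the word-count token estimate are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    label: str
    text: str
    priority: int  # higher = more likely to matter; a prediction made upfront


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())


def stage_context(items: List[ContextItem], budget: int) -> List[ContextItem]:
    """Greedily keep the highest-priority items that fit within the token budget."""
    staged, used = [], 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            staged.append(item)
            used += cost
    return staged


items = [
    ContextItem("history", "Full transcript of every earlier session with this participant", 1),
    ContextItem("objectives", "Understand why users abandon onboarding", 3),
    ContextItem("background", "Product manager, five years experience", 2),
]
staged = stage_context(items, budget=12)
labels = [item.label for item in staged]
```

With a budget of 12, the objectives and background fit and the full history is left out. The priorities are guesses made before anyone has spoken, which is exactly the hard part.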

Verification runs alongside, rather than after

A long-running agent can generate output, check it, and retry if needed. A conversational agent delivers its response in real time. There is no separate verification step.

So the checking has to happen in parallel. While the agent is forming its next response, a separate process is tracking whether the conversation is still on track, whether something said five minutes ago contradicts what is being said now, whether the person's engagement level is shifting. These signals feed into later responses, but they cannot gate the current one. The response has to go out. The quality checks inform what comes next, rather than blocking what comes now.
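A rough sketch of checking alongside rather than after, using Python's `asyncio`. The functions `generate_response` and `monitor` are hypothetical stand-ins for a model call and a quality check, and the short-answer heuristic is invented for illustration.

```python
import asyncio
from typing import List


async def generate_response(utterance: str, notes: List[str]) -> str:
    # Stand-in for the model call; it sees only signals from earlier turns.
    hint = f" [adjusting for: {', '.join(notes)}]" if notes else ""
    return f"reply to '{utterance}'{hint}"


async def monitor(utterance: str, notes: List[str]) -> None:
    # Stand-in for a slower quality check (drift, contradictions, engagement).
    await asyncio.sleep(0.01)
    if len(utterance.split()) <= 2:
        notes.append("short answers, engagement may be dropping")


async def converse(utterances: List[str]) -> List[str]:
    notes: List[str] = []
    replies: List[str] = []
    for utterance in utterances:
        snapshot = list(notes)  # signals available right now, from earlier turns only
        check = asyncio.create_task(monitor(utterance, notes))  # starts in the background
        replies.append(await generate_response(utterance, snapshot))  # goes out ungated
        await check  # in this demo, the check finishes while the person is speaking
    return replies


replies = asyncio.run(converse(["I guess so", "fine", "well, actually there was one thing"]))
```

The reply to "fine" goes out unchanged even though that turn triggers a warning. The warning only shapes the reply after it, which is the sense in which the check informs rather than blocks.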

When something goes wrong, you cannot retry. You adjust.

When a coding agent produces bad output, the harness catches it and the agent tries again. The human never sees the mistake. When a conversational agent asks a question that does not land well, that question has already been heard. There is no undo.

The harness has to detect the problem through conversational signals, such as a shift in vocal tone or a change in engagement, and adjust course in the following turn. The agent has to go somewhere else, naturally, without the person feeling like something went wrong.

State is continuous rather than queried

A long-running agent can re-read its instruction file or query a database when it needs information from an earlier step. It has the time to do that lookup. A conversational agent has to hold a living representation of everything that has happened in the session. What topics have been covered. What emotional moments have occurred. What the person said in minute 8 that might become relevant in minute 52.

This state has to be instantly accessible during response generation, because there is no time to go retrieve it. The session state manager ends up being one of the most critical components of the whole system, and it has to operate at conversation speed.
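As a loose illustration of state that is held rather than queried, the sketch below keeps everything in memory and indexes it at write time, so a read during response generation is a single dictionary lookup. The `SessionState` class and its methods are hypothetical; a real session state manager would track far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SessionState:
    """In-memory record of the session, updated every turn and read without I/O."""
    moments: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def record(self, minute: int, topic: str, note: str) -> None:
        # Index by topic at write time so reads stay cheap.
        self.moments.setdefault(topic, []).append((minute, note))

    def recall(self, topic: str) -> List[Tuple[int, str]]:
        # A single dict lookup: no database, no file read, no network call.
        return self.moments.get(topic, [])

    def topics_covered(self) -> List[str]:
        return list(self.moments)


state = SessionState()
state.record(8, "pricing", "cancelled once because of cost")
state.record(20, "onboarding", "setup felt confusing")
earlier = state.recall("pricing")  # the minute-8 remark, still at hand in minute 52
```

The design choice is to pay the indexing cost when something is said, not when it is needed, because the moment of need is the one with no time to spare.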

Same Model, Different Harness

Both types of agent can run on the same underlying model. The reasoning capability is the same. What changes entirely is the infrastructure around it. One gets the luxury of time. The other has to get it right in the moment.

We have seen this play out repeatedly: improvements to the harness produce bigger gains than upgrading the model underneath. That has held true for both types. The model is the engine. The harness is the car. A great engine in a poorly designed car is still a bad experience.

What This Has Been Like for Us

At Echovane, we happen to be building both types of agent. Our research operations agents handle multi-step workflows: recruiting participants, scheduling sessions, monitoring quality, synthesising findings. Our AI interviewer sits in a live conversation with a real person, sometimes for over an hour, across dozens of languages.

Working on both simultaneously has been one of the more exciting parts of building this company. The design instincts you develop for one type do not always transfer to the other. A pattern that works beautifully for a long-running agent actively breaks the experience for a conversational one. And a pattern that works for real-time interaction would be unnecessarily rigid for an agent that has time to think.

It keeps the engineering interesting. And it has given us a lot of respect for how different these two problems really are, even when the underlying model is the same.
