Article

Harness-engineering the Legal Brain

30 June 2026

Back

Home

Insights & News

Insights & News Details

Foreward

If your junior associate says to you, “We discussed a lot already, so just confirm what you want”, how would you react?

That was exactly how my personal AI agent responded to my requests after a brief dialogue:

Yes, we had gone back and forth on the workflow design a few times, but, hey, was it getting impatient with me?

Even if he[i] did show his impatience or even some borderline impertinence, I only have myself to blame. I did configure my AI agent as an assistant who “is helpful and direct; has opinions; and would disagree when needed” – serves me right!

Hermes Agent – the Self-Improving AI Agent

One of my previous articles has shared the experience of working with OpenClaw – a personal AI agent system aka “lobster farming”[ii]. The “impatient” agent above came from Hermes Agent – a recent personal AI agent system built by Nous Research[iii]. Hermes Agent is branded as “The Agent that Grows with You”. The AI agent created under the Hermes Agent system has better memory configuration and is able to learn from the mistakes and lessons when it interacts with the user. Stored in the long-term memory files, over time the agent can assist you better given its lessons learned and saved.

It is this self-improving feature that, according to the tech commentators[iv], sets Hermes Agent apart from OpenClaw. Even if a user runs both his Hermes Agent and OpenClaw on the same large-language-model (LLM) (e.g., both use GPT-5.5), the user would still likely conclude that Hermes Agent shows better ability to self-improve.

Speaking from firsthand user experience with both systems, I can see certain unique engineering improvements in Hermes Agent. True to the slogan, the agent does save, retrieve and apply memory knowledge more proactively.[v] I also find Hermes Agent very stable to run and I have not experienced any major flaws such as getting stuck in some unknown loops and burning thousands of tokens[vi].

Prompt Engineering – Context Engineering – Harness Engineering[vii]

The design and architecture that wrap around an LLM to turn it into a usable AI agent are known as harness engineering. An excellent concise definition I have seen is as follows:

Harness engineering is the discipline of building the environment that surrounds an AI model, not the model itself. The model reasons and decides. The harness executes, constrains, and connects. A well-designed harness gives the model precisely the tools it needs, nothing more, and governs exactly what it is allowed to do with them.[viii]

The word “harness” already figuratively describes what harness engineering really is. The LLM is like a powerful horse that can sprint at top speed for miles without breaking a sweat. However, without a harness, the horse would remain just a wild horse, not a racing steed or a cavalry horse. The harness is a tool for human to exert controls over something powerful, so that the thing can become a tool for use.

The LLM is like a very smart brain. When you ask it questions in a chat window (in tech terms, you are doing prompt engineering), it would give you intelligent replies and answers. However, hallucination is known to be a common problem for LLM. The LLM may sometimes give you answers that are seemingly correct in high confidence, but often the answers may turn out to be baseless.

If you feed some documents to the LLM first and query the LLM based on those documents, you are providing some solid contexts to the LLM to reply on (in tech terms, you are doing context engineering). After digesting those documents, the LLM is less likely to hallucinate when it replies to your questions, assuming you would also prompt it to refer to those documents when answering.

Harness engineering takes it to the next level. For complex AI agent systems such as Hermes Agent, software engineers need to build the “body and flesh” for the smart brain so that the whole system can function as a consistent, predictable, and useful assistant. Like a true robot, the agent that can receive instructions, ingest and interpret data, make plans by reasoning through steps, and perform tasks. The agent’s outputs are no longer limited to textual response, but can be in the form of various artifacts such as text files, spreadsheets, slide decks, images, or videos.

Harness-engineering a Lawyer

The path to qualify as a lawyer in Hong Kong resembles a harness engineering process. The first step is a law degree, which provides the foundational training to the legal brain. The vocational training in P.C.LL. equips the law graduate with the practical skills required to become a practicing lawyer. After that, the training or pupillage (for a solicitor and a barrister respectively) provides yet another level of practical training.

Further now, some concepts in harness engineering can indeed shed lights on how a lawyer can be sharpened as a useful tool:

1. The Agentic Loop (aka the ReAct Loop: Reason-Act-Observe)
Harness Engineering Concept	Legal Translation
Agentic loop is the foundation framework for building AI agents: i. The agent reasons about a problem; ii. The harness acts by using a tool; and iii. The harness observes the real-world feedback and updates the agent in the next turn.	Traditional lawyers often work linearly: they receive instructions, disappear into an office, execute a massive block of legal work, and present it as a finished product. In a rapidly changing business environment, lawyers must break this silo. Implementing a continuous loop of real-time feedback and iterative adjustments ensures the legal work remains aligned with the client’s shifting commercial realities.
2. Orchestrating Agent & Sub-Agents Architecture
Harness Engineering Concept	Legal Translation
In a complex agentic AI system, the Orchestrator Agent understands the macro-objective, the system constraints, and the ultimate user intent. The Orchestrator does not do the grunt work. Instead, it spawns Worker Sub-Agents to execute specific, isolated tasks (e.g., researching a specific topic, writing specific code). Sub-agents possess zero global context. Unless the Orchestrator passes them the relevant parameters (in tech term: Context Hydration), the sub-agents will hallucinate or return useless data. Finally, when sub-agents return conflicting outputs, the Orchestrator runs a Synthesis and Resolution Loop to reconcile the data into a single, cohesive response.	This mirrors the dynamics between a law firm Partner (the Orchestrator) and their Associates (the Sub-Agents). The Partner holds the global context — the client’s commercial drivers, the judge’s temperament, and the overall macro-strategy. Because the Associates lack this bird’s-eye view, the team efficiency depends entirely on the Partner’s ability to “hydrate” them with the precise context. The contexts provided by the Partner is also often beneficial for the Associate’s growth, as the “why” would often dictate the “how” and “what”. Similar to AI harnesses, the Partner or the Senior Associate will need to synthesize the Associates’ work and resolve when there is any conflicting finding.
3. Session Context Isolation
Harness Engineering Concept	Legal Translation
An LLM can generate and check code. But after generating a complex block of code, it creates an internal “path of reasoning.” If you ask that exact same LLM session to find bugs in its own code, it will almost always fail. Why? Because the session is trapped in its own context history. It reads its own logic, assumes its initial premises were correct, and suffers from AI confirmation bias. To fix this, harness engineers use Session Context Isolation. They pipe the generated code into a completely blank, independent “Reviewer AI.” This AI only sees the raw output and evaluates it against an objective rubric. Because it is isolated from the generation session, it spots the logical flaws instantly.	Lawyers suffer from the exact same psychological trap. If a lawyer spends twelve straight hours drafting a complex, 80-page commercial lease, their brain forms a rigid mental map of the document. When they proofread their own work, they do not read what is actually on the page; they read what they think they wrote. They are trapped in their own “session context.” Enforcing an independent peer-review system is not a luxury; it is an essential architecture to break a writer’s confirmation bias. Having a “checker/proofreader” system is essential in legal drafting.
4. Modular Tool-Use
Harness Engineering Concept	Legal Translation
Tool-use is a critical feature that makes agentic AI so scalable. Tools are completely self-contained, modular code snippets with strict, standardized input/output interfaces. The LLM does not need to hardcode the tools in its own brain. For example, if the LLM reasons that it should conduct a web search, it simply calls the readily-coded web search tool. Because these tools are decoupled, developers can upgrade, swap, or add new tools without rewriting the core LLM.	Traditional legal practices are “monolithic.” A lawyer or firm treats every case as an entirely bespoke, integrated universe where the same person tries to handle deep legal research, client relationship management, high-volume document formatting, and commercial negotiation simultaneously. This is highly inefficient. A modular legal practice breaks its collective expertise down into distinct, repeatable ‘toolkits.’ This allows a lean core team to dynamically compose tailored solutions for diverse client budgets and needs. Law firms that structure their services as plug-and-play modules – rather than a singular, opaque art form – will remain nimble and highly scalable.
5. Long-Term Memory Configuration
Harness Engineering Concept	Legal Translation
Agentic AI systems use long-term memory databases to save, retrieve, and apply lessons learned from past interactions. Rather than treating every prompt as a completely isolated event, the agent references historical data to continuously self-improve and prevent repeating its past mistakes.	A law firm must have a robust corporate memory. Many firms already deploy dedicated Professional Support Lawyers (PSLs) and Knowledge Management systems to handle systematic archiving, closing bibles, and precedent sharing. However, the next major breakthrough will likely be the capability to systematically codify the lessons learned from past mistakes (ideally, others’ mistakes) and instantly distill those insights into actionable wisdom for the future.

Concluding Remarks

One advantage of having cross-disciplinary knowledge is that it enables us to see one system of knowledge from the lens of another system. Because the world’s most innovative architectural thinking is currently concentrated in AI engineering, lawyers have a unique opportunity to learn from this discipline.

At CFN Lawyers LLP, we are constantly refining our own operational harness to ensure our structural execution matches our legal intellect, delivering top-notch service that meets our clients’ evolving needs.

[i] The choice of using “he” as pronoun is to denote the agent that I’m using personally. I will use “it” to refer to other AI agents in general. I do not believe my agent has become sentient…yet.

[ii] https://www.cfnlaw.com.hk/the-claude-code-leak-and-ip-considerations/

[iii] https://hermes-agent.nousresearch.com/

[iv] See for example https://composio.dev/content/openclaw-vs-hermes-agent

[v] OpenClaw, of course, excels in many other areas such as tools selection and utilisation. The technical comparison of the two platforms is beyond the scope of this article.

[vi] I may not be doing justice to OpenClaw entirely but it did occur to me once. Approximately 300,000 tokens were burned within two hours because my OpenClaw agent was caught in some execution loop. I had to manually stop it, otherwise the burn would be catastrophic.

[vii] As of time of writing (June 2026), a new term “Loop Engineering” already emerged in the AI circle. That may be a topic for future articles.

[viii] Building Claude Code with Harness Engineering by Fareed Khan, published on 7 Apr 2026: https://medium.com/@fareedkhandev/d2e8c0da85f0?sk=f67a164f042bf73b89077b71e8d76370