Chapter 8

Testing Your Playbook

A Playbook is not finished when you save it. The rules you've written are your best guess at how the AI should behave. Testing tells you whether that guess was right. Most Playbooks need one or two rounds of adjustment before they perform reliably. That's expected. The goal of testing is to get there quickly.


How to test

The simplest approach is to run your Playbook against a contract you already know well. A document you've reviewed manually before is ideal, because you already have a view on what the issues are and what a good output should look like. That gives you something to compare the Playbook output against.

Run the Playbook. Go through every result. For each one, ask yourself three questions:

  • Is this correct? Did the AI identify a genuine issue?
  • Is this useful? Is the proposed redline or suggestion something you'd actually act on?
  • Is anything missing? Are there issues you'd expect to see flagged that didn't appear?

Note the answers. They tell you exactly which rules need adjusting.


What to look for

Rules that fire when they shouldn't. If a rule is flagging clauses that are actually fine, the instruction is probably too broad. It's catching things it wasn't meant to catch. Tighten the language or add more specific conditions.

Rules that don't fire when they should. If a clause clearly has a problem but the Playbook missed it, the rule either isn't written precisely enough or isn't capturing the right principle. Rewrite the instruction with more specific language describing what to look for.

Redlines that are directionally right but poorly drafted. The AI has identified the correct issue but the proposed language isn't quite what you'd write. This is a refinement problem rather than a rule problem. Use the Retry with Additional Instructions option to steer the output, and if you find yourself making the same correction repeatedly, update the underlying rule.

Rules that fire on every contract regardless of context. If a rule produces a redline on every single agreement you run it against, it may be so broad that it isn't useful. Consider whether it's targeting the right thing or whether it needs to be narrowed.


How many contracts to test against

Start with one contract you know well. Fix the obvious problems. Then run it against two or three more contracts of the same type, ideally from different counterparties so you're testing across different drafting styles.

A rule that works on one contract but fails on another is usually written in a way that's too specific to that first document's structure. The fix is to make the instruction more principle-based so it recognises the concept regardless of how it's drafted.

By the time your Playbook has run cleanly against three or four different contracts, it is ready for wider use.


Making adjustments

Go back into your Playbook rules and edit the instructions that aren't performing well. Keep the edits small and targeted. Change one thing at a time, then re-run to see whether it made the difference. Changing multiple things at once makes it hard to know what fixed the problem.

The most common adjustments are:

Tightening a rule that's firing too broadly. Add more specific conditions to the instruction. Instead of "ensure liability is capped," try "ensure liability is capped at no more than 12 months' fees. If the cap is higher or unlimited, redline to 12 months."

Broadening a rule that's missing things. Remove overly specific references to clause numbers or exact phrasing. Describe the principle instead of the location. The AI needs to recognise the concept wherever it appears in the document, not just in the place you first saw it.

Splitting a rule that's doing too much. If one rule is checking two things and getting confused, split it into two separate rules, one for each check. Rules that contain more than one instruction often produce inconsistent output.

💡 Use the Improve Instruction button before rewriting from scratch. If a rule isn't performing well, open it in the rule editor and hit Improve Instruction. The AI will rewrite it into a cleaner, more precise format. That is often faster than rewriting it yourself, and you can edit the result from there.


When your Playbook is ready

A Playbook is ready for wider use when it consistently does three things: catches the issues it's supposed to catch, doesn't flag things it shouldn't, and produces redlines you'd be comfortable sending to a counterparty with only minor edits.

That standard doesn't require perfection. Every Playbook will occasionally produce a suggestion that doesn't apply to a particular deal. The test is whether the output is reliable enough that it saves more time than it creates. For most Playbooks, that threshold is reached after one or two rounds of testing and adjustment.

Once you're there, share it with your team and start using it on live contracts. The feedback you get from real use will surface the remaining rough edges faster than any amount of test runs.



Next chapter →
Playbook Governance
