Don’t worry if your playbook doesn’t perform perfectly the first time you run it; that’s part of the process. Your job at this stage is to see how well the AI understands your rules in practice. Run the playbook against a sample agreement, review the results, and note where the AI flags issues correctly and where it misses or over-flags. The rule language is then adjusted in small steps to sharpen accuracy. Usually, after one or two test cycles, the playbook is accurate enough for wider use.

Step 1: Run the playbook against a sample agreement

  • Choose one or more representative agreements for the playbook type (e.g., for an NDA playbook, test both a mutual and a unilateral NDA).
  • Upload the document into the AI Review tool and select the playbook you want to test.
  • Run the review to see how the AI applies your rules in real time.

Tip: Use clean, well-formatted agreements first. Formatting inconsistencies or OCR errors can distort results and make troubleshooting harder.

Step 2: Check whether the AI correctly identifies compliant/non-compliant language

  • Review each clause the AI has analyzed and compare its classification with your own reading. Each clause will fall into one of these outcomes:

    • Compliant: Clause satisfies the rule as written.
    • ⚠️ Non-compliant: Clause triggers the rule and needs a redline.
    • Missed: Clause should have been flagged but wasn’t.
    • Over-flagged: Clause was flagged even though it satisfies the rule.

  • Instead of using a results sheet, it’s often easier to mark up the same agreement you’re testing. Add in-line comments where the AI has over- or under-flagged and note briefly what should have happened.
  • Focus your comments on patterns rather than one-off errors—this will help you refine the underlying rule more efficiently in the next iteration.

Goal: to confirm that the AI is interpreting your rules as intended and identifying compliance accurately in context.
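
If you want a rough number to compare between test rounds, the in-line comments from this step can be tallied. The following is a minimal Python sketch, not part of the AI Review tool: the clause names, the "ai_flagged" and "should_flag" fields, and the sample data are all hypothetical and stand in for whatever you recorded in your marked-up agreement.

```python
from collections import Counter

# Hypothetical reviewer notes: one entry per clause in the test agreement.
# "ai_flagged" is what the AI did; "should_flag" is the reviewer's judgment.
notes = [
    {"clause": "Definition of Confidential Information", "ai_flagged": False, "should_flag": False},
    {"clause": "Term", "ai_flagged": True, "should_flag": True},
    {"clause": "Governing Law", "ai_flagged": True, "should_flag": False},
    {"clause": "Return of Materials", "ai_flagged": False, "should_flag": True},
    {"clause": "Non-Solicitation", "ai_flagged": True, "should_flag": True},
]


def outcome(note):
    """Label one clause from the reviewer's point of view."""
    if note["ai_flagged"] and note["should_flag"]:
        return "correct flag"
    if note["ai_flagged"] and not note["should_flag"]:
        return "over-flagged"
    if not note["ai_flagged"] and note["should_flag"]:
        return "missed"
    return "correctly passed"


tally = Counter(outcome(n) for n in notes)
flagged = tally["correct flag"] + tally["over-flagged"]
needed = tally["correct flag"] + tally["missed"]

# Precision: of the clauses the AI flagged, the share that deserved a flag.
precision = tally["correct flag"] / flagged if flagged else 0.0
# Recall: of the clauses that deserved a flag, the share the AI caught.
recall = tally["correct flag"] / needed if needed else 0.0

print(dict(tally))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With the sample notes above, two of the three flags were deserved and two of the three clauses that needed a flag were caught, so both figures come out at 0.67. Low precision points to over-flagging; low recall points to missed clauses.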

Step 3: Review flagged issues—are they helpful or too broad?

  • Evaluate each flagged issue for precision and usefulness:
    • Does the flag pinpoint the exact compliance gap?
    • Or does it produce a vague comment that adds noise rather than clarity?

  • Identify patterns:
    • Too broad: The AI is triggering the rule in irrelevant contexts → tighten the conditions or rephrase the rule more specifically.
    • Too narrow: The AI is missing relevant cases → expand the trigger terms or simplify the instruction.

  • Ask yourself: Would this feedback help a human reviewer make a faster or better decision?
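
To separate patterns from one-off errors, it can help to group your verdicts by rule. The sketch below is one possible way to do that in Python, under stated assumptions: the rule names, the verdict labels, and the "min_repeats" threshold of 2 are illustrative choices, not settings in the product.

```python
from collections import defaultdict

# Hypothetical marked-up results: (rule name, reviewer verdict) pairs.
# Verdict labels are the reviewer's own: "correct", "over-flagged", "missed".
results = [
    ("Liability cap", "correct"),
    ("Liability cap", "missed"),
    ("Liability cap", "missed"),
    ("Governing law", "over-flagged"),
    ("Governing law", "over-flagged"),
    ("Governing law", "correct"),
    ("Term length", "correct"),
    ("Term length", "over-flagged"),
]


def diagnose(results, min_repeats=2):
    """Suggest a direction per rule; errors seen fewer than min_repeats times are ignored."""
    counts = defaultdict(lambda: defaultdict(int))
    for rule, verdict in results:
        counts[rule][verdict] += 1

    advice = {}
    for rule, c in counts.items():
        over, missed = c["over-flagged"], c["missed"]
        if over >= min_repeats and over > missed:
            advice[rule] = "too broad: tighten the conditions"
        elif missed >= min_repeats and missed > over:
            advice[rule] = "too narrow: expand the trigger terms"
        else:
            advice[rule] = "no clear pattern yet"
    return advice


advice = diagnose(results)
for rule, suggestion in advice.items():
    print(f"{rule}: {suggestion}")
```

In this sample, "Liability cap" is reported as too narrow, "Governing law" as too broad, and "Term length" as having no clear pattern, because a single over-flag is treated as a one-off.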

Step 4: Adjust the rules if the AI is over- or under-flagging

Once you’ve reviewed the results, you can either:

  • Make adjustments yourself directly within the playbook, or
  • Share your marked-up agreement or notes with our team, and we’ll refine the rules for you.

If you choose to edit the rules directly:

  • For over-flagging, narrow the scope by:
    • Removing broad qualifiers like “any” or “all.”
    • Adding context to limit when the rule applies (e.g., “only if missing from the confidentiality clause”).

  • For under-flagging, help the AI recognize more relevant cases by:
    • Including examples of target language or known risk triggers.
    • Rewording abstract rules into concrete instructions (e.g., “Ensure the clause caps liability at 12 months’ fees”).

Re-test after each adjustment or round of feedback. Most playbooks reach strong, reliable accuracy after one or two short iterations.

Optional Step 5: Benchmark against additional agreements

Once initial results look solid, test across 3–5 contracts from different sources or counterparties. This reveals whether your rules generalize across drafting styles and templates.

Objective: a playbook that performs consistently and accurately—regardless of author, format, or jurisdiction.
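
One simple way to check consistency across the benchmark set is to compare, per agreement, how often you agreed with the AI's call. The Python sketch below assumes you have counted this by hand; the agreement names, the counts, and the 0.15 tolerance are hypothetical and should be replaced with your own figures and threshold.

```python
# Hypothetical benchmark: per agreement, how many of the AI's clause calls
# the reviewer agreed with, out of the total clauses reviewed.
benchmark = {
    "Counterparty A NDA": {"agreed": 18, "total": 20},
    "Counterparty B NDA": {"agreed": 19, "total": 20},
    "Counterparty C NDA": {"agreed": 12, "total": 20},
    "In-house template": {"agreed": 20, "total": 20},
}


def agreement_rates(benchmark):
    """Share of clause calls the reviewer agreed with, per agreement."""
    return {name: r["agreed"] / r["total"] for name, r in benchmark.items()}


def outliers(rates, tolerance=0.15):
    """Agreements whose rate falls more than `tolerance` below the best one."""
    best = max(rates.values())
    return sorted(name for name, rate in rates.items() if best - rate > tolerance)


rates = agreement_rates(benchmark)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
print("Needs a closer look:", outliers(rates))
```

Here only "Counterparty C NDA" is singled out. An agreement that lags the others like this is a good candidate for the Step 3 review, since its drafting style may be exposing a rule that is too broad or too narrow.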

Wrapping Up

Once your playbook has been tested and refined, it’s ready for real-world use. You’ve now built a reliable framework the AI can apply consistently across agreements, reducing manual review time and making it far less likely that key risks are missed. Remember, your playbook isn’t static; as your contracting positions evolve, you can revisit and update rules at any time. But for now, you should have a calibrated playbook ready to roll out across your contracts with confidence.