January 21, 2026

Evals & Benchmarking Legal AI with Anna Guo

59 minutes

We sit down with Anna Guo — a Singapore-based lawyer, startup advisor, and founder of LegalBenchmarks.ai — who has quietly built one of the most rigorous practitioner-driven evaluation frameworks for legal AI tools in the industry. Her community now spans close to 900 legal and AI professionals. Her research has produced findings that challenge industry assumptions: that legal-specific AI tools don't always outperform general-purpose models, that accuracy isn't actually the top driver of lawyer adoption, and that in some drafting tasks, AI is already matching or exceeding human reliability.

This is a watch-don't-only-listen episode. Anna shares her screen throughout, running us through a live, double-blind benchmarking exercise where we rank outputs from legal AI, general-purpose AI, and human lawyers without knowing which is which. She also demonstrates how prompt injection attacks can bypass AI guardrails using techniques as simple as writing the prompt in a low-resource language or encoding (Vietnamese or ASCII code, for example), surfacing security risks that become particularly acute as we move toward widespread agentic AI adoption.

What You'll Learn:

The Three Dimensions of Tool Evaluation — Why measuring accuracy alone misses the point, and how Anna assesses output reliability, output usefulness, and platform workflow support as distinct layers

What Actually Drives Adoption — Survey data revealing that lawyers prioritise context management and verification over raw accuracy when choosing AI tools

Where Humans Still Win — High-judgment, context-sparse tasks requiring commercial reasoning remain firmly in human territory; routine, context-complete work is where AI excels

Prompt Injection in Practice — Live demonstrations of how attackers can trick AI models into revealing harmful information using low-resource languages and clever framing

---

Connect with Anna: LinkedIn | LegalBenchmarks.ai

---

If you found this episode interesting, please tell us and do share it with a friend, colleague or community who might take something from it! For more, head to lawwhatsnext.substack.com for: (i) Focused conversations with leading practitioners, technologists, and educators; (ii) Deep dives into the intersection of law, technology, and organisational behaviour; and (iii) Practical analysis of how AI is augmenting our potential.


By Tom Rice and Alex Herrity
