Technically Legal - A Legal Technology and Innovation Podcast

Benchmarking Legal AI: Measuring the Delta Between Man and Machine (Anna Guo Legalbenchmarks.ai)



Is artificial intelligence custom-made for legal tasks better than general AI tools like Google Gemini and ChatGPT? That is the topic of this episode featuring Legalbenchmarks.ai Founder Anna Guo. Anna is a former BigLaw lawyer who left practice to become an entrepreneur and now focuses her energies on quantifying the utility of AI in the legal industry. Anna's initial anecdotal research for colleagues quickly revealed strong community interest in a systematic approach to evaluating legal AI tools. This led to the creation of Legalbenchmarks.ai, which is dedicated to determining where humans plus AI truly outperform humans alone or AI alone.

The core of the research involves measuring the "delta," or the extent to which AI can elevate human performance. To date, Legalbenchmarks.ai has conducted two major studies: one on information extraction from legal sources and a second on contract review and redlining.

Key Findings from the Studies:
  • Accuracy vs. Qualitative Usefulness: The highest-performing general-purpose AI tools (like Gemini) were often more accurate and consistent than the legal-specific tools. However, the legal-specific AI tools often received higher marks for qualitative usefulness and helpfulness because they align more closely with existing legal workflows.

  • Methodology: The testing goes beyond simple accuracy. It includes a three-part assessment: Reliability (objective accuracy and legal adequacy), Usability (qualitative metrics like helpfulness and coherence for tasks such as brainstorming), and Platform Workflow Support (integration, citation checks, and other features).

  • Human-AI Performance: In the contract analysis study, AI tools matched or exceeded the human baseline for reliability in producing first drafts. Crucially, the data contradicted the common belief that "human plus AI will always outperform AI alone": the top-performing AI tool on its own still had a higher accuracy rate than the human-plus-AI combination.

  • Risk Analysis: A significant finding was that legal AI tools were better at flagging material risks, such as compliance or unenforceability issues in high-risk scenarios, and caught some that human lawyers missed entirely. This suggests AI can act as a crucial safety net.

  • Strengths Comparison: AI excels at brainstorming, challenging human bias, and performing routine tasks at scale (e.g., mass contract review for simple terms). Humans retain a significant edge in absorbing nuanced context and making commercially reasonable decisions, areas where AI's instruction-following can sometimes fall short.

Discussion Highlights:
  • [0:00] – Introduction and background of Anna Guo and Legalbenchmarks.ai.

  • [4:30] – The impetus for starting systematic AI benchmarking.

  • [6:00] – Explaining the concept of measuring the "delta" in performance.

  • [9:00] – Detailed breakdown of the three-part AI assessment methodology.

  • [15:00] – Discussion of the contrasting results: general LLM accuracy vs. legal AI qualitative value.

  • [19:00] – Results on AI performance matching human reliability in contract drafting.

  • [21:00] – Debunking the myth about Human + AI always outperforming AI alone.

  • [23:00] – The finding that legal AI excels at surfacing material risks that lawyers miss.

  • [27:00] – A SWOT analysis of when to use humans and when to use AI.

  • [30:00] – Future roadmap for Legalbenchmarks.ai research.


Technically Legal - A Legal Technology and Innovation Podcast, by Percipient - Chad Main

Rating: 4.8 (25 ratings)

