SlatorPod

#270 AI Translation State of the Art with Tom Kocmi and Alon Lavie



Tom Kocmi, Researcher at Cohere, and Alon Lavie, Distinguished Career Professor at Carnegie Mellon University, join Florian and Slator Language AI Research Analyst Maria Stasimioti on SlatorPod to talk about the state of the art in AI translation and what the latest WMT25 results reveal about progress and remaining challenges.

Tom outlines how the WMT conference has become a crucial annual benchmark for assessing AI translation quality, ensuring systems are tested on fresh, demanding datasets. He notes that systems now face literary text, social-media language, speech transcripts with ASR noise, and test data selected by a difficulty-sampling algorithm, and stresses that these harder inputs expose far more system weaknesses than in previous years.
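
The episode doesn't describe WMT25's actual selection procedure, but a minimal sketch of one plausible difficulty-sampling approach might look like the following. The `difficulty_score` heuristic is hypothetical: it treats segments where baseline systems score poorly or disagree as harder, then samples with weights instead of uniformly.

```python
# Hypothetical sketch of difficulty-aware test-set sampling; WMT25's actual
# algorithm is not described in the episode.
import random


def difficulty_score(baseline_scores: list[float]) -> float:
    """Proxy for segment difficulty: a low mean quality estimate or a wide
    spread across baseline systems marks a segment as harder.
    Scores are assumed to be quality estimates in [0, 1]."""
    mean = sum(baseline_scores) / len(baseline_scores)
    spread = max(baseline_scores) - min(baseline_scores)
    return (1.0 - mean) + spread


def sample_hard_segments(candidates: list[tuple[str, list[float]]],
                         k: int = 500, seed: int = 0) -> list[str]:
    """Sample k segments, weighted toward harder ones."""
    rng = random.Random(seed)
    texts = [text for text, _ in candidates]
    weights = [difficulty_score(scores) for _, scores in candidates]
    return rng.choices(texts, weights=weights, k=k)


# Example pool: the hard literary passage gets the highest sampling weight.
pool = [("An easy news sentence.", [0.90, 0.92, 0.88]),
        ("A hard literary passage.", [0.40, 0.70, 0.30]),
        ("A noisy ASR transcript.", [0.60, 0.50, 0.65])]
print(sample_hard_segments(pool, k=5))
```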

He adds that human translators struggle too, facing fatigue, time pressure, and constraints such as not being allowed to post-edit. He emphasizes that claims of human parity are unreliable and highlights the need for better-designed human evaluations.

Alon underscores that harder test data also challenges evaluators. He explains that segment-level scoring is now more difficult, and even human evaluators miss different subsets of errors. He highlights that automated metrics trained on earlier-era data, COMET in particular, underperformed because they absorbed the biases of that training data.

He reports that the strongest performers in the evaluation task were reasoning-capable large language models (LLMs), either lightly prompted or submitted with elaborate evaluation-specific prompting. He notes that while these LLM-as-judge setups outperformed traditional neural metrics overall, their segment-level performance varied.
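
As an illustration of the light-prompting end of that spectrum, a minimal LLM-as-judge sketch might look like the following, assuming an OpenAI-compatible chat API. The prompt wording, model name, and 0–100 scale below are placeholders, not the prompts actually submitted to WMT25.

```python
# Minimal LLM-as-judge sketch; prompt and model choice are illustrative
# placeholders, not a WMT25 submission.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def judge_translation(source: str, hypothesis: str,
                      src_lang: str = "Czech",
                      tgt_lang: str = "English") -> int:
    """Ask an LLM for a 0-100 quality score for one translated segment."""
    prompt = (
        f"Score the following translation from {src_lang} to {tgt_lang} "
        f"on a scale from 0 (no meaning preserved) to 100 (perfect). "
        f"Respond with the number only.\n\n"
        f"{src_lang} source: {source}\n"
        f"{tgt_lang} translation: {hypothesis}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any reasoning-capable model
        messages=[{"role": "user", "content": prompt}],
    )
    return int(response.choices[0].message.content.strip())


print(judge_translation("Kočka sedí na rohožce.", "The cat sits on the mat."))
```

Given the segment-level variability Alon describes, scores from a setup like this are best aggregated over many segments rather than trusted individually.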

Tom points out that the translation task also revealed notable progress from smaller academic models of around 9B parameters, some ranking near trillion-parameter frontier models. He sees this as a sign that competitive research remains widely accessible.

The two conclude that evaluators must choose evaluation methods carefully, avoid assessing a model with the same metric used during its training, and adopt LLM-based judging for more reliable assessments.
