Deep Papers

CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent



We dive into the latest paper from a team of researchers at IBM: "From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production." We're excited to host several of the paper's authors, who walk us through the research and its implications. The paper reports IBM’s experience developing and piloting the Computer Using Generalist Agent (CUGA), which has been open-sourced for the community. CUGA adopts a hierarchical planner–executor architecture with strong analytical foundations, achieving state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was evaluated in a pilot within the Business-Process-Outsourcing talent acquisition domain, addressing enterprise requirements for scalability, auditability, safety, and governance. 

CUGA code: https://github.com/cuga-project/cuga-agent 

Paper: https://arxiv.org/abs/2510.23856

Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.


Deep Papers, by Arize AI


