The New Stack Podcast

Keeping GPUs Ticking Like Clockwork


Clockwork began with a narrow goal: keeping clocks synchronized across servers. The company grew out of Stanford research on clock synchronization for financial institutions, but the team soon recognized that its precise packet-timing and latency measurements could reveal deeper data center networking issues and underpin network telemetry and dynamic traffic control. That insight led the company to build a hardware-agnostic monitoring and remediation platform that can automatically route traffic around faults.

Today, Clockwork’s technology is especially valuable for large GPU clusters used to train LLMs, where communication efficiency and reliability are critical. CEO Suresh Vasudevan explains that AI workloads are among the most demanding distributed applications ever, and that Clockwork provides building blocks that improve visibility, performance and fault tolerance. Its flagship feature, FleetIQ, can reroute traffic around failing switches, preventing costly interruptions that might otherwise force teams to restart training from hours-old checkpoints. By integrating with NVIDIA NCCL, TCP and RDMA libraries, Clockwork can not only measure congestion but also actively manage GPU communication to improve both uptime and training efficiency.

Learn more from The New Stack about the latest in Clockwork: 

Clockwork’s FleetIQ Aims To Fix AI’s Costly Network Bottleneck 

What Happens When 116 Makers Reimagine the Clock? 

