NVIDIA AI Podcast

Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298



Snap processes more than 10 petabytes of experimentation data every single morning—and with NVIDIA GPU-accelerated Apache Spark on Google Cloud, Snap cut job costs by 76%, reduced memory usage by 80%, and eliminated 120 terabytes of disk spill from its pipelines.


Prudhvi Vatala, head of engineering platforms at Snap, joins the NVIDIA AI Podcast to break down how he and his team modernized the data infrastructure of a social platform serving nearly a billion monthly active users, using the NVIDIA cuDF plugin (formerly referred to as the NVIDIA RAPIDS plugin) for Apache Spark on Google Kubernetes Engine, with zero application code changes.
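To illustrate what "zero application code changes" can look like in practice: the Spark plugin is typically switched on through job configuration rather than edits to the job itself. The sketch below is not Snap's setup. It assembles a `spark-submit` command using configuration keys from the publicly documented RAPIDS Accelerator for Apache Spark, and it assumes the plugin jar is already on the cluster classpath. The helper names, the job file `experiment_metrics.py`, and the GPU sharing ratio are hypothetical.

```python
# A minimal sketch, assuming the accelerator jar is already installed on the
# cluster. Helper names and the job file name are hypothetical.

def gpu_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Return Spark settings that enable GPU-accelerated SQL/DataFrame execution."""
    return {
        # Loads the accelerator plugin; the job's own code is not modified.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Standard Spark resource scheduling keys for GPUs.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so several tasks share one device.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def spark_submit_args(app_path, conf):
    """Build the spark-submit argument list for an unmodified Spark job."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


conf = gpu_spark_conf()
args = spark_submit_args("experiment_metrics.py", conf)
print(" ".join(args))
```

Dropping the `--conf` flags runs the same job on CPUs, which is the sense in which the application code stays untouched.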


🔬 Topics covered:

How Snap runs A/B tests at planetary scale using rigorous statistical methods like heterogeneous treatment effect detection and variance reduction


Why Snap reuses idle inference GPUs between 1–5 a.m. for batch data processing—and how it built a Kubernetes-based platform to do it


How NVIDIA cuDF delivered 3x+ speedups on join-heavy Spark jobs with no code rewrites


The full business impact: 76% cost reduction, 62% fewer cores, 80% less memory, 120 TB of spill eliminated


How a three-way partnership between Snap, NVIDIA, and Google Cloud made it possible in just 8–9 months


Chapters:

0:00 Introduction and Snap overview

3:35 What is Snap’s experimentation platform?

4:05 Why experimentation, safety, and privacy are core at Snap

4:52 How A/B testing works at billion-user scale

8:14 Discovering NVIDIA cuDF plugin

9:06 Benchmarking results: join, union, and aggregation jobs

12:00 Reusing idle GPUs overnight via GKE

13:24 Building a bottom-up GPU data platform at Snap

17:48 Results: 76% cost reduction and partnership impact

20:56 Snap’s evolution and what’s next


Learn more:

NVIDIA cuDF: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf#accel-apache


NVIDIA AI Podcast by NVIDIA

4.4 (335 ratings)

