The Data Stack Show

166: Data Processing Fundamentals and Building a Unified Execution Engine Featuring Pedro Pedreira of Meta


Highlights from this week’s conversation include:

  • The concept of composability at a lower level of data infrastructure (1:28)
  • New architectures and components that allow developers to build databases (3:44)
  • Pedro's background and experience in data infrastructure (6:18)
  • The spectrum of latency and analytics (12:59)
  • Different query engines for different use cases (16:32)
  • Vectorized vs. code-gen data processing (19:33)
  • Vectorization and code generation (21:21)
  • Examples of vectorized engines (24:33)
  • Rewriting the execution engine in C++ (27:22)
  • Different organization of Presto and Spark (33:17)
  • Arrow and its extensions (37:15)
  • The similarities between analytics and ML (44:33)
  • Offline feature engineering and data preprocessing for training (48:00)
  • Dialect and semantic differences in using Velox for different engines (50:01)
  • The convergence of dialects (52:23)
  • Challenges of Substrait and semantics (53:18)
  • Future plans for Velox (58:09)
  • The discussion on evolving Parquet (1:03:38)
  • The integration of the relational model and the tensor model (1:07:29)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
