The Data Stack Show

166: Data Processing Fundamentals and Building a Unified Execution Engine Featuring Pedro Pedreira of Meta



Highlights from this week’s conversation include:

  • The concept of composability at a lower level of data infrastructure (1:28)
  • New architectures and components that allow developers to build databases (3:44)
  • Pedro's background and experience in data infrastructure (6:18)
  • The spectrum of latency and analytics (12:59)
  • Different query engines for different use cases (16:32)
  • Vectorized vs. code-gen data processing (19:33)
  • Vectorization and code generation (21:21)
  • Examples of vectorized engines (24:33)
  • Rewriting the execution engine in C++ (27:22)
  • Different organization of Presto and Spark (33:17)
  • Arrow and its extensions (37:15)
  • The similarities between analytics and ML (44:33)
  • Offline feature engineering and data preprocessing for training (48:00)
  • Dialect and semantic differences in using Velox for different engines (50:01)
  • The convergence of dialects (52:23)
  • Challenges of Substrait and semantics (53:18)
  • Future plans for Velox (58:09)
  • The discussion on evolving Parquet (1:03:38)
  • The integration of the relational model and the tensor model (1:07:29)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.



