Apache Spark is a system for processing large data sets in parallel. The core abstraction of Spark is the resilient distributed dataset (RDD), an immutable, partitioned collection of records that can be cached in memory for fast, iterative processing. Matei Zaharia created Spark with two goals: to provide a composable, high-level set of APIs for distributed processing, and to make iterative and interactive workloads fast by keeping the working set of data in memory rather than rereading it from disk on every pass.
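To make the RDD idea concrete, here is a minimal sketch using Spark's Scala API, run in local mode; the application name and the dataset are illustrative. It builds an RDD, caches it in memory, and then runs two separate computations over the same cached working set, which is exactly the iterative-reuse pattern RDDs were designed for.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RDDSketch {
  def main(args: Array[String]): Unit = {
    // Local mode using all available cores; a cluster deployment would
    // set a different master URL.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Create an RDD from a local collection and cache it, so the data
    // stays in memory across the computations below instead of being
    // recomputed for each one.
    val numbers = sc.parallelize(1 to 1000000).cache()

    // Two passes over the same cached working set.
    val total = numbers.reduce(_ + _)
    val evenCount = numbers.filter(_ % 2 == 0).count()

    println(s"sum = $total, even count = $evenCount")
    sc.stop()
  }
}
```

Without the `cache()` call, Spark would still produce the same results, but each action (`reduce`, `count`) would rebuild the RDD from its lineage; caching is what turns repeated passes into fast in-memory reads.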