AWS re:Invent 2019

ANT308-R1: Deep dive into running Apache Spark on Amazon EMR


Amazon EMR enables customers to run ETL, machine learning, real-time processing, data science, and low-latency SQL at petabyte scale. We focus this session on running Apache Spark on Amazon EMR. We introduce design patterns such as using Amazon S3 instead of HDFS, running long- and short-lived clusters, using notebooks, and performance-related enhancements. We discuss lowering cost with auto scaling and Spot Instances, and securing workloads through encryption and fine-grained access control using AWS Lake Formation.

By AWS



