
Amazon EMR Migration Guide: How to Move Apache Spark and Apache Hadoop From On-Premises to AWS



About this Guide

For many customers, migrating to Amazon EMR raises questions about assessment, planning, architectural choices, and how to meet the numerous requirements of moving analytics applications such as Apache Spark and Apache Hadoop from on-premises data centers to a new AWS Cloud environment.

Many customers have concerns about the viability of distribution vendors or a purely open-source software approach, and they need practical advice about making a change.

This guide covers the overall steps of a migration and provides best practices that we have accumulated to help customers with their migration journey.

Overview

Businesses worldwide are discovering the power of new big data processing and analytics frameworks like Apache Hadoop and Apache Spark, but they are also discovering some of the challenges of operating these technologies in on-premises data lake environments.

Not least, many customers need a safe, long-term platform choice, as the big data industry is changing rapidly and some vendors are now struggling.

Common problems include a lack of agility, excessive costs, and administrative headaches, as IT organizations wrestle with the effort of provisioning resources, handling uneven workloads at large scale, and keeping up with the pace of rapidly changing, community-driven, open-source software innovation.

Many big data initiatives suffer from the delay and burden of evaluating, selecting, purchasing, receiving, deploying, integrating, provisioning, patching, maintaining, upgrading, and supporting the underlying hardware and software infrastructure.

A subtler, if equally critical, problem is the way companies’ data center deployments of Apache Hadoop and Apache Spark tie compute and storage resources together in the same servers, creating an inflexible model in which the two must scale in lockstep.

As a result, almost any on-premises environment pays for large amounts of underused disk capacity, processing power, or system memory, because each workload has different requirements for these components.

How can smart businesses find success with their big data initiatives?

Migrating big data (and machine learning) to the cloud offers many advantages.

Cloud infrastructure service providers, such as Amazon Web Services (AWS), offer a broad choice of on-demand and elastic compute resources, resilient and inexpensive persistent storage, and managed services that provide up-to-date, familiar environments to develop and operate big data applications.

Data engineers, developers, data scientists, and IT personnel can focus their efforts on preparing data and extracting valuable insights.

Services like Amazon EMR, AWS Glue, and Amazon S3 enable you to decouple and scale your compute and storage independently, while providing an integrated, well-managed, highly resilient environment, immediately reducing many of the problems of on-premises approaches.
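
To make the decoupled model concrete, the following is a minimal sketch in Python using the boto3 library: it launches a transient EMR cluster that runs a single Spark step against data in Amazon S3 and terminates when the step completes. The bucket names, script path, release label, and instance sizes are hypothetical placeholders, and the default EMR IAM roles (EMR_DefaultRole, EMR_EC2_DefaultRole) are assumed to already exist in the account.

    # Sketch: transient EMR cluster (compute) reading and writing S3 (storage).
    # All bucket names and the job script are hypothetical placeholders.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="spark-etl-transient",               # cluster exists only for this job
        ReleaseLabel="emr-6.15.0",                # assumed release; choose a current one
        Applications=[{"Name": "Spark"}],
        LogUri="s3://my-logs-bucket/emr/",        # hypothetical log bucket
        Instances={
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False, # terminate after the step finishes
        },
        Steps=[{
            "Name": "spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit",
                         "s3://my-code-bucket/jobs/etl.py",   # hypothetical script
                         "--input", "s3://my-data-bucket/raw/",
                         "--output", "s3://my-data-bucket/curated/"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",        # default EMR instance profile
        ServiceRole="EMR_DefaultRole",            # default EMR service role
    )
    print("Started cluster:", response["JobFlowId"])

Because the data stays in Amazon S3 rather than on cluster disks, the cluster can be terminated as soon as the step completes, and the next workload can be provisioned at whatever compute scale it actually needs.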

This approach leads to faster, more agile, easier-to-use, and more cost-efficient big data and data lake initiatives.
