AWS re:Invent 2015

(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS | AWS re:Invent 2015



Hearst Corporation monitors trending content on 250+ sites worldwide, providing metrics to editors and promoting cross-platform content sharing. To facilitate this, Hearst built a clickstream analytics platform on AWS that transmits and processes over 30 TB of data a day using AWS resources such as AWS Elastic Beanstalk, Amazon Kinesis, Spark on Amazon EMR, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. In this session, learn how Hearst designed its clickstream analytics application and how you can use the same architecture to build your own and be ready to handle the changing world of clickstream data. Dive into how to do Spark streaming from an Amazon Kinesis stream, use timestamps to cleanse and validate data coming from diverse sources, and see how the system has evolved as data types have changed from HTTP GET to RESTful JSON requests. Finally, see how Hearst's data scientists interact with and use cleansed data provided by the platform to perform ad hoc analyses, develop home-grown algorithms, and create visualizations and dashboards that support Hearst business stakeholders.

AWS re:Invent 2015, by Amazon Web Services