Data Engineering Podcast

Take Control Of Your Web Analytics Using Snowplow With Alexander Dean - Episode 48



Summary

Every business with a website needs some way to keep track of how much traffic it is getting, where that traffic is coming from, and which actions visitors are taking. The default in most cases is Google Analytics, but this can be limiting when you wish to perform detailed analysis of the captured data. To address this problem, Alex Dean co-founded Snowplow Analytics to build an open source platform that gives you total control of your website traffic data. In this episode he explains how the project and company got started, how the platform is architected, and how you can start using it today to get a clearer view of how your customers are interacting with your web and mobile applications.

Preamble
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • This is your host Tobias Macey and today I’m interviewing Alexander Dean about Snowplow Analytics

Interview
  • Introductions
  • How did you get involved in the area of data engineering and data management?
  • What is Snowplow Analytics and what problem were you trying to solve when you started the company?
  • What is unique about customer event data from an ingestion and processing perspective?
  • Challenges with properly matching up data between sources
  • Data collection is one of the more difficult aspects of an analytics pipeline because of the potential for inconsistency or incorrect information. How is the collection portion of the Snowplow stack designed and how do you validate the correctness of the data?
    • Cleanliness/accuracy
  • What kinds of metrics should be tracked in an ingestion pipeline and how do you monitor them to ensure that everything is operating properly?
  • Can you describe the overall architecture of the ingest pipeline that Snowplow provides?
    • How has that architecture evolved from when you first started?
    • What would you do differently if you were to start over today?
  • Ensuring appropriate use of enrichment sources
  • What have been some of the biggest challenges encountered while building and evolving Snowplow?
  • What are some of the most interesting uses of your platform that you are aware of?

Keep In Touch
  • Alex
    • @alexcrdean on Twitter
    • LinkedIn
  • Snowplow
    • @snowplowdata on Twitter

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
  • Snowplow
    • GitHub
  • Deloitte Consulting
  • OpenX
  • Hadoop
  • AWS
  • EMR (Elastic Map-Reduce)
  • Business Intelligence
  • Data Warehousing
  • Google Analytics
  • CRM (Customer Relationship Management)
  • S3
  • GDPR (General Data Protection Regulation)
  • Kinesis
  • Kafka
  • Google Cloud Pub-Sub
  • JSON-Schema
  • Iglu
  • IAB Bots And Spiders List
  • Heap Analytics
    • Podcast Interview
  • Redshift
  • SnowflakeDB
  • Snowplow Insights
  • Google Cloud Platform
  • Azure
  • GitLab
  • The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
