Data Engineering Podcast

Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39



Summary

Data integration and routing is a constantly evolving problem, fraught with edge cases and complicated requirements. The Apache NiFi project models this problem as a collection of data flows that are created through a self-service graphical interface. This framework provides a flexible platform for building a wide variety of integrations that can be managed and scaled easily to fit your particular needs. In this episode, project members Kevin Doran and Andy LoPresto discuss the ways that NiFi can be used, how to start using it in your environment, and plans for future development. They also explain how it fits into the broader landscape of data tools, the interesting and challenging aspects of the project, and how to build new extensions.

Preamble
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Your host is Tobias Macey, and today I’m interviewing Kevin Doran and Andy LoPresto about Apache NiFi

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by explaining what NiFi is?
  • What is the motivation for building a GUI as the primary interface for the tool when the current trend is to represent everything as code?
  • How did you get involved with the project?
  • Where does it sit in the broader landscape of data tools?
  • Does the data that is processed by NiFi flow through the servers that it is running on (à la Spark/Flink/Kafka), or does it orchestrate actions on other systems (à la Airflow/Oozie)?
  • How do you manage versioning and backup of data flows, as well as promoting them between environments?
  • One of the advertised features is tracking provenance for data flows that are managed by NiFi. How is that data collected and managed?
    • What types of reporting are available across this information?
  • What are some of the use cases or requirements that lend themselves well to being solved by NiFi?
    • When is NiFi the wrong choice?
  • What is involved in deploying and scaling a NiFi installation?
    • What are some of the system/network parameters that should be considered?
    • What are the scaling limitations?
  • What have you found to be some of the most interesting, unexpected, and/or challenging aspects of building and maintaining the NiFi project and community?
  • What do you have planned for the future of NiFi?

Contact Info
  • Kevin Doran
    • @kevdoran on Twitter
    • Email
  • Andy LoPresto
    • @yolopey on Twitter
    • Email

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
  • NiFi
  • HortonWorks DataFlow
  • HortonWorks
  • Apache Software Foundation
  • Apple
  • CSV
  • XML
  • JSON
  • Perl
  • Python
  • Internet Scale
  • Asset Management
  • Documentum
  • DataFlow
  • NSA (National Security Agency)
  • 24 (TV Show)
  • Technology Transfer Program
  • Agile Software Development
  • Waterfall
  • Spark
  • Flink
  • Kafka
  • Oozie
  • Luigi
  • Airflow
  • FluentD
  • ETL (Extract, Transform, and Load)
  • ESB (Enterprise Service Bus)
  • MiNiFi
  • Java
  • C++
  • Provenance
  • Kubernetes
  • Apache Atlas
  • Data Governance
  • Kibana
  • K-Nearest Neighbors
  • DevOps
  • DSL (Domain Specific Language)
  • NiFi Registry
  • Artifact Repository
  • Nexus
  • NiFi CLI
  • Maven Archetype
  • IoT
  • Docker
  • Backpressure
  • NiFi Wiki
  • TLS (Transport Layer Security)
  • Mozilla TLS Observatory
  • NiFi Flow Design System
  • Data Lineage
  • GDPR (General Data Protection Regulation)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
