The Python Podcast.__init__

Bonobo: Lightweight ETL Toolkit for Python 3 with Romain Dorgueil


Summary

A majority of the work that we do as programmers involves data manipulation in some manner. This can range from large-scale collection, aggregation, and statistical analysis across distributed systems to something as simple as making a graph in a spreadsheet. In the middle of that range is the general task of ETL (Extract, Transform, and Load), which has its own range of scale. In this episode Romain Dorgueil discusses his experiences building ETL systems and the problems he routinely encountered that led him to create Bonobo, a lightweight, easy-to-use toolkit for data processing in Python 3. He also explains how the system works under the hood, how you can use it for your projects, and what he has planned for the future.

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcastinit.com/gocd. Professional support and enterprise plugins are available for added peace of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected].
  • To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey, and today I’m interviewing Romain Dorgueil about Bonobo, a data processing toolkit for modern Python.

Interview
  • Introductions
  • How did you get introduced to Python?
  • What is Bonobo and what was your motivation for creating it?
    • What is the story behind the name?
  • How does Bonobo differ from projects such as Luigi or Airflow?
    • [RD] After I explain why that’s totally different things, maybe a good follow up would be to ask about differences from other data streaming solutions, like Apache Beam or Spark.
  • How is Bonobo implemented and how has its architecture evolved since you began working on it?
  • What have been some of the most challenging aspects of building and maintaining Bonobo?
  • What are some extensions that you would like to have but don’t have the time to implement?
  • What are some of the most interesting or creative uses of Bonobo that you are aware of?
  • What do you have planned for the future of Bonobo?

Keep In Touch
  • Bonobo Project
    • Bonobo ETL
    • Slack
    • GitHub
  • Romain
    • Website
    • @rdorgueil on Twitter
    • hartym on GitHub

Picks
  • Tobias
    • Data Skeptic: Quantum Computing
  • Romain
    • Medikit, or how to manage hundreds of projects at the same time, still being able to sleep at night.
    • Rocker, a better builder for docker images.

Links
  • Bonobo
  • RedHat
  • Anaconda Installer
  • ETL
  • Pentaho
  • RDC.ETL
  • DAG (Directed Acyclic Graph)
  • Luigi
  • Airflow
  • NamedTuple
  • Jupyter
  • OAuth
  • Graphviz
  • Dask
  • Data Engineering Podcast
  • Dask Interview
  • Selenium
  • Zapier
  • IFTTT (If This Then That)
  • FPGA

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

The Python Podcast.__init__ by Tobias Macey
