The Python Podcast.__init__

A Data Catalog For Your PyData Projects



Summary

One of the biggest pain points when working with data is dealing with the boilerplate code to load it into a usable format. Intake encapsulates all of that and puts it behind a single API. In this episode Martin Durant explains how to use the Intake data catalogs for encapsulating source information, how it simplifies data science workflows, and how to incorporate it into your projects. It is a lightweight way to enable collaboration between data engineers and data scientists in the PyData ecosystem.
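
To make the catalog idea concrete, here is a minimal sketch using only the Python standard library. It is not Intake's API: the `ToyCatalog` class, the `DRIVERS` table, the `build_demo_catalog` helper, and the demo file names are all hypothetical and exist only to illustrate the pattern described above, where a named entry records how a source is loaded and the user asks for data by name.

```python
import csv
import json
import tempfile
from pathlib import Path


# Hypothetical loaders keyed by a "driver" name. These are illustrative
# stand-ins, not Intake's real drivers.
def _read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def _read_json(path):
    with open(path) as f:
        return json.load(f)


DRIVERS = {"csv": _read_csv, "json": _read_json}


class ToyCatalog:
    """Maps dataset names to source descriptions and loads them on request."""

    def __init__(self, entries):
        self._entries = entries

    def names(self):
        # Discovery: list what the catalog knows about.
        return sorted(self._entries)

    def read(self, name):
        # The caller never sees paths or parsing code, only the entry name.
        entry = self._entries[name]
        loader = DRIVERS[entry["driver"]]
        return loader(entry["path"])


def build_demo_catalog(directory):
    """Write two small demo files and return a catalog describing them."""
    directory = Path(directory)
    csv_path = directory / "measurements.csv"
    csv_path.write_text("id,value\n1,3.5\n2,4.0\n")
    json_path = directory / "settings.json"
    json_path.write_text(json.dumps({"units": "metric"}))
    return ToyCatalog({
        "measurements": {"driver": "csv", "path": str(csv_path)},
        "settings": {"driver": "json", "path": str(json_path)},
    })


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        catalog = build_demo_catalog(tmp)
        print(catalog.names())
        print(catalog.read("measurements"))
```

In this sketch the person who maintains the entries (the data engineer role) decides the driver and location, while the person calling `read` (the data scientist role) only needs the entry name.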

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected].
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat.
  • Your host as usual is Tobias Macey and today I’m interviewing Martin Durant about Intake, a lightweight package for finding, investigating, loading and disseminating data.

Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what Intake is and the story behind its creation?
    • Can you outline some of the other projects and products that intersect with the functionality of Intake and describe where it fits in terms of use case and capabilities? (e.g. Quilt Data, Arrow, Data Retriever)
  • Can you describe the workflows for using Intake, both from the data scientist and the data engineer perspective?
  • One of the persistent challenges in working with data is that of cataloging and discovery of what already exists. In what ways does Intake address that problem?
    • Does it have any facilities for capturing and exposing data lineage?
  • For someone who needs to customize their usage of Intake, what are the extension points and what is involved in building a plugin?
  • Can you describe how Intake is implemented and how it has evolved since it first started?
    • What are some of the most challenging, complex, or novel aspects of the Intake implementation?
  • Intake focuses primarily on integrating with the PyData ecosystem (e.g. NumPy, Pandas, SciPy, etc.). What are some other communities that are, or could be, benefiting from the work being done on Intake?
    • What are some of the assumptions that are baked into Intake that would need to be modified to make it more broadly applicable?
  • What are some of the assumptions that were made going into this project that have needed to be reconsidered after digging deeper into the problem space?
  • What are some of the most interesting/unexpected/innovative ways that you have seen Intake leveraged?
  • What are your plans for the future of Intake?

Keep In Touch
  • martindurant on GitHub
  • Website
  • @martin_durant_ on Twitter

Picks
  • Tobias
    • Ubersuggest SEO tool

Links
  • Intake
  • Anaconda
  • Dask
    • Data Engineering Podcast Interview
  • Fast Parquet
  • IDL
  • Space Telescope Institute
  • Blaze
  • Quilt Data
    • Podcast Interview
  • Arrow
  • Data Retriever
    • Podcast Interview
  • Parquet
    • Data Engineering Podcast Interview
  • DataFrame
  • Apache Spark
  • Dremio
    • Data Engineering Podcast Interview
  • Dat Project – distributed peer-to-peer data sharing
    • Data Engineering Podcast Interview
  • GeoPandas
  • XArray
  • Solr
  • Streamz
  • PyViz
  • S3FS

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
