The Python Podcast.__init__

Build Composable And Reusable Feature Engineering Pipelines with Feature-Engine



Summary

Every machine learning model has to start with feature engineering. This is the process of combining input variables into a more meaningful signal for the problem that you are trying to solve. It often leads to duplicated code from previous projects, or to technical debt in the form of poorly maintained feature pipelines. To make the practice more manageable, Soledad Galli created the feature-engine library. In this episode she explains how it has helped her and others build reusable transformations that can be applied in a composable manner with your scikit-learn projects. She also discusses the importance of understanding the data that you are working with, and the domain in which your model will be used, to ensure that you are selecting the right features.
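To give a rough sense of what "reusable transformations that can be applied in a composable manner" can look like, here is a minimal sketch in plain Python. It only mimics the fit/transform convention used by scikit-learn; the classes `MeanImputer`, `RatioFeature`, and `Pipeline`, along with the sample data, are hypothetical names made up for this illustration and are not the API of feature-engine or scikit-learn.

```python
from statistics import mean


class MeanImputer:
    """Fill missing values (None) in one column with the mean learned in fit()."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        # Learn the mean from the observed (non-missing) training values only.
        observed = [r[self.column] for r in rows if r[self.column] is not None]
        self.mean_ = mean(observed)
        return self

    def transform(self, rows):
        # Return new rows; the input rows are left unchanged.
        return [
            {**r, self.column: self.mean_ if r[self.column] is None else r[self.column]}
            for r in rows
        ]


class RatioFeature:
    """Combine two input columns into a new feature: numerator / denominator."""

    def __init__(self, numerator, denominator, name):
        self.numerator = numerator
        self.denominator = denominator
        self.name = name

    def fit(self, rows):
        return self  # nothing to learn for this step

    def transform(self, rows):
        return [
            {**r, self.name: r[self.numerator] / r[self.denominator]}
            for r in rows
        ]


class Pipeline:
    """Chain steps so the output of one transformation feeds the next."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Hypothetical training data with one missing income value.
train = [
    {"income": 40.0, "debt": 10.0},
    {"income": None, "debt": 20.0},
    {"income": 60.0, "debt": 30.0},
]
new_data = [{"income": None, "debt": 25.0}]

pipeline = Pipeline([
    MeanImputer("income"),
    RatioFeature("debt", "income", "debt_to_income"),
]).fit(train)

result = pipeline.transform(new_data)
print(result)
```

Because the imputation value is learned once in `fit` and then reused in `transform`, the same fitted pipeline can be applied to new data without recomputing statistics on it, which is one way to avoid the leakage-style mistakes that incorrectly applied transformations can introduce.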

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey, and today I’m interviewing Soledad Galli about feature-engine, a Python library for engineering features for use in machine learning models.

Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you describe what feature-engine is and the story behind it?
  • What are the complexities that are inherent to feature engineering?
    • What are the problems that are introduced due to incidental complexity and technical debt?
  • What was missing in the available set of libraries/frameworks/toolkits for feature engineering that you are solving for with feature-engine?
  • What are some examples of the types of domain knowledge that are needed to effectively build features for an ML model?
  • Given the fact that features are constructed through methods such as normalizing data distributions, imputing missing values, combining attributes, etc. what are some of the potential risks that are introduced by incorrectly applied transformations or invalid assumptions about the impact of these manipulations?
  • Can you describe how feature-engine is implemented?
    • How have the design and goals of the project changed or evolved since you started working on it?
  • What (if any) difference exists in the feature engineering process for frameworks like scikit-learn as compared to deep learning approaches using PyTorch, Tensorflow, etc.?
  • Can you describe the workflow of identifying and generating useful features during model development?
    • What are the tools that are available for testing and debugging of the feature pipelines?
  • What do you see as the potential benefits or drawbacks of integrating feature-engine with a feature store such as Feast or Tecton?
  • What are the most interesting, innovative, or unexpected ways that you have seen feature-engine used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature-engine?
  • When is feature-engine the wrong choice?
  • What do you have planned for the future of feature-engine?

Keep In Touch
  • LinkedIn
  • @Soledad_Galli on Twitter
  • solegalli on GitHub

Picks
  • Tobias
    • Dune Movie
    • Dune Series
  • Soledad
    • The Social Dilemma
    • Don’t Be Evil by Rana Foroohar

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Links
  • feature-engine
  • Feature Engineering
  • Python Feature Engineering Cookbook
  • scikit-learn
  • Feature Stores
    • Podcast Episode
  • Pandas
    • Podcast Episode
  • PyTorch
    • Podcast Episode
  • Tensorflow
  • Feast
  • Tecton
    • Data Engineering Podcast Episode
  • Kaggle
  • Dask
    • Data Engineering Podcast Episode

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA


The Python Podcast.__init__, by Tobias Macey
