July 03, 2023

How Data Engineering Teams Power Machine Learning With Feature Platforms

1 hour 3 minutes

Summary

Feature engineering is a crucial aspect of the machine learning workflow, and it depends on a number of technical and procedural capabilities being in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow by developing and supporting systems that empower data scientists and ML engineers to build and maintain their own features.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Razi Raziuddin about how data engineers can empower data scientists to develop and deploy better ML models through feature engineering

Interview

Introduction

How did you get involved in the area of data management?

What is feature engineering, and why/to whom does it matter?

A topic that commonly comes up in relation to feature engineering is the importance of a feature store. What are the tradeoffs of making it a separate infrastructure/architecture component?

What is the overall lifecycle of a feature, from definition to deployment and maintenance?

How is this distinct from other forms of data pipeline development and delivery?

Who are the participants in that workflow?

What are the sharp edges/roadblocks that typically manifest in that lifecycle?

What are the interfaces that are needed for data scientists/ML engineers to be able to self-serve their feature management?

What is the role of the data engineer in supporting those interfaces?

What are the communication/collaboration channels that are necessary to make the overall process a success?

From an implementation/architecture perspective, what are the patterns that you have seen teams build around for feature development/serving?

What are the most interesting, innovative, or unexpected ways that you have seen feature platforms used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature engineering?

What are the resources that you find most helpful in understanding and designing feature platforms?

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.

Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

If you've learned something or tried out a project from the show then tell us about it! Email us with your story.

To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

FeatureByte

DataRobot

Feature Store

Feast Feature Store

Feathr

Kaggle

Yann LeCun

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA