Data Engineering Podcast

Bringing Feature Stores and MLOps to the Enterprise at Tecton



Summary

As more organizations gain experience with data management and incorporate analytics into their decision making, their next move is to adopt machine learning. To make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner. As a result, the feature store is becoming a required piece of the data platform. To fill that need, Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
  • You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!
  • Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what you are building at Tecton and your motivation for starting the business?
  • For anyone who isn’t familiar with the concept, what is an example of a feature?
  • How do you define what a feature store is?
  • What role does a feature store play in the overall lifecycle of a machine learning project?
  • How would you characterize the current landscape of feature stores?
  • What are the other components that are necessary for a complete ML operations platform?
  • At what points in the lifecycle of data does the feature store get integrated?
  • What types of data can feature stores manage? (e.g. text vs. image/binary vs. spatial, etc.)
  • How is the Tecton platform implemented?
    • How has the design evolved since you first began building it?
  • How did your work on Uber’s Michelangelo inform your work on Tecton?
  • What is the workflow and lifecycle of developing, testing, and deploying a feature to a feature store?
  • What aspects of a feature do you monitor to determine whether it has drifted?
    • How do you define drift in the context of a feature?
      • How does that differ from drift in an ML model?
  • How does Tecton handle versioning of features and associating those different versions with the models that are using them?
  • What are some of the most interesting, innovative, or unexpected projects that you have seen built with Tecton?
  • When is Tecton the wrong choice?
  • What do you have planned for the future of the product?

Contact Info
  • LinkedIn
  • kevinstumpf on GitHub
  • @kevinstumpf on Twitter

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links
  • Tecton
  • Uber Michelangelo
  • MLOps
  • Feature Store
  • Blog: What Is A Feature Store
  • StreamSQL
    • Podcast Episode
  • AWS Feature Store
  • Logical Clocks
  • EMR
  • Kotlin
  • DynamoDB
  • scikit-learn
  • Tensorflow
  • MLFlow
  • Algorithmia
  • SageMaker
  • Feast open source feature store
  • Jaeger
  • OpenTelemetry

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
