Data Engineering Podcast

Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee



Preamble

This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning.

Summary

Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors structure data in a form that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.

Announcements
  • Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
  • Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!
  • Your host is Tobias Macey and today I’m interviewing Frank Liu about how to use vector embeddings in your ML projects and how Towhee can reduce the effort involved.

Interview
  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what Towhee is and the story behind it?
  • What is the problem that Towhee is aimed at solving?
  • What are the elements of generating vector embeddings that pose the greatest challenge or require the most effort?
  • Once you have an embedding, what are some of the ways that it might be used in a machine learning project?
    • Are there any design considerations that need to be addressed in the form that an embedding takes and how it impacts the resultant model that relies on it? (whether for training or inference)
  • Can you describe how the Towhee framework is implemented?
    • What are some of the interesting engineering challenges that needed to be addressed?
    • How have the design/goals/scope of the project shifted since it began?
  • What is the workflow for someone using Towhee in the context of an ML project?
  • What are some of the types of optimizations that you have incorporated into Towhee?
    • What are some of the scaling considerations that users need to be aware of as they increase the volume or complexity of data that they are processing?
  • What are some of the ways that using Towhee impacts the way a data scientist or ML engineer approaches the design and development of their model code?
  • What are the interfaces available for integrating with and extending Towhee?
  • What are the most interesting, innovative, or unexpected ways that you have seen Towhee used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Towhee?
  • When is Towhee the wrong choice?
  • What do you have planned for the future of Towhee?

Contact Info
  • LinkedIn
  • fzliu on GitHub
  • Website
  • @frankzliu on Twitter

Parting Question
  • From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • Towhee
  • Zilliz
  • Milvus
    • Data Engineering Podcast Episode
  • Computer Vision
  • Tensor
  • Autoencoder
  • Latent Space
  • Diffusion Model
  • HSL == Hue, Saturation, Lightness
  • Weights and Biases
  • The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0


