In this episode, I'm speaking with Julien Chaumond from 🤗 HuggingFace about how the company got started, serving large language models in production with millisecond inference times, and the CERN for machine learning.
Join our Discord community: https://discord.gg/tEYvqxwhah
01:00 - Guest intro
02:14 - Origin of HuggingFace
05:37 - Why the focus on NLP?
07:45 - The success of the HuggingFace community
13:14 - Reproducing models and scaling for the community
18:14 - Enabling large models in production
23:14 - How HuggingFace scales so many models
27:34 - The biggest challenge HuggingFace solved in MLOps
32:02 - How HuggingFace transitions from research to production
34:44 - Using notebooks vs Python modules
38:27 - The most interesting topic in ML production
40:10 - Fascinating ML research
45:24 - Learning new things
51:14 - Something that is true but most people disagree with
56:54 - Tips to organize research teams
1:00:05 - New features for accelerated inference
1:01:35 - Most common use case of HuggingFace
1:04:17 - Integrating search algorithms into transformer library
1:05:09 - Integrating vision models
1:06:06 - Long term business model
1:10:55 - Automation and simplification of the process of building models
1:13:02 - Support for real-time inference
1:14:40 - Recommendations for the audience

FastDS: https://github.com/DAGsHub/fds
BigScience: https://bigscience.huggingface.co
https://www.linkedin.com/company/dagshub/
https://www.linkedin.com/company/huggingface/
https://twitter.com/TheRealDAGsHub
https://twitter.com/huggingface