Learning GenAI via SOTA Papers

EP084: Microsoft Phi-3 Fits Supercomputing in Your Pocket


The paper presents Microsoft's Phi-3 family of Small Language Models (SLMs), notably highlighting the phi-3-mini, a 3.8 billion parameter model that is compact enough to run locally on a smartphone. Despite its small size, phi-3-mini rivals the overall performance of much larger models, such as GPT-3.5 and Mixtral 8x7B, across various academic benchmarks measuring reasoning, math, and coding abilities.
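The "fits in your pocket" claim comes down to simple arithmetic on the weight footprint. A minimal sketch of that estimate, using the 3.8B parameter count from the summary (the 4-bit quantization setting is the on-device configuration the paper describes; the exact byte math here is illustrative):

```python
# Back-of-envelope memory estimate for phi-3-mini's weights.
PARAAMS = None  # (unused placeholder removed below)
PARAMS = 3.8e9  # phi-3-mini parameter count

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(PARAMS, 16)  # full half-precision weights
int4_gb = weight_memory_gb(PARAMS, 4)   # 4-bit quantized, the on-device setting

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")  # fp16: 7.6 GB, 4-bit: 1.9 GB
```

At roughly 1.9 GB for 4-bit weights, the model fits comfortably in a modern phone's memory, which is what makes local inference feasible.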

The core breakthrough stems from the researchers' focus on a "data optimal regime," which relies on the meticulous curation of high-quality training data rather than simply scaling up the model's parameters. The training dataset consists of heavily filtered publicly available web data and LLM-generated synthetic data designed to teach the model general knowledge and logical reasoning.

Beyond the mini model, the report introduces scaled-up and specialized versions:

  • Larger models: The 7B parameter phi-3-small and 14B parameter phi-3-medium.
  • The Phi-3.5 series: Created to enhance long-context, multilingual, and multimodal capabilities. This series includes the phi-3.5-mini, the phi-3.5-MoE (a highly efficient Mixture-of-Experts model), and the phi-3.5-Vision (a multimodal model that processes both text and images).
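The efficiency of a Mixture-of-Experts model like phi-3.5-MoE comes from routing each token to only a few experts, so most of the network's parameters are never evaluated for any given token. A minimal sketch of top-k gating (toy dimensions and a toy gate; this is the generic MoE routing pattern, not the paper's actual architecture):

```python
import math
from typing import Callable, List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token: List[float],
                gate: Callable[[List[float]], List[float]],
                experts: List[Callable[[List[float]], List[float]]],
                top_k: int = 2) -> List[float]:
    """Route a token to its top_k experts and mix their outputs by
    renormalized gate weights; the remaining experts are never run,
    which is where the compute savings come from."""
    scores = softmax(gate(token))
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    out = [0.0] * len(token)
    for i in chosen:
        weight = scores[i] / total
        expert_out = experts[i](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out

# Toy demo: 4 "experts" that just scale the input by i+1.
experts = [lambda t, k=i: [(k + 1) * x for x in t] for i in range(4)]
gate = lambda t: [0.0, 0.0, 10.0, 10.0]  # gate strongly prefers experts 2 and 3
print(moe_forward([1.0, 2.0], gate, experts))  # [3.5, 7.0]: equal mix of 3x and 4x
```

Only 2 of the 4 experts run per token here; in a real MoE layer the experts are feed-forward blocks and the gate is a learned linear projection, but the routing logic is the same.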

The models underwent rigorous post-training, including Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), to align them with Responsible AI (RAI) safety principles and shape them into helpful chat assistants. While the models exhibit exceptional reasoning capabilities, the paper notes that their small size limits their capacity to store factual knowledge, which often leads to factual inaccuracies on trivia-style tasks; this weakness can be effectively mitigated by augmenting the models with a search engine.
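DPO skips the separate reward model of RLHF and optimizes a direct loss on preference pairs. The per-pair loss can be sketched as below; the log-probabilities are plain floats here, and this is the standard DPO formulation rather than any implementation detail disclosed in the paper:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
        -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))
    Minimizing it pushes the policy to prefer the chosen response
    more strongly than the frozen reference model does."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(z)) == log(1 + e^{-z}); log1p form is numerically friendlier
    return math.log1p(math.exp(-beta * margin))

# When the policy matches the reference (margin 0), the loss is log 2:
print(round(dpo_loss(-1.0, -2.0, -1.0, -2.0), 4))  # 0.6931
```

As the policy's preference margin over the reference grows, the loss falls below log 2 and approaches zero, which is exactly the alignment pressure SFT alone does not provide.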


Learning GenAI via SOTA Papers, by Yun Wu