


The paper presents Microsoft's Phi-3 family of Small Language Models (SLMs), notably highlighting the phi-3-mini, a 3.8 billion parameter model that is compact enough to run locally on a smartphone. Despite its small size, phi-3-mini rivals the overall performance of much larger models, such as GPT-3.5 and Mixtral 8x7B, across various academic benchmarks measuring reasoning, math, and coding abilities.
The core breakthrough stems from the researchers' focus on a "data optimal regime," which relies on the meticulous curation of high-quality training data rather than simply scaling up the model's parameters. The training dataset consists of heavily filtered publicly available web data and LLM-generated synthetic data designed to teach the model general knowledge and logical reasoning.
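To make the "data optimal regime" idea concrete, here is a minimal, purely illustrative sketch of heuristic quality filtering of web text. The paper does not publish its filtering pipeline; the scoring heuristics and threshold below are assumptions invented for demonstration, not Microsoft's actual method.

```python
def quality_score(doc: str) -> float:
    """Toy heuristics: reward reasoning-flavored vocabulary and longer words.
    Real pipelines use classifiers and much richer signals."""
    words = doc.split()
    if not words:
        return 0.0
    # Hypothetical "educational" marker words -- stand-ins for a learned quality signal.
    educational = {"theorem", "because", "therefore", "function", "energy"}
    vocab_hit = sum(w.lower().strip(".,") in educational for w in words) / len(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return vocab_hit + min(avg_word_len / 10, 0.5)

def filter_corpus(docs: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only documents scoring above the (arbitrary) quality threshold."""
    return [d for d in docs if quality_score(d) >= threshold]
```

The point is the selection pressure, not the specific heuristics: the training corpus is shaped toward reasoning-dense text rather than raw volume.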
Beyond the mini model, the report introduces scaled-up and specialized versions: phi-3-small (7 billion parameters), phi-3-medium (14 billion parameters), and phi-3-vision, a 4.2 billion parameter multimodal variant that adds image understanding.
The models underwent rigorous post-training, including Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), to align them with Responsible AI (RAI) safety principles and format them into helpful AI assistants. While the models exhibit exceptional reasoning capabilities, the paper notes that their small size limits their capacity to store factual knowledge (often leading to factual inaccuracies on trivia tasks), though this weakness can be effectively mitigated by augmenting the models with a search engine.
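The search-engine mitigation mentioned above follows the familiar retrieval-augmentation pattern: fetch relevant text for the query and prepend it to the model's prompt, so factual recall lives in the index rather than the small model's weights. The sketch below is a toy stand-in (word-overlap ranking over an in-memory corpus in place of a real search engine); the function names and corpus are illustrative, not from the paper.

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query -- a crude stand-in
    for a real search engine's relevance ranking."""
    q = set(query.lower().split())
    ranked = sorted(corpus.values(),
                    key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

def answer_with_search(query: str, corpus: dict[str, str]) -> str:
    """Build the augmented prompt a retrieval-backed SLM would receive.
    In practice this string is passed to the model; here we just return it."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Because the retrieved context supplies the facts, the small model only needs to reason over it, which is exactly the strength the benchmarks show phi-3-mini retains.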
By Yun Wu