This episode addresses how we turn a raw base model into something that behaves like a real assistant using Supervised Fine-Tuning (SFT). We explore instruction–response training pairs, why SFT makes behaviors consistent in a way prompting alone cannot, and the practical engineering choices that keep fine-tuning efficient and safe, including low learning rates and LoRA-style adapters. By the end, you will understand what SFT solves and why the next layer, RLHF, is needed to add human preference and nuance.
By Sheetal ’Shay’ Dhar
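To make the engineering choices mentioned above concrete, here is a minimal sketch of LoRA-style SFT using the Hugging Face transformers, peft, and datasets libraries. The base model (gpt2), the data file (sft_pairs.json), the prompt template, and all hyperparameters are illustrative assumptions, not details from the episode.

```python
# Minimal LoRA-style SFT sketch. Assumes: transformers, peft, datasets
# installed; a local sft_pairs.json with 'instruction' and 'response'
# fields. All names and hyperparameters are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "gpt2"  # stand-in base model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA adapters: train small low-rank matrices on top of frozen weights
# instead of updating the full model.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Format each instruction-response pair as one training string.
def to_text(example):
    return {
        "text": f"### Instruction:\n{example['instruction']}\n"
                f"### Response:\n{example['response']}"
    }

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

ds = load_dataset("json", data_files="sft_pairs.json")["train"]
ds = ds.map(to_text)
ds = ds.map(tokenize, remove_columns=ds.column_names)

# Causal-LM collator pads batches and derives labels from input_ids.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="sft-lora",
    learning_rate=2e-5,  # low LR: nudge behavior, don't overwrite knowledge
    num_train_epochs=1,
    per_device_train_batch_size=4,
)
Trainer(model=model, args=args, train_dataset=ds,
        data_collator=collator).train()
```

The two safety levers the episode highlights both appear here: the small learning rate limits how far each update moves the model, and the LoRA config confines training to a few low-rank adapter matrices, so the base model's knowledge stays intact while its assistant behavior is shaped.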