
This article from Amazon Science, published in May 2025, focuses on machine learning and conversational AI, specifically improvements to reinforcement learning from human feedback (RLHF) for large language models (LLMs). The authors, Sailik Sengupta and Saket Dingliwal, introduce SeRA (self-reviewing and alignment), a novel training method designed to mitigate spurious correlations that can arise during direct preference optimization (DPO). SeRA refines the training process by prioritizing preference pairs with large reward differences, an approach the researchers report can boost performance by 20% to 40%. The article also highlights related Amazon Science work and outlines applied scientist career opportunities at Amazon, emphasizing roles in generative AI and LLMs across departments such as Prime Video and Alexa AI.
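To make the core mechanic concrete, here is a minimal, hypothetical Python sketch of the idea described above: DPO's implicit reward for a response is beta * log(pi_theta / pi_ref), and preference pairs are kept for training only when the chosen-minus-rejected reward margin is large. All names (PreferencePair, implicit_reward_margin, the threshold value) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch: filter DPO preference pairs by implicit reward
# margin, in the spirit of SeRA as summarized above. Not the authors' code.

from dataclasses import dataclass
from typing import List

@dataclass
class PreferencePair:
    # Sequence log-probabilities of the chosen (preferred) and rejected
    # responses under the trained policy and the frozen reference model.
    policy_logp_chosen: float
    ref_logp_chosen: float
    policy_logp_rejected: float
    ref_logp_rejected: float

def implicit_reward_margin(pair: PreferencePair, beta: float = 0.1) -> float:
    """DPO's implicit reward is beta * log(pi_theta / pi_ref); the margin
    is the chosen response's reward minus the rejected response's reward."""
    r_chosen = beta * (pair.policy_logp_chosen - pair.ref_logp_chosen)
    r_rejected = beta * (pair.policy_logp_rejected - pair.ref_logp_rejected)
    return r_chosen - r_rejected

def select_confident_pairs(pairs: List[PreferencePair],
                           margin_threshold: float = 0.5) -> List[PreferencePair]:
    """Keep only pairs whose implicit reward margin clears the threshold,
    so training emphasizes preferences grounded in real quality differences
    rather than spurious correlations. Threshold is an assumed parameter."""
    return [p for p in pairs if implicit_reward_margin(p) >= margin_threshold]
```

The design intuition, as the summary describes it: pairs with a small implicit reward margin are more likely to reflect incidental features than genuine preference signal, so down-weighting or dropping them steers DPO away from spurious correlations.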