
Alright learning crew, Ernis here, ready to dive into some fascinating research about how we teach AI to be, well, less wrong! We're talking about Large Language Models – think of them as super-smart parrots that can string together sentences in amazing ways, like ChatGPT or Bard.
These models learn in stages, kind of like going to school. First, they're just exposed to tons of text – that's the unsupervised pre-training. It's like letting them wander around a library and soak everything up.
Then comes supervised fine-tuning, where they get direct instruction: "Here's a question, here's the right answer." But what about learning from mistakes?
That's where this paper comes in. It looks at the final phase of training, where these models are shown negative examples: incorrect answers, rejected responses, the AI equivalent of a big, red "X". Think of it like teaching a dog to sit: you don't just reward the "sit," you also correct the "stand" or the "lie down" at the wrong time.
The researchers used a clever technique called "Likra" to carefully control how much influence these negative examples had. Imagine Likra as a volume knob for "wrongness." They wanted to see what happens when you turn it up or down.
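The paper's actual Likra mechanics aren't spelled out here, but the "volume knob" idea can be sketched as a toy loss in the style of unlikelihood training: reward the probability the model gives the right answer, penalize the probability it gives the wrong one, and scale the penalty with a knob. The function name, the probabilities, and the `alpha` knob are all illustrative assumptions, not the paper's implementation:

```python
import math

def mixed_loss(p_correct, p_wrong, alpha):
    """Toy objective: reward likelihood of the correct answer and
    penalize likelihood of the wrong one. `alpha` is the 'volume knob'
    for the negative example (0 = ignore wrong answers entirely)."""
    positive = -math.log(p_correct)      # standard likelihood term
    negative = -math.log(1.0 - p_wrong)  # unlikelihood-style penalty
    return positive + alpha * negative

# Turning the knob up makes the same wrong-answer probability cost more.
loss_off = mixed_loss(0.6, 0.3, alpha=0.0)  # positive term only
loss_on = mixed_loss(0.6, 0.3, alpha=1.0)   # penalty at full volume
print(loss_off, loss_on)
```

With the knob at zero this collapses to plain supervised fine-tuning; turning it up lets the researchers measure exactly how much the "red X" signal is contributing.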
They focused on multiple-choice questions, which provide a clear way to define "right" and "wrong." What they found was really interesting:
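Why multiple-choice makes "right" and "wrong" so clean: the model assigns each choice a score (say, a log-likelihood), and the prediction is simply the highest-scoring option, so every question has exactly one graded answer. A minimal sketch, with hypothetical scores:

```python
def pick_answer(choice_scores):
    """Given a model's score (e.g. log-likelihood) for each choice,
    the predicted answer is just the highest-scoring one."""
    return max(choice_scores, key=choice_scores.get)

# Hypothetical per-choice log-likelihoods for one question.
scores = {"A": -2.1, "B": -0.4, "C": -3.7, "D": -1.9}
print(pick_answer(scores))  # → B
```

Here "B" is the model's answer, and the three rejected letters are exactly the kind of negative examples the training phase above can push down on.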
So, why does this matter? Well, for a few reasons:
This research suggests that showing AI what not to do is just as important as showing it what to do. It's about teaching these models to not just memorize, but to truly understand.
Here are a couple of things that popped into my head while prepping this:
Alright learning crew, that’s the tea on negative examples in LLMs. Let me know what you think!
By ernestasposkus