Produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort under the supervision of Evan Hubinger.
Acknowledgements: Thanks to Kyle Brady for his many contributions to this project.
Abstract. This post argues that the performance elicited by fine-tuning an LLM on a task using a given prompt format does not usefully bound the level of performance observed when the same information is presented in a different structure. Thus, fine-tuned performance provides very little information about the best performance that would be achieved by a large number of actors fine-tuning models with random prompting schemes in parallel.
In particular, we find that we get much better results from fine-tuning gpt-3.5-turbo (ChatGPT 3.5) to play chess when the game so far is presented in a single block of SAN[1] than when the game so far is separated into a series of SAN moves presented as [...]
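To make the contrast concrete, here is a minimal Python sketch of the two kinds of prompt structure the abstract contrasts: the game so far presented as one numbered SAN block versus the same moves split into separate chat turns. The move list, message roles, and formatting details are illustrative assumptions, not the authors' exact prompt specifications (those are given in the appendix of the original post).

```python
# Two illustrative ways of presenting the same chess game to an LLM.
# Details are hypothetical; they only show the structural difference.

san_moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]  # game so far, in SAN

def single_block_prompt(moves):
    """Tournament-style notation: the whole game in one numbered SAN block."""
    pairs = []
    for i in range(0, len(moves), 2):
        move_no = i // 2 + 1
        pair = f"{move_no}. {moves[i]}"
        if i + 1 < len(moves):
            pair += f" {moves[i + 1]}"
        pairs.append(pair)
    return " ".join(pairs)

def move_by_move_messages(moves):
    """Chat-style notation: each SAN move as its own message turn
    (role assignment here is one arbitrary convention)."""
    messages = [{"role": "system",
                 "content": "You are playing chess. Reply with your next move in SAN."}]
    for i, move in enumerate(moves):
        role = "user" if i % 2 == 0 else "assistant"
        messages.append({"role": role, "content": move})
    return messages

print(single_block_prompt(san_moves))    # "1. e4 e5 2. Nf3 Nc6 3. Bb5"
print(move_by_move_messages(san_moves))  # list of per-move chat messages
```

Both representations contain exactly the same information; the post's claim is that fine-tuned performance on one representation tells you little about attainable performance on the other.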
---
Outline:
(03:06) General Setting
(04:44) LLM Prompting is Very Brittle and Can Elicit a Range of Behaviors
(06:46) Experiment 1: Semantic Prompt Variations
(09:09) Experiment 2: Structural Prompt Variations
(11:31) LLM Fine-tuning is Brittle and Can Elicit a Range of Behaviors
(14:21) Conclusion
(16:16) What's next?
(16:47) Appendix: Prompt specifications
(17:01) Completions model:
(17:23) Chat model tournament notation:
(18:09) Chat model move-by-move notation:
The original text contained 6 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong