
Produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort under the supervision of Evan Hubinger.
Acknowledgements: Thanks to Kyle Brady for his many contributions to this project.
Abstract. This post argues that the performance elicited by fine-tuning an LLM on a task using a given prompt format does not usefully bound the level of performance observed when the same information is presented in a different structure. Thus, fine-tuned performance provides very little information about the best performance that would be achieved by a large number of actors fine-tuning models with random prompting schemes in parallel.
In particular, we find that we get much better results from fine-tuning gpt-3.5-turbo (ChatGPT 3.5) to play chess when the game so far is presented in a single block of SAN[1] than when the game so far is separated into a series of SAN moves presented as [...]
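To make the contrast concrete, here is a minimal Python sketch (not taken from the original post) of the two prompt structures being compared: the whole game so far as a single SAN block in one message, versus one SAN move per chat message. The system prompts, move-numbering scheme, and model name are illustrative assumptions; the post's appendix gives the actual prompt specifications used.

```python
# Sketch of the two prompt structures compared in the post, via the OpenAI
# chat completions API. Templates and model name are assumptions for
# illustration only.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # or a fine-tuned variant, e.g. "ft:gpt-3.5-turbo:..."

game_san = ["e4", "e5", "Nf3", "Nc6", "Bb5"]  # moves played so far, in SAN


def single_block_prompt(moves):
    """Present the whole game as one SAN block inside a single user message."""
    move_text = " ".join(
        f"{i // 2 + 1}.{m}" if i % 2 == 0 else m for i, m in enumerate(moves)
    )  # e.g. "1.e4 e5 2.Nf3 Nc6 3.Bb5"
    return [
        {"role": "system",
         "content": "You are a strong chess player. Reply with your next move in SAN."},
        {"role": "user", "content": move_text},
    ]


def move_by_move_prompt(moves):
    """Present each SAN move as its own message, alternating user/assistant."""
    messages = [
        {"role": "system",
         "content": "You are playing chess. Each message is one SAN move. Reply with your next move."},
    ]
    for i, move in enumerate(moves):
        messages.append({"role": "user" if i % 2 == 0 else "assistant",
                         "content": move})
    return messages


for build in (single_block_prompt, move_by_move_prompt):
    response = client.chat.completions.create(model=MODEL, messages=build(game_san))
    print(build.__name__, "->", response.choices[0].message.content)
```

Both prompts carry exactly the same game information; only the structure differs, which is what makes the observed performance gap between the two fine-tuned formats notable.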
---
Outline:
(03:06) General Setting
(04:44) LLM Prompting is Very Brittle and Can Elicit a Range of Behaviors
(06:46) Experiment 1: Semantic Prompt Variations
(09:09) Experiment 2: Structural Prompt Variations
(11:31) LLM Fine-tuning is Brittle and Can Elicit a Range of Behaviors
(14:21) Conclusion
(16:16) What's next?
(16:47) Appendix: Prompt specifications
(17:01) Completions model:
(17:23) Chat model tournament notation:
(18:09) Chat model move-by-move notation:
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.