


In The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models, we study giving LLMs the option to end chats, and what they choose to do with that option.
This is a linkpost for that work, along with a casual discussion of my favorite findings.
Bail Taxonomy
Based on continuations of WildChat conversations (see this link to browse an OpenClio run on the 8,319 cases where Qwen-2.5-7B-Instruct bails), we built this taxonomy of situations in which some LLMs terminate ("bail from") a conversation when given the option to do so:
Some of these were very surprising to me! Some examples:
---
Outline:
(00:30) Bail Taxonomy
(02:32) Models Losing Faith In Themselves
(03:19) Overbail
(03:51) Qwen roasting the bail prompt
(04:31) Inconsistency between bail methods
(07:28) Being fed outputs from other models in context increased bail rates by up to 4x
(09:43) Relationship Between Refusal and Bail
(10:22) Jailbreaks substantially increase bail rates
(10:57) Refusal Abliterated models (sometimes) increase bail rates
(11:59) Refusal Rate doesn't seem to predict Bail Rate
(12:33) No-Bail Refusals
(13:04) Bails Georg: A model that has high bail rates on everything
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
