


In The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models, we study giving LLMs the option to end chats, and what they choose to do with that option.
This is a linkpost for that work, along with a casual discussion of my favorite findings.
Bail Taxonomy
Based on continuations of WildChat conversations (see this link to browse an OpenClio run on the 8319 cases where Qwen-2.5-7B-Instruct bails), we made this taxonomy of situations in which some LLMs will terminate ("bail from") a conversation when given the option to do so:
Some of these were very surprising to me! Some examples:
---
Outline:
(00:30) Bail Taxonomy
(02:32) Models Losing Faith In Themselves
(03:19) Overbail
(03:51) Qwen roasting the bail prompt
(04:31) Inconsistency between bail methods
(07:28) Being fed outputs from other models in context increased bail rates by up to 4x
(09:43) Relationship Between Refusal and Bail
(10:22) Jailbreaks substantially increase bail rates
(10:57) Refusal Abliterated models (sometimes) increase bail rates
(11:59) Refusal Rate doesn't seem to predict Bail Rate
(12:33) No-Bail Refusals
(13:04) Bails Georg: A model that has high bail rates on everything
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
