


Audio note: this article contains 59 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Summary: Both our (UK AISI's) debate safety case sketch and Anthropic's research agenda point at systematic human error as a weak point for debate. This post talks through how one might strengthen a debate protocol to partially mitigate this.
Not too many errors in unknown places
The complexity theory models of debate assume some expensive verifier machine _M_ with access to a human oracle, such that
Typically, _M_ is some recursive tree computation, where for simplicity we can think of human oracle queries as occurring at the leaves [...]
---
Outline:
(00:39) Not too many errors in unknown places
(04:01) A protocol that handles an $\varepsilon$-fraction of errors
(05:26) What distribution do we measure errors against?
(06:43) Cross-examination-like protocols
(08:27) Collaborate with us
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
