
Link to our arXiv paper: https://arxiv.org/abs/2507.15886
TL;DR: We study how to efficiently combine multiple monitors with different performance and cost profiles into a single protocol. When the number of monitors is small (2-4), one should simply brute-force search over all monitor invocation strategies (i.e., which monitor to call first, and when to call an additional monitor). For every strategy, apply the Neyman-Pearson lemma and audit the regions with the highest likelihood ratios $\ell(m) = \frac{P(M = m \mid \text{Output is misaligned})}{P(M = m \mid \text{Output is aligned})}$. Measure the recall of each strategy and pick the best one.
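As a rough illustration of the audit step, here is a minimal Python sketch of a likelihood-ratio rule over discretized monitor scores. It is not the paper's implementation: the function name, the binning of scores, and the use of a false-positive-rate budget on aligned outputs are all assumptions made for the example, and the rule is deterministic (it does not randomize at the boundary bin).

```python
def audit_bins_by_likelihood_ratio(p_misaligned, p_aligned, fpr_budget):
    """Greedy likelihood-ratio audit rule over discrete score bins.

    p_misaligned[i] = P(M = bin i | output is misaligned)
    p_aligned[i]    = P(M = bin i | output is aligned)
    fpr_budget      = assumed cap on the fraction of aligned outputs audited

    Returns (audited bin indices, recall, false positive rate).
    """
    def ratio(i):
        # Bins never seen on aligned outputs get an infinite ratio.
        if p_aligned[i] == 0:
            return float("inf")
        return p_misaligned[i] / p_aligned[i]

    # Visit bins from highest to lowest likelihood ratio.
    order = sorted(range(len(p_aligned)), key=ratio, reverse=True)

    audited, recall, fpr = [], 0.0, 0.0
    for i in order:
        # Stop at the first bin that would exceed the budget.
        if fpr + p_aligned[i] > fpr_budget + 1e-12:
            break
        audited.append(i)
        recall += p_misaligned[i]
        fpr += p_aligned[i]
    return audited, recall, fpr


# Toy distributions over four score bins (made-up numbers).
p_aligned = [0.5, 0.3, 0.15, 0.05]
p_misaligned = [0.05, 0.15, 0.3, 0.5]
audited, recall, fpr = audit_bins_by_likelihood_ratio(
    p_misaligned, p_aligned, fpr_budget=0.2
)
```

With these toy numbers the two highest-ratio bins are audited. Comparing the recall of such rules across different invocation strategies is the "pick the best one" step described above.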
Example of an optimal way of combining monitors given some costs and budget.
In this example, you call your black box monitor only if the probe monitor score is between -0.01 and 2.18, and you audit everywhere with a log likelihood ratio above -0.32 (above the green line). This set of parameters achieves the [...]
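The decision rule in this example can be sketched in a few lines of Python. The thresholds come from the description above; everything else is hypothetical: the function names, the inclusive treatment of the interval endpoints, and in particular `toy_log_likelihood_ratio`, which stands in for a log likelihood ratio that would really be estimated from monitor score data.

```python
def run_protocol(probe_score, call_black_box, log_likelihood_ratio,
                 call_low=-0.01, call_high=2.18, audit_threshold=-0.32):
    """Return (audit, called_black_box) for one output.

    call_black_box: zero-argument function returning the black box score.
    log_likelihood_ratio: function of (probe_score, black_box_score or None).
    """
    black_box_score = None
    # Pay for the second monitor only when the probe score is in range
    # (endpoints treated as inclusive, which is an assumption).
    if call_low <= probe_score <= call_high:
        black_box_score = call_black_box()
    llr = log_likelihood_ratio(probe_score, black_box_score)
    return llr > audit_threshold, black_box_score is not None


# Hypothetical stand-ins, for illustration only.
calls = []

def toy_black_box():
    calls.append(1)  # record that the expensive monitor was invoked
    return 3.0

def toy_log_likelihood_ratio(probe_score, black_box_score):
    if black_box_score is None:
        return probe_score
    return probe_score + black_box_score - 2.0


high = run_protocol(5.0, toy_black_box, toy_log_likelihood_ratio)
middle = run_protocol(1.0, toy_black_box, toy_log_likelihood_ratio)
low = run_protocol(-1.0, toy_black_box, toy_log_likelihood_ratio)
```

Here only the middle probe score triggers a black box call; the high and low scores are decided from the probe alone.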
---
Outline:
(01:16) Introduction
(02:49) Two example monitors
(04:26) Optimization problem with two monitors
(05:21) Picking where to audit given monitor calls
(08:59) Just brute force where to call monitors
(10:27) Summary:
(11:14) Some more examples
(12:10) Pareto Frontiers with two monitors v. only one monitor
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrong
