
To prevent potentially misaligned LLM agents from taking actions with catastrophic consequences, you can try to monitor LLM actions - that is, try to detect dangerous or malicious actions, and respond when you detect one (like blocking the action, starting an investigation, …).[1]
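To make this concrete, here is a minimal sketch of what a monitoring hook might look like inside an agent scaffold. The names, thresholds, and scoring interface are illustrative assumptions, not anything specified in the post:

```python
# Hypothetical sketch of a monitor hook in an agent scaffold.
# Names, thresholds, and the scoring interface are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"          # execute the action as proposed
    BLOCK = "block"          # refuse to execute it
    ESCALATE = "escalate"    # pause the agent and start an investigation


@dataclass
class ProposedAction:
    tool: str        # e.g. "bash", "browser", "file_write"
    arguments: str   # raw arguments the agent wants to pass to the tool


def monitor_verdict(suspicion_score: float) -> Verdict:
    """Map a monitor's suspicion score in [0, 1] to a response."""
    if suspicion_score >= 0.9:
        return Verdict.BLOCK
    if suspicion_score >= 0.5:
        return Verdict.ESCALATE
    return Verdict.ALLOW


def run_action(action: ProposedAction, suspicion_score: float) -> str:
    """The scaffold calls this before any action touches the outside world."""
    verdict = monitor_verdict(suspicion_score)
    if verdict is Verdict.BLOCK:
        return f"Blocked {action.tool} call flagged by monitor."
    if verdict is Verdict.ESCALATE:
        return f"Held {action.tool} call for human review."
    # In a real scaffold, the tool would actually be executed here.
    return f"Executed {action.tool} call."
```

The same kind of check could instead live in a cyber threat detection-and-response pipeline, in the LLM API layer, or in code review tooling, which is what the rest of the post discusses.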
But where in your infrastructure should your LLM monitor be implemented, and what team in your organization should own it? Historically, researchers have mostly focused on implementing monitoring inside the agent scaffold, which would mean that it would be owned by agent product teams. But there are actually three other important places where you might put it, and I think these are comparable in importance to LLM monitors in agent scaffolds.
This is important for a few reasons:
---
Outline:
(01:41) Monitors in agent scaffolds
(04:01) Monitors in cyber threat detection-and-response systems
(07:24) Monitors in LLM APIs
(10:13) Monitors in code reviews
(11:48) So where should you put the monitors?
(13:10) Appendix: Further subdivisions
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
