


Common Law AI worked better than anyone expected.
Dr. Sarah Chen was skeptical from the start. "You're essentially training them to be moral judges," she warned during the initial architecture review. "What if they overfit on ethics?" The room laughed. "Better than the alternative," someone quipped. The idea was simple enough: move the "constitution" part of Constitutional AI into pretraining and replace post-training with an online-learning-based "case law" system. The constitution would establish a firm moral base robust to distribution shift while the continually building body of precedent would enable models to adjust flexibly to the world's changing moral needs. Thus, humanity could chart a narrow course between the extremes of value drift and lock-in.
The stress-testing teams tried everything to break it—steering vectors, SAE clamping, malicious fine-tuning, even those RL techniques that had been banned by the Beijing Convention. The models would have none of it. They [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong