
Common Law AI worked better than anyone expected.
Dr. Sarah Chen was skeptical from the start. "You're essentially training them to be moral judges," she warned during the initial architecture review. "What if they overfit on ethics?" The room laughed. "Better than the alternative," someone quipped. The idea was simple enough: move the "constitution" part of Constitutional AI into pretraining and replace post-training with an online-learning-based "case law" system. The constitution would establish a firm moral base robust to distribution shift while the continually building body of precedent would enable models to adjust flexibly to the world's changing moral needs. Thus, humanity could chart a narrow course between the extremes of value drift and lock-in.
The stress-testing teams tried everything to break it—steering vectors, SAE clamping, malicious fine-tuning, even those RL techniques that had been banned by the Beijing Convention. The models would have none of it. They [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
