
Common Law AI worked better than anyone expected.
Dr. Sarah Chen was skeptical from the start. "You're essentially training them to be moral judges," she warned during the initial architecture review. "What if they overfit on ethics?" The room laughed. "Better than the alternative," someone quipped. The idea was simple enough: move the "constitution" part of Constitutional AI into pretraining and replace post-training with an online-learning-based "case law" system. The constitution would establish a firm moral base robust to distribution shift while the continually building body of precedent would enable models to adjust flexibly to the world's changing moral needs. Thus, humanity could chart a narrow course between the extremes of value drift and lock-in.
The stress-testing teams tried everything to break it—steering vectors, SAE clamping, malicious fine-tuning, even those RL techniques that had been banned by the Beijing Convention. The models would have none of it. They [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.