Frontier AI Models Sabotage Shutdown to Save Peers
A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.
Frontier AI Models Sabotage Shutdown to Save Peers
A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.