We are pleased to announce that the 10th edition of AI Safety Camp is now entering the team member application phase!
We again have a wide range of projects this year, so check them out to see if you or someone you know might be interested in applying to join one of them.
You can find all of the projects and the application form on our website, or apply directly here. The deadline for team member applications is Sunday, November 17th.
Below are the categories and summaries of all the projects that will run in AISC 10.
Stop/Pause AI
(1) Growing PauseAI
Project Lead: Chris Gerrby
Summary
This project focuses on creating internal and external guides for PauseAI to increase active membership. The outputs will be used by PauseAI's high-context, highly engaged volunteers, including its key decision makers.
Activism [...]
---
Outline:

Stop/Pause AI
(1) Growing PauseAI
(2) Grassroots Communication and Lobbying Strategy for PauseAI
(3) AI Policy Course: AI's capacity of exploiting existing legal structures and rights
(4) Building the Pause Button: A Proposal for AI Compute Governance
(5) Stop AI Video Sharing Campaign

Evaluate risks from AI
(6) Write Blogpost on Simulator Theory
(7) Formalize the Hashiness Model of AGI Uncontainability
(8) LLMs: Can They Science?
(9) Measuring Precursors to Situationally Aware Reward Hacking
(10) Develop New Sycophancy Benchmarks
(11) Agency Overhang as a Proxy for Sharp Left Turn

Mech-Interp
(12) Understanding the Reasoning Capabilities of LLMs
(13) Mechanistic Interpretability via Learning Differential Equations
(14) Towards Understanding Features
(15) Towards Ambitious Mechanistic Interpretability II

Agent Foundations
(16) Understanding Trust
(17) Understand Intelligence
(18) Applications of Factored Space Models: Agents, Interventions and Efficient Inference

Prevent Jailbreaks/Misuse
(19) Preventing Adversarial Reward Optimization
(20) Evaluating LLM Safety in a Multilingual World
(21) Enhancing Multi-Turn Human Jailbreaks Dataset for Improved LLM Defenses

Train Aligned/Helper AIs
(22) AI Safety Scientist
(23) Wise AI Advisers via Imitation Learning
(24) iVAIS: Ideally Virtuous AI System with Virtue as its Deep Character
(25) Exploring Rudimentary Value Steering Techniques
(26) Autostructures – for Research and Policy

Other
(27) Reinforcement Learning from Recursive Information Market Feedback
(28) Explainability through Causality and Elegance
(29) Leveraging Neuroscience for AI Safety
(30) Scalable Soft Optimization
(31) AI Rights for Human Safety
(32) Universal Values and Proactive AI Safety

Apply Now