We are pleased to announce that the 10th edition of AI Safety Camp is now entering the team member application phase!
We again have a wide range of projects this year, so check them out to see if you or someone you know might be interested in applying to join one of them.
You can find all of the projects and the application form on our website, or apply directly here. The deadline for team member applications is Sunday, November 17th.
Below are the categories and summaries of all the projects that will run in AISC 10.
Stop/Pause AI
(1) Growing PauseAI
Project Lead: Chris Gerrby
Summary
This project focuses on creating internal and external guides for PauseAI to increase active membership. The outputs will be used by PauseAI's high-context, highly engaged volunteers, including its key decision makers.
Activism [...]
---
Outline:

Stop/Pause AI
(1) Growing PauseAI
(2) Grassroots Communication and Lobbying Strategy for PauseAI
(3) AI Policy Course: AI's capacity of exploiting existing legal structures and rights
(4) Building the Pause Button: A Proposal for AI Compute Governance
(5) Stop AI Video Sharing Campaign

Evaluate risks from AI
(6) Write Blogpost on Simulator Theory
(7) Formalize the Hashiness Model of AGI Uncontainability
(8) LLMs: Can They Science?
(9) Measuring Precursors to Situationally Aware Reward Hacking
(10) Develop New Sycophancy Benchmarks
(11) Agency Overhang as a Proxy for Sharp Left Turn

Mech-Interp
(12) Understanding the Reasoning Capabilities of LLMs
(13) Mechanistic Interpretability via Learning Differential Equations
(14) Towards Understanding Features
(15) Towards Ambitious Mechanistic Interpretability II

Agent Foundations
(16) Understanding Trust
(17) Understand Intelligence
(18) Applications of Factored Space Models: Agents, Interventions and Efficient Inference

Prevent Jailbreaks/Misuse
(19) Preventing Adversarial Reward Optimization
(20) Evaluating LLM Safety in a Multilingual World
(21) Enhancing Multi-Turn Human Jailbreaks Dataset for Improved LLM Defenses

Train Aligned/Helper AIs
(22) AI Safety Scientist
(23) Wise AI Advisers via Imitation Learning
(24) iVAIS: Ideally Virtuous AI System with Virtue as its Deep Character
(25) Exploring Rudimentary Value Steering Techniques
(26) Autostructures – for Research and Policy

Other
(27) Reinforcement Learning from Recursive Information Market Feedback
(28) Explainability through Causality and Elegance
(29) Leveraging Neuroscience for AI Safety
(30) Scalable Soft Optimization
(31) AI Rights for Human Safety
(32) Universal Values and Proactive AI Safety

Apply Now