The Nonlinear Library: Alignment Forum

AF - Can startups be impactful in AI safety? by Esben Kran



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can startups be impactful in AI safety?, published by Esben Kran on September 13, 2024 on The AI Alignment Forum.
With Lakera's strides in securing LLM APIs, Goodfire AI's path to scaling interpretability, and 20+ model evaluations startups, among much else, there's a rising number of technical startups attempting to secure the model ecosystem. Of course, they have varying levels of impact on superintelligence containment and security, and even with these companies, there's a lot of potential for aligned, ambitious, and high-impact startups within the ecosystem. This point isn't new and has been made in our previous posts and by Eric Ho (Goodfire AI CEO).
To set the stage, our belief is that these are the types of companies that will have a positive impact:
Startups with a profit incentive completely aligned with improving AI safety;
that have a deep technical background to shape AGI deployment; and
that do not try to compete with AGI labs.
Piloting AI safety startups
To understand impactful technical AI safety startups better, Apart Research joined forces with collaborators from Juniper Ventures, vectorview (alumni from the latest YC cohort), Rudolf (from the upcoming def/acc cohort), Tangentic AI, and others. We then invited researchers, engineers, and students to answer a key question: "Can we come up with ideas that scale AI safety into impactful for-profits?"
The hackathon took place during a weekend two weeks ago with a keynote by Esben Kran (co-director of Apart), along with 'HackTalks' by Rudolf Laine (def/acc) and Lukas Petersson (YC / vectorview). Individual submissions were a four-page report covering the problem statement, why the solution will work, what the key risks of the solution are, and any experiments or demonstrations of the solution the team made.
This post details the top 6 projects and excludes 2 projects that were made private by request (hopefully turning into impactful startups now!). In total, we had 101 signups and 11 final entries. Winners were decided by a linear mixed-effects (LME) model conditioned on reviewer bias. Watch the authors' lightning talks here.
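As a rough illustration of this kind of scoring, here is a minimal sketch of reviewer-bias-adjusted ranking with a linear mixed-effects model in Python (statsmodels); the data, column names, and model formula are assumptions made for the example, not the hackathon's actual pipeline.

```python
# Minimal sketch: rank projects while adjusting for reviewer bias.
# Assumed long-format scores table with illustrative column names.
import pandas as pd
import statsmodels.formula.api as smf

reviews = pd.DataFrame({
    "project":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "reviewer": ["r1", "r2", "r3"] * 3,
    "score":    [7.0, 8.5, 8.0, 6.0, 7.5, 7.0, 8.0, 9.5, 9.0],
})

# Fixed effect per project, random intercept per reviewer: the random
# intercept absorbs reviewers who score systematically high or low, so
# the project coefficients approximate bias-adjusted quality estimates.
model = smf.mixedlm("score ~ 0 + C(project)", reviews, groups=reviews["reviewer"])
result = model.fit()

# Rank projects by their reviewer-adjusted estimated mean score.
print(result.fe_params.sort_values(ascending=False))
```

Ranking by the project fixed effects rather than raw averages keeps one generous or harsh reviewer from dominating the ordering.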
Dark Forest: Making the web more trustworthy with third-party content verification
By Mustafa Yasir (AI for Cyber Defense Research Centre, Alan Turing Institute)
Abstract: 'DarkForest is a pioneering Human Content Verification System (HCVS) designed to safeguard the authenticity of online spaces in the face of increasing AI-generated content. By leveraging graph-based reinforcement learning and blockchain technology, DarkForest proposes a novel approach to safeguarding the authentic and humane web. We aim to become the vanguard in the arms race between AI-generated content and human-centric online spaces.'
Figure: Content verification workflow supported by graph-based RL agents deciding verifications.
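As a toy illustration of how such a workflow might reach a verification decision, the sketch below propagates verifier trust over a small content graph; it uses a simple weighted-average heuristic rather than the project's actual graph-based RL or blockchain components, and every node name, weight, and threshold is an assumption made up for the example.

```python
# Toy illustration only: decide which content counts as human-verified by
# combining the trust of the verifiers who vouched for it. Not DarkForest's
# actual method; the graph, weights, and threshold are invented for the demo.
import networkx as nx

G = nx.Graph()
# Verifier nodes carry a prior trust score; content nodes start unscored.
G.add_node("verifier_1", kind="verifier", trust=0.9)
G.add_node("verifier_2", kind="verifier", trust=0.6)
G.add_node("post_123", kind="content")
G.add_node("post_456", kind="content")
# An edge means "this verifier vouched for this content", with a confidence weight.
G.add_edge("verifier_1", "post_123", confidence=0.8)
G.add_edge("verifier_2", "post_123", confidence=0.7)
G.add_edge("verifier_2", "post_456", confidence=0.4)

VERIFY_THRESHOLD = 0.5  # assumed cutoff for marking content human-verified

for node, data in G.nodes(data=True):
    if data["kind"] != "content":
        continue
    # Trust-weighted confidence from each neighbouring verifier.
    weighted = [
        G.nodes[nbr]["trust"] * G[node][nbr]["confidence"]
        for nbr in G.neighbors(node)
    ]
    score = sum(weighted) / len(weighted) if weighted else 0.0
    verdict = "verified" if score >= VERIFY_THRESHOLD else "needs review"
    print(f"{node}: score={score:.2f} -> {verdict}")
```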
Reviewer comments:
Natalia: Well-explained problem with a clear need addressed. I love that you included the content creation process, although you don't explicitly address how you would attract content creators to use your platform over others in their process. Perhaps exploring what features of platforms drive creators to each might help you make a compelling case for using yours beyond the verification capabilities.
I would have also liked to see more details on how the verification decision is made and how accurate this is on existing datasets.
Nick: There's a lot of valuable stuff in here regarding content moderation and identity verification. I'd narrow it to one problem-solution pair (e.g., "jobs to be done") and focus more on risks around early product validation (deep interviews with a range of potential users and buyers regarding value) and go-to-market. It might also be worth checking out Musubi.
Read the full project here.
Simulation Operators: An annotation operation for alignment of robots
By Ardy Haroen (USC)
Abstrac...