Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My first year in AI alignment, published by Alex Altair on January 2, 2023 on LessWrong.
2022 was a pretty surprising year for me. In the beginning, I had just finished going through the Recurse Center retreat, and was expecting to go back into software engineering. As it turned out, I spent almost all of it getting increasingly committed to working on technical AI alignment research full-time.
I think a lot of people are trying out this whole "independent alignment researcher" thing, or are considering it, and I think a lot of these people are encountering a lack of structure or information. So I figure it might be helpful if I just relay what my experience has been, to add to the available information. And of course, there's a ton of variance in people's experience, so this might be more of a data point about what the variance looks like than it is about what the average looks like.
Some personal background
I'm probably pretty unusual in that I've been involved in the rationalist community since 2010. I bought the arguments for AI x-risk virtually as soon as I first read them in the Sequences. I tried somewhat to do useful work back then, but I wasn't very productive.
I have a lifelong attention problem, which has mostly prevented me from being intellectually productive. Because I couldn't really do research, I went into software engineering. (This was still a struggle, but it was doable enough to make a stable living.) A lot of this year's transition was figuring out how to do alignment work despite the attention problem.
My technical background is that I majored in math and physics in college, took most of the required classes, and also dropped out (twice). I was well above average in school (but probably average for the rationalist community). I never really did math in the interim, but it always remained part of my identity, and I have always had a theory mindset.
Goals and plans
When I say I've been "working on AI alignment," what do I mean? What sort of goals or plans have I been following? This has evolved over the year as I got more information.
First, I went through the AGI Safety Fundamentals course. In the beginning I was just happy to be catching up on how the field had developed. I was also curious to see what would happen when I interacted with alignment content more intensely. It turned out that I enjoyed it a lot; I loved almost every reading, and dove way deeper into the optional content.
Next, I tried to see if I could figure out what optimization was. I chose this as my AGISF final project, which was obviously ambitious, but I wanted to see what would happen if I focused on this one research-flavored question for four whole weeks.
I made a lot of progress. I decided it was a good project to continue working on, and then I started alternating between writing up what I had done already, and doing more research. The content continued to evolve and grow.
After a few more months, it was clear to me that I could and wanted to do this longer-term. The optimization project was taking forever, but I had some inside- and outside-view evidence that I was spending my time reasonably. So while I continued spending some weeks focusing entirely on trying to get the optimization project closer to an end point, I also deliberately spent some weeks doing other tasks that a full-time researcher would do, like learning more about the field, others' agendas, the open problems, et cetera. I tried to improve my own models of the alignment problem, think about which components were most critical, and figure out what I could do to make progress on them, including what my next major project should be. I haven't made any decisions about this, but I do have a pretty solid shortlist.
For much of December, I decided to do more focused reflection on how the optimization project was going. It...