The Nonlinear Library

LW - Why Not Just... Build Weak AI Tools For AI Alignment Research? by johnswentworth


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Not Just... Build Weak AI Tools For AI Alignment Research?, published by johnswentworth on March 5, 2023 on LessWrong.
“Weak” cognitive tools are clearly a thing, and are useful. Google search is a fine example. There are plenty of flavors of “weak AI” which are potentially helpful for alignment research in a similar way to Google search.
In principle, I think there’s room for reasonably-large boosts to alignment research from such tools. Alas, the vast majority of the people I hear intending to build such tools do not have the right skills/background to do so (at least not for the high-value versions of the tools). Worse, I expect that most people who aim to build such tools are trying to avoid the sort of work they would need to do to build the relevant skills/background.
Analogy: A Startup Founder’s Domain Expertise (Or Lack Thereof)
Imagine a startup building tools meant to help biologists during their day-to-day work in the wetlab. I expect domain expertise to matter a lot here: I would guess that if none of the founders have ample personal experience doing research work in a wetlab, the chance of this startup building an actually-highly-useful wetlab product drops by about an order of magnitude. Our hypothetical startup might still “succeed” some other way, e.g. by pivoting to something else, or by being good at pitching their shitty product to managers who make purchasing decisions without actually using the product, or by building something very marginally useful and pricing it very cheaply. But their chance of building a wetlab product which actually provides a lot of value is pretty slim.
One might reply: but couldn’t hypothetical founders without domain experience do things to improve their chances? For instance, they could do a bunch of user studies on biologists working in wetlabs, and they could deploy the whole arsenal of UX study techniques intended to distinguish things-users-say-matter from things-which-actually-matter-to-users.
... and my response is that I was already assuming our hypothetical founders do that sort of thing. If the founders don’t have much domain experience themselves, and don’t do basic things like lots of user studies, then I’d guess their chance of building an actually-high-value wetlab product drops by two or three orders of magnitude, not just one order of magnitude. At that point it’s entirely plausible that we’d have to go through thousands of times more startups to find one that succeeded at building a high-value product.
How is this analogous to plans to build AI tools for alignment research?
So we want to build products (specifically AI products) to boost alignment research. The products need to help solve the hard parts of aligning AI, not just easy things where we can clearly see what’s going on and iterate on it, not just problems which are readily legible or conceptually straightforward. Think problems like e.g. sharp left turn, deception, getting what we measure, or at a deeper level the problem of fully updated deference, the pointers problem, value drift under self-modification, or ontology identification. And the tools need to help align strong AI; the sort of hacky tricks which fall apart under a few bits of optimization pressure are basically irrelevant at that point. (Otherwise the relevant conversation to have is not about how the tools will be useful, but about how whatever thing the tools are building will be useful.)
The problem for most people who aim to work on AI tools for alignment research is that they have approximately-zero experience working on those sorts of problems. Indeed, as far as I can tell, people usually turn to tool-building as a way to avoid working on the hard problems.
I expect failure modes here to mostly look like solving the wrong problems, i.e. not actually addressing bottlene...

The Nonlinear Library, by The Nonlinear Fund
