Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An anthropomorphic AI dilemma, published by Tsvi Benson-Tilsen on May 7, 2023 on The AI Alignment Forum.
[Metadata: crossposted from. First completed January 21, 2023.]
Either generally-human-level AI will work internally like humans work internally, or not. If generally-human-level AI works like humans, then takeoff can be very fast, because in silico minds that work like humans are very scalable. If generally-human-level AI does not work like humans, then intent alignment is hard because we can't use our familiarity with human minds to understand the implications of what the AI is thinking or to understand what the AI is trying to do.
Terms
Here generally-human-level AI means artificial systems that can perform almost all tasks at or above the level of a human competent at that task (including tasks that take a long time).
Here intent alignment means making artificial systems try to do what we, on reflection and all things considered, would want them to do. A system that doesn't have intents in this sense--i.e. isn't well-described as trying to do something--cannot be intent aligned.
Here "how a mind works" points at the generators of the mind's ability to affect the world, and the determiners of [in what direction to push the world] a mind applies its capabilities.
Let A be some generally-human-level AI system (the first one, say).
(Here "dilemma" just means "two things, one of which must be true", not necessarily a problem.)
The anthropomorphy dichotomy
Either A will work internally like humans work internally, or not.
Well, of course it's a spectrum, not a binary dichotomy, and a high-dimensional space, not a spectrum. So a bit more precisely: either A will work internally very much like how humans work internally, or very much not like how humans work internally, or some more ambiguous mix.
To firmly avoid triviality: to qualify as "very much like" how humans work, in the sense used here, it is definitely neither necessary nor sufficient to enjoy walking barefoot on the beach, to get angry, to grow old, to have confirmation bias, to be slow at arithmetic, to be hairy, to have some tens of thousands of named concepts, to be distractible, and so on. What's necessary and sufficient is to share most of the dark matter generators of human-level capabilities. (To say that is to presume that most of the generators of human-level capabilities aren't presently understood explicitly; really the condition is just about sharing the generators.)
Anthropomorphic AI is scalable
Suppose that A does share with humans the dark matter generators of human-level capabilities. Then A is very scalable, i.e. it can be tweaked without too much difficulty to become much more capable than the original A. The basic case is here, in the section "The AI Advantage". (That link could be taken as arguing that AI will be non-anthropomorphic in some sense, but from the perspective of this essay, the arguments given there are of the form: here's how, starting from the human (anthropomorphic) baseline, it's visibly possible to make tweaks that greatly increase capabilities.) To add two (overlapping) points:
Simple network effects might be very powerful. If ideas combine at some low rate with other ideas and with tasks, then scaling up the number of idea-slots (analogous to cortex) would give superlinear returns, just using the same sort of thinking that humans use. Imagine if every thought in the world was made available to you, so that as you're having your thoughts, the most especially relevant knowledge held by anyone brings itself to your attention, and is already your own in the same way as your knowledge is already your own. Imagine if the most productive philosophical, mathematical, and scientific quirks of thought, once called forth by some private struggles, were instantly copyable to ten thous...
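To make the superlinearity concrete, here is a minimal toy sketch (my own illustration, not from the original post), assuming ideas combine pairwise and each candidate combination turns out useful with some small fixed probability. The function name and the probability value are made up for illustration:

```python
# Toy model (assumption, not the post's claim): ideas combine pairwise,
# and each candidate pair yields a useful new idea with small fixed
# probability p. Candidate pairs grow quadratically in the number of slots.

def expected_useful_combinations(n_slots: int, p_useful: float = 0.001) -> float:
    """Expected number of useful pairwise combinations among n_slots ideas."""
    n_pairs = n_slots * (n_slots - 1) / 2  # quadratic growth in candidates
    return n_pairs * p_useful

for n in (1_000, 2_000, 4_000):
    print(n, round(expected_useful_combinations(n)))
# Doubling the idea-slots roughly quadruples the useful combinations:
# 1000 -> 500, 2000 -> 1999, 4000 -> 7998
```

Under these (strong) assumptions, returns to adding idea-slots are quadratic rather than linear, which is one way to cash out "superlinear returns, just using the same sort of thinking that humans use".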