Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interoperable High Level Structures: Early Thoughts on Adjectives, published by johnswentworth on August 22, 2024 on The AI Alignment Forum.
Meta: This post is a relatively rough dump of some recent research thoughts; it's not one of our more polished posts, in terms of either clarity or rigor. You've been warned.
The Interoperable Semantics post and the Solomonoff Inductor Walks Into A Bar post each tackled the question of how different agents in the same world can coordinate on an ontology, so that language can work at all given only a handful of example usages of each word (similar to e.g. children learning new words). Both use natural latents as a central mathematical tool - one in a Bayesian probabilistic framework, the other in a minimum description length framework. Both focus mainly on nouns, i.e. interoperable-across-minds clusters of "objects" in the environment.
… and the two propose totally different models. In one, the interoperability of cluster labels (i.e. nouns) follows from natural latent conditions over different features of each object. In the other, interoperability follows from natural latent conditions across objects, with no mention of features. The two models are not, in general, equivalent; they can't both be correct and complete.
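To make the contrast concrete, here is a rough sketch of the natural latent conditions in our own notation (not taken from the post, and glossing over the approximation details):

```latex
% Roughly, a latent $\Lambda$ is natural over variables
% $X_1,\dots,X_n$ when it satisfies two conditions.
% Mediation: the $X_i$ are independent given $\Lambda$,
\[
  P[X_1,\dots,X_n \mid \Lambda] \;=\; \prod_{i=1}^n P[X_i \mid \Lambda].
\]
% Redundancy: $\Lambda$ is (approximately) recoverable even with
% any single variable omitted, i.e. for each $i$ there is some
% function $g_i$ with
\[
  \Lambda \;\approx\; g_i(X_{\bar{i}}),
  \qquad X_{\bar{i}} := (X_1,\dots,X_{i-1},X_{i+1},\dots,X_n).
\]
% The first model takes the $X_i$ to be the features of a single
% object; the second takes the $X_i$ to be the objects themselves.
% These are different factorizations, hence the non-equivalence.
```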
In this post, we'll propose that while the natural latent conditions over objects still seem to intuitively capture the rough notion of nouns, the natural latent conditions over features seem much better suited to adjectives. We'll briefly lay out two different potential ways to use natural latents over features as semantic values for adjectives. Then we'll talk a bit about implications, open threads and how this fits into a broader research gameplan.
The Problem
When children learn language, the cognitive process seems to go:
Observe the world a bunch
… organize knowledge of the world according to some categories, concepts, ontology, etc
… those categories, concepts, ontology, etc match other humans' categories, concepts, ontology, etc reasonably well
… so it only takes a handful of examples (1-3, say) of the use of a given word in order for the child to learn what the word refers to.
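The steps above can be sketched as a toy script (our illustration, not from the post; the data, the `kmeans_1d` helper, and the word examples are all made up): the agent first clusters unlabeled observations, and afterwards a single labeled example per word suffices to attach names to the pre-existing clusters.

```python
# Toy sketch: "categories before labels". The agent clusters
# unlabeled 1-D observations, then learns each word from ONE example.

def kmeans_1d(points, k=2, iters=20):
    """Minimal 1-D k-means; returns the cluster centers."""
    centers = [min(points), max(points)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Step 1-2: observe the world a bunch, organize it into categories
# (here: two clusters of magnitudes, learned with no labels at all).
observations = [0.1, -0.2, 0.3, 9.8, 10.1, 10.3, 0.0, 9.9]
centers = kmeans_1d(observations)

# Step 3-4: one labeled example per word attaches words to clusters.
examples = {"small": 0.2, "big": 10.0}
word_for_cluster = {
    min(range(len(centers)), key=lambda j: abs(v - centers[j])): w
    for w, v in examples.items()
}

def name(x):
    """Name a new observation using the learned word-cluster map."""
    j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
    return word_for_cluster[j]

print(name(0.4), name(9.7))
```

The point of the sketch is only that the expensive work (clustering) happens before any labels arrive; the labeling step is trivial once the categories exist, which is why so few examples are needed.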
The crucial point here is that the categories/concepts/ontology are mostly learned before a word is attached; children do not brute-force learn categories/concepts/ontology from "labeled data". We can tell this is true mainly because it typically takes so few examples to learn the meaning of a new word.
The big puzzle, then, is that different humans learn mostly approximately the same categories/concepts/ontology - i.e. the same "candidates" to which words might point - as required for language to work at all with so few examples. How does that work? Mathematically, what are those "interoperable" categories/concepts/ontology, which different humans mostly convergently learn? How can we characterize them?
Or, somewhat earlier on the tech tree: can we find even a single model capable of accounting for the phenomenon of different minds in the same environment robustly converging on approximately the same categories/concepts/ontology? Forget whether we can find a model which correctly captures the ontology converged upon by humans, can we even find any model capable of accounting for any sort of robust ontological convergence? Can we find such a model for which the convergent ontology even vaguely resembles the sorts of things in human language (nouns, verbs, adjectives, etc)? What would such a model even look like?
That's roughly the stage we're at in this post.
Two Previous Models: Naturality Over Objects vs Features
Our main tool is (deterministic) natural latents. The usage looks like:
Suppose the different minds each look for (and find) a latent variable which satisfies the natural latent conditions over some lower-level variab...