Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: One Minute Every Moment, published by abramdemski on September 1, 2023 on LessWrong.
About how much information are we keeping in working memory at a given moment?
"Miller's Law" dictates that the number of things humans can hold in working memory is "the magical number 7±2". This idea is derived from Miller's experiments, which tested both random-access memory (where participants must remember call-response pairs, and give the correct response when prompted with a call) and sequential memory (where participants must memorize and recall a list in order). In both cases, 7 is a good rule of thumb for the number of items people can recall reliably.
Miller noticed that the number of "things" people could recall didn't seem to depend much on the sorts of things people were being asked to recall. A random numeral contains about 3.3 bits of information, while a random letter contains about 4.7 bits; yet people were able to recall about the same number of numerals or letters.
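(These per-symbol figures are just the base-2 logarithm of the alphabet size, which is easy to check:)

```python
import math

# Information in one uniformly random symbol is log2(alphabet size).
print(math.log2(10))  # random numeral: ~3.32 bits
print(math.log2(26))  # random letter:  ~4.70 bits
```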
Miller concluded that working memory should not be measured in bits, but rather in "chunks"; this is a word for whatever psychologically counts as a "thing".
This idea was further reinforced by memory athletes, who gain the ability to memorize much longer strings of numbers through practice. A commonly repeated explanation is as follows: memory athletes are not increasing the size of their working memory; rather, they are increasing the size of their "chunks" when it comes to recalling strings of numbers specifically. For someone who rarely needs to recall numbers, individual numerals might be "chunks". For someone who recalls numbers often due to work or hobby, two- or three-digit numbers might be "chunks". For a memory athlete who can keep hundreds of digits in mind, perhaps sequences of one hundred digits count as a "chunk".
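In code terms, re-chunking is just re-grouping the same digit stream under a coarser codebook. A toy illustration (the particular digit string is arbitrary):

```python
digits = "149162536496481"  # 15 digits: 15 chunks for a novice

# A practiced recaller might parse the same stream as three-digit chunks:
chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
print(chunks)       # ['149', '162', '536', '496', '481']
print(len(chunks))  # 5 items to hold in mind, instead of 15
```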
However, if you're like me, you probably aren't quite comfortable with Miller's rejection of bits as the information currency of the brain. The brain isn't magic. At some level, information is being processed.
I'll run with the idea that chunking is like Huffman codes. Data is compressed by learning a dictionary mapping from a set of "codewords" (which efficiently represent the data) to the decompressed representation. For example, if the word "the" occurs very frequently in our data, we might assign it a very short codeword like "01", while rare words like "lit" might get much longer codewords such as "1011010".
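To make the analogy concrete, here's a minimal sketch of the standard greedy Huffman construction applied to a toy word list (the word list and function name are just illustrative, not anything from Miller). The frequent word "the" ends up with a shorter codeword than the rare words:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman codebook: frequent symbols get short codewords."""
    freq = Counter(symbols)
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a
    # bare symbol (leaf) or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codebook = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):   # internal node: branch on 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                         # leaf: assign the accumulated codeword
            codebook[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codebook

words = "the cat sat on the mat and the dog lit the lamp".split()
codebook = huffman_code(words)
for word in sorted(codebook, key=lambda w: len(codebook[w])):
    print(word, codebook[word])
```

Codeword length tracks frequency, which is exactly the property the chunking story needs.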
A codeword is sort of like a chunk; it's a "thing" in terms of which we compress. However, different codewords can contain different amounts of information, suggesting that they take up different amounts of space in working memory.
According to this hypothesis, when psychologists such as Miller ask people to remember letters or numbers, the codeword sizes come out about the same, because in everyday life we need to recall individual letters about as often as individual numerals. We don't suddenly adapt our codeword dictionary when we're asked to memorize a sequence of 0s and 1s so that our memory could store the sequence efficiently, at one bit per bit; instead, we use our native representation, which encodes "0" and "1" via codewords about as long as the codewords for "5" and "j" and so on.
In effect, Miller was vastly underestimating working memory size via naive calculations of size in terms of bits. A string of seven numerals would contain 3.3 × 7 = 23.1 bits of information if stored at maximal efficiency for the number-remembering task. A string of seven letters would instead contain 4.7 × 7 = 32.9 bits, under a similar optimality assumption. But people don't process information in a way that's optimized for psychology experiments; they process information in a way that's optimized for normal life. So, these two estimates of the number of bits in working memory are allowed to be very different.
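To see how the native-representation story reconciles the two figures, here's a toy calculation; the 36-symbol alphanumeric alphabet below is purely an illustrative assumption, not a claim about the brain's actual codebook:

```python
import math

# Illustrative assumption: the native codebook draws every character from
# a 36-symbol alphanumeric alphabet, so each codeword costs the same.
per_chunk = math.log2(36)                    # ~5.17 bits per codeword
print(7 * per_chunk)                         # ~36.2 bits, numerals or letters alike
# Compare the task-optimal encodings from above:
print(7 * math.log2(10), 7 * math.log2(26))  # ~23 vs ~33 bits
```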