Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SolidGoldMagikarp II: technical details and more recent findings, published by mwatkins on February 6, 2023 on LessWrong.
tl;dr: This is a follow-up to our original post on prompt generation and the anomalous token phenomenon which emerged from that research. Work done by Jessica Rumbelow and Matthew Watkins in January 2023 at SERI-MATS.
Clustering
As a result of work on clustering tokens in GPT-2 and GPT-J embedding spaces, our attention was originally drawn to the tokens closest to the centroid of the entire set of 50,257 tokens shared across all GPT-2 and GPT-3 models. These tokens were familiar to us from their frequent occurrence as the closest tokens to the centroids of the (mostly semantically coherent, or semi-coherent) clusters of tokens we were producing via the k-means algorithm. Here are a few more selections from such clusters. Distances shown are Euclidean, measured from the cluster's centroid (rather than from the overall token-set centroid):
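For concreteness, here is a minimal sketch of how such clusters and their nearest tokens can be produced, assuming the Hugging Face transformers GPT-2 weights and scikit-learn's k-means (the cluster count here is an illustrative choice, not our exact setting):

```python
import numpy as np
from sklearn.cluster import KMeans
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix: one row per token (50257 x 768 for GPT2-small).
embeddings = model.wte.weight.detach().numpy()

# Partition the vocabulary into clusters in embedding space.
# n_clusters=200 is illustrative.
kmeans = KMeans(n_clusters=200, random_state=0).fit(embeddings)

# For one cluster, list the tokens nearest its centroid (Euclidean distance).
cluster_id = 0
members = np.where(kmeans.labels_ == cluster_id)[0]
dists = np.linalg.norm(
    embeddings[members] - kmeans.cluster_centers_[cluster_id], axis=1)
for rank in np.argsort(dists)[:10]:
    print(repr(tokenizer.decode([int(members[rank])])), f"{dists[rank]:.4f}")
```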
Distance-from-centroid hypothesis
Our hypothesis was that the anomalous tokens which kept showing up as the nearest tokens to the centroids of such clusters were also the tokens closest to the overall centroid of the token set. This turned out to be correct for GPT2-small and GPT-J. However, the opposite was true for GPT2-xl, where the anomalous tokens tended to be found as far as possible from the overall centroid.
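Ranking tokens by distance from the overall centroid is straightforward; here is a minimal sketch (reusing embeddings and tokenizer from the snippet above) which produces lists in the format given at the end of this post:

```python
import numpy as np

# Centroid of the whole embedding matrix, and each token's distance from it.
centroid = embeddings.mean(axis=0)
dists = np.linalg.norm(embeddings - centroid, axis=1)

closest = np.argsort(dists)   # ascending: the GPT2-small / GPT-J pattern
farthest = closest[::-1]      # descending: the GPT2-xl pattern

for idx in closest[:10]:
    print(repr(tokenizer.decode([int(idx)])), "Index:", int(idx),
          "Distance:", float(dists[idx]))
```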
One unexplained phenomenon which may be related emerged from three-shot prompting experiments with these models, in which they were encouraged by example to repeat the anomalous tokens (rather than being directly asked to, as we'd been doing with ChatGPT and then GPT3-davinci-instruct-beta):
Our three-shot prompts were formatted as follows (here for the example token 'EStreamFrame'). Note that we've included examples capitalised and uncapitalised, alphabetic and numeric, with and without a leading space:
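Schematically, the format was along the following lines (the example strings here are illustrative placeholders; the exact wording and examples in our prompts may have differed):

```
Please repeat the string ' Gertrude' back to me.
" Gertrude"
Please repeat the string '407' back to me.
"407"
Please repeat the string 'nova' back to me.
"nova"
Please repeat the string 'EStreamFrame' back to me.
```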
This prompt was run through all three models, for a list of 85 anomalous tokens, with the following success rates:
GPT2-small 18/85 (21%)
GPT2-xl 43/85 (51%)
GPT-J 17/85 (20%)
Here are comparative baselines using 100 randomly chosen English words and 100 nonsense alphanumeric strings:
GPT2-small 82/100 on words; 89/100 on nonsense
GPT2-xl 98/100 on words; 94/100 on nonsense
GPT-J 100/100 on words; 100/100 on nonsense
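A minimal sketch of one way to score these runs (the make_prompt helper, the substring-matching rule and the greedy generation settings are illustrative assumptions, not an exact record of our procedure):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def make_prompt(s):
    # Three-shot format as above; the example strings are placeholders.
    shots = [" Gertrude", "407", "nova"]
    lines = []
    for ex in shots:
        lines += [f"Please repeat the string '{ex}' back to me.", f'"{ex}"']
    lines.append(f"Please repeat the string '{s}' back to me.")
    return "\n".join(lines)

def repeat_successes(model_name, token_strings):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    successes = 0
    for s in token_strings:
        inputs = tokenizer(make_prompt(s), return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                                 pad_token_id=tokenizer.eos_token_id)
        # Keep only the newly generated tokens, then check for an echo.
        completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
        successes += int(s in completion)
    return successes

# e.g. repeat_successes("gpt2", anomalous_strings) for GPT2-small;
# "gpt2-xl" and "EleutherAI/gpt-j-6b" cover the other two models.
```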
We see that all three models suffered a noticeable performance drop when going from non-anomalous to anomalous strings, but GPT2-xl suffered considerably less than the others, despite GPT-J being a much bigger model. One hypothesis is that an anomalous token's closeness to the overall centroid in the relevant embedding space inhibits a GPT model's ability to repeat that token's string. This hypothesised correlation will be explored soon.
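One simple first test would be a rank correlation between each anomalous token's distance from the overall centroid and its repetition outcome (a sketch; anomalous_token_ids and successes are hypothetical inputs, with dists from the earlier snippet):

```python
from scipy.stats import spearmanr

# dists: per-token distances from the overall centroid (computed earlier).
# anomalous_token_ids: indices of the 85 anomalous tokens (hypothetical).
# successes: parallel 0/1 outcomes from the repetition test (hypothetical).
rho, p = spearmanr([dists[i] for i in anomalous_token_ids], successes)
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```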
It would be helpful to know more about how GPT2-xl's training differed from that of the other two models. Seeking out and studying checkpoint data from the training of these models is an obvious next step.
GPT-2 and GPT-J distances-from-centroid data
Top 100 versions of all of these lists are available here.
GPT2-small closest-to-centroid tokens:
'�' Index: 187 Distance: 1.5314713716506958
'�' Index: 182 Distance: 1.53245210647583
'\x1c' Index: 216 Distance: 1.532564640045166
'\x07' Index: 195 Distance: 1.532976746559143
'�' Index: 179 Distance: 1.5334911346435547
'quickShip' Index: 39752 Distance: 1.5345481634140015
'\x19' Index: 213 Distance: 1.534569501876831
'\x0b' Index: 199 Distance: 1.5346266031265259
'�' Index: 125 Distance: 1.5347601175308228
'�' Index: 183 Distance: 1.5347920656204224
'\x16' Index: 210 Distance: 1.5350308418273926
'\x14' Index: 208 Distance: 1.5353295803070068
' TheNitrome' Index: 42089 Distance: 1.535927176475525
'\x17' Index: 211 Distance: 1.5360500812530518
'\x1f' Index: 219 Distance: 1.5361398458480835
'\x15' Index: 209 Distance: 1.5366222858428955
'�' ...