Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: InternLM - China's Best (Unverified), published by Lao Mein on June 9, 2023 on LessWrong.
A technical report on InternLM was released on 6/7. The model has 104 billion parameters, was trained on 1.6 trillion tokens, and was fine-tuned for performance in Chinese.
The authors claimed that it performed second-best on the Chinese language benchmark C-Eval, right after GPT4. In addition, it performed at the level of GPT3.5 in one-shot MMLU. A version fine-tuned for programming also performed similarly to GPT3.5 in coding benchmarks like HumanEval.
Notable takeaways:
Significant effort was put into parallelization to help evade the US chip ban. I don't know how impressive this actually is.
It achieved GPT3.5-level performance with similar-ish levels of compute and data. The China-America algorithmic gap is shrinking.
My gut feeling is that the model was very specifically fine-tuned to perform well on standardized tests, especially those in Chinese (GK/Gao Kao is the Chinese college entrance exam). It was also consistently bad at math.
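To put a rough number on the "similar-ish levels of compute" remark, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. That approximation is brought in for illustration and is not stated in the post or attributed to the report. The parameter and token counts are the ones quoted above, and the function name is hypothetical.

```python
# Rough training-compute estimate for InternLM.
# ASSUMPTION: compute ~= 6 * parameters * tokens (a widely used rule of thumb
# for dense transformers). The post itself does not give a FLOP figure.

def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs using the 6 * N * D heuristic."""
    return 6.0 * n_params * n_tokens

internlm_params = 104e9    # 104 billion parameters (from the post)
internlm_tokens = 1.6e12   # 1.6 trillion training tokens (from the post)

internlm_flops = approx_training_flops(internlm_params, internlm_tokens)
tokens_per_param = internlm_tokens / internlm_params

print(f"Approximate training compute: {internlm_flops:.2e} FLOPs")
print(f"Tokens per parameter: {tokens_per_param:.1f}")
```

Under that assumption the estimate comes out at roughly 1e24 FLOPs, with about 15 training tokens per parameter. It is an order-of-magnitude figure only and says nothing about hardware utilization or the fine-tuning stages.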
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.