
This episode covers Xiaohongshu's dots.llm1, a new open-source large language model built on a Mixture of Experts (MoE) architecture with 142 billion total parameters, of which 14 billion are active during inference.
A key highlight is its extensive pretraining on 11.2 trillion high-quality, non-synthetic tokens, alongside a 32K-token context window. Released under the permissive MIT license, the model also ships with intermediate training checkpoints to support research.
The discussion weighs the advantages and challenges of the MoE architecture against dense models and notes dots.llm1's strong performance, especially on Chinese-language tasks, positioning it competitively within the evolving global landscape of open-source AI, notably among Chinese technology firms.
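To make the total-versus-active parameter distinction concrete, here is a minimal sketch of top-k MoE routing in PyTorch. It is purely illustrative and not dots.llm1's actual implementation: the expert count, layer sizes, and top_k value are hypothetical, chosen only to show how all experts' weights exist in the model while only the routed experts run for each token.

```python
# Illustrative top-k Mixture-of-Experts routing; sizes and expert count are
# hypothetical and NOT taken from dots.llm1.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # All experts exist in memory: they make up the "total" parameters.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        top_p, top_i = probs.topk(self.top_k, dim=-1)    # keep top-k experts per token
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize chosen weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = top_i == e                             # (num_tokens, top_k) mask
            rows = hit.any(dim=-1)
            if rows.any():
                w = (top_p * hit).sum(dim=-1, keepdim=True)[rows]
                # Only the selected ("active") experts run a forward pass.
                out[rows] += w * expert(x[rows])
        return out


if __name__ == "__main__":
    layer = TinyMoELayer()
    tokens = torch.randn(4, 64)  # 4 token embeddings
    print(layer(tokens).shape)   # torch.Size([4, 64])
```

In a full-scale model like dots.llm1, this kind of sparse routing is what lets 142 billion parameters sit in memory while only roughly 14 billion participate in each token's forward pass.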
By Benjamin Alloul