Researchers from Meituan's LongCat Team introduced LongCat-Flash-Lite in January 2026, demonstrating that scaling embeddings via N-gram layers outperforms increasing Mixture-of-Experts parameters in high-sparsity regimes. The architecture also employs system-level optimizations and speculative decoding to boost inference speed.

Source: "Scaling Embeddings Outperforms Scaling Experts in Language Models," Meituan LongCat Team (Hong Liu, Jiaqi Zhang, Chao Wang, Xing Hu, Linkun Lyu, Jiaqi Sun, Xurui Yang, Bo Wang, Fengcun Li, Yulei Qian, Lingtong Si, Yerui Sun, Rumei Li, Peng Pei, Yuchen Xie, Xunliang Cai), January 2026. https://arxiv.org/pdf/2601.21204
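To make the "scaling embeddings via N-gram layers" idea concrete, here is a minimal sketch of one common way such a layer can work: each token's embedding is augmented with an embedding for the n-gram ending at that position, looked up by hashing the n-gram into a fixed-size table. All names, table sizes, and the hashing scheme below are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib

VOCAB = 100   # toy vocabulary size (assumption)
DIM = 4       # toy embedding dimension (assumption)
TABLE = 16    # hashed n-gram table size (assumption)

def ngram_bucket(ngram, table_size=TABLE):
    """Hash a tuple of token ids to a bucket in the n-gram embedding table."""
    key = ",".join(map(str, ngram)).encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % table_size

def embed(tokens, tok_emb, ngram_emb, n=2):
    """Return per-position vectors: token embedding plus hashed n-gram embedding.

    Positions earlier than n-1 have no full n-gram, so they keep the plain
    token embedding. Growing TABLE scales embedding parameters without adding
    any per-token compute beyond one extra lookup and add.
    """
    out = []
    for i, t in enumerate(tokens):
        vec = list(tok_emb[t])
        if i >= n - 1:  # a full n-gram ends at position i
            b = ngram_bucket(tuple(tokens[i - n + 1 : i + 1]))
            vec = [a + c for a, c in zip(vec, ngram_emb[b])]
        out.append(vec)
    return out

# Toy deterministic embedding tables (in practice these are learned).
tok_emb = [[(t + d) % 7 * 0.1 for d in range(DIM)] for t in range(VOCAB)]
ngram_emb = [[(b * d) % 5 * 0.01 for d in range(DIM)] for b in range(TABLE)]

vecs = embed([3, 7, 7, 2], tok_emb, ngram_emb, n=2)
print(len(vecs), len(vecs[0]))  # one DIM-dim vector per input position
```

The design point this sketch illustrates is why embedding scaling suits high-sparsity regimes: the n-gram table adds parameters that are fetched by cheap lookups, whereas adding experts to an MoE increases routed compute and communication.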