
This document describes Hunyuan-Large, an open-source large language model developed by Tencent, with 389 billion total parameters of which 52 billion are activated per token. The model uses a Mixture of Experts (MoE) architecture, in which a learned router dispatches each token to specialized sub-models (experts), improving quality across tasks while keeping per-token compute far below that of an equally large dense model. Hunyuan-Large was pre-trained on a massive corpus that includes a significant share of synthetic data, and it relies on several techniques to optimize performance, such as key-value cache compression, a mixed expert-routing strategy, and expert-specific learning rate scaling. Evaluations across a wide range of benchmarks show strong results in language understanding and generation, logical reasoning, mathematics, coding, and long-context tasks. The model's code and checkpoints are publicly available, with the aim of accelerating future innovation and applications in the LLM community.
https://arxiv.org/pdf/2411.02265
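
As a rough illustration of the mixed expert-routing idea mentioned above (a shared expert that processes every token, plus top-k specialized experts selected by a learned router), here is a minimal NumPy sketch. The expert count, dimensions, routing weights, and helper names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, shared_expert, experts, router_w, top_k=1):
    """Mixed routing sketch: every token passes through the shared expert,
    and additionally through its top-k specialized experts, weighted by
    the router's softmax scores. All shapes/choices are illustrative."""
    # x: (tokens, d_model); router_w: (d_model, n_experts)
    scores = softmax(x @ router_w)                  # (tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -top_k:]   # top-k expert indices per token
    out = shared_expert(x)                          # shared expert is always active
    for t in range(x.shape[0]):
        for e in top[t]:
            out[t] += scores[t, e] * experts[e](x[t:t+1])[0]
    return out

# Toy usage with random linear "experts" (purely illustrative)
rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
make_expert = lambda: (lambda W: (lambda h: h @ W))(rng.normal(size=(d, d)))
experts = [make_expert() for _ in range(n_experts)]
shared = make_expert()
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(tokens, d))
y = moe_layer(x, shared, experts, router_w, top_k=1)
print(y.shape)  # (3, 8)
```

The key property this sketch captures is that only a small number of experts run per token, which is how an MoE model's total parameter count can grow without a proportional increase in inference cost.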