AI Papers Podcast Daily

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

This episode covers Hunyuan-Large, a large open-source language model developed by Tencent. The model uses a Mixture of Experts (MoE) architecture, in which a router activates only a subset of specialized sub-models ("experts") for each token: 52 billion of its 389 billion total parameters are active at a time. Hunyuan-Large was pretrained on roughly seven trillion tokens, including a substantial share of synthetic data, and applies several optimizations such as key-value cache compression, an expert routing strategy, and expert-specific learning rate scaling. Evaluations across a wide range of benchmarks show strong results in language understanding and generation, logical reasoning, mathematics, coding, and long-context tasks. The code and checkpoints are publicly available, aiming to accelerate future innovation and applications within the LLM community.
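For listeners unfamiliar with the MoE idea mentioned above, the sketch below shows generic top-k token routing in PyTorch: a small router scores the experts for each token and only the top-scoring expert networks run, so most parameters stay inactive per token. This is a minimal illustration with hypothetical names (TopKMoE, num_experts, k), not Hunyuan-Large's actual design, which the paper builds from shared and specialized experts.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Generic top-k MoE layer: the router keeps the k best experts per
    # token and mixes only those experts' outputs.
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                           # x: (num_tokens, dim)
        scores = self.router(x)                     # (num_tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k best experts per token
        weights = F.softmax(weights, dim=-1)        # normalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e            # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)                             # 16 tokens, 64-dim embeddings
print(TopKMoE(dim=64)(x).shape)                     # torch.Size([16, 64])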

https://arxiv.org/pdf/2411.02265


AI Papers Podcast Daily, by AIPPD