In this episode:
• A New ERNIE on the Block: Linda introduces Baidu's ERNIE 4.5 technical report, setting the stage for a discussion of the new family of large-scale foundation models, headlined by a 424-billion-parameter Mixture-of-Experts model that activates 47 billion parameters per token.
• Not Your Average MoE: The hosts dig into the core of ERNIE 4.5: its Mixture-of-Experts (MoE) architecture. Linda explains the novel 'heterogeneous' structure with modality-specific experts for text and vision (a minimal code sketch follows this chapter list), and Professor Norris comments on the implications for training stability.
• Building a Multimodal Beast: A deep dive into the architectural components behind ERNIE 4.5's multimodality: the adaptive-resolution vision encoder, timestamp rendering for video frames, and unified 3D positional embeddings that let the model handle text, images, and video seamlessly.
• Training at Scale, Efficiently: Professor Norris and Linda unpack the engineering behind training ERNIE 4.5. They cover the multi-stage training recipe, auxiliary losses such as the router orthogonalization loss, and the 47% Model FLOPs Utilization (MFU) reported for pre-training the largest model.
• From Lab to Production: The discussion shifts to practical applications and deployment. The hosts talk about the aggressive W4A8 and 2-bit quantization schemes, impressive inference speeds, and the open-sourcing of models and toolkits like ERNIEKit and FastDeploy.
• Final Thoughts and Takeaways: Professor Norris and Linda share their takeaways from the ERNIE 4.5 report, highlighting its contributions to efficient multimodal training and the significance of the open-source release for the research community.
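
For listeners who want to see the heterogeneous-MoE idea from the second chapter in code, here is a minimal, illustrative PyTorch sketch. It only shows the routing concept the hosts describe: text tokens and vision tokens are each routed within their own pool of experts by a modality-specific router. All class names, layer sizes, and the top-k gating details are assumptions for illustration, not the actual ERNIE 4.5 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of a "heterogeneous" MoE layer: separate expert pools
# and routers for text and vision tokens. Not the ERNIE 4.5 implementation.

class Expert(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.ff(x)

class HeterogeneousMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256,
                 n_text_experts=4, n_vision_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.text_experts = nn.ModuleList(
            [Expert(d_model, d_ff) for _ in range(n_text_experts)])
        self.vision_experts = nn.ModuleList(
            [Expert(d_model, d_ff) for _ in range(n_vision_experts)])
        self.text_router = nn.Linear(d_model, n_text_experts)
        self.vision_router = nn.Linear(d_model, n_vision_experts)

    def _route(self, x, router, experts):
        # Top-k gating restricted to one modality's expert pool.
        gates = F.softmax(router(x), dim=-1)           # [n_tokens, n_experts]
        weights, idx = gates.topk(self.top_k, dim=-1)  # [n_tokens, top_k]
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

    def forward(self, tokens, is_vision):
        # tokens: [n_tokens, d_model]; is_vision: bool mask for vision tokens.
        out = torch.empty_like(tokens)
        out[~is_vision] = self._route(tokens[~is_vision],
                                      self.text_router, self.text_experts)
        out[is_vision] = self._route(tokens[is_vision],
                                     self.vision_router, self.vision_experts)
        return out

# Usage: a mixed sequence of 10 text tokens followed by 6 vision tokens.
tokens = torch.randn(16, 64)
is_vision = torch.arange(16) >= 10
moe = HeterogeneousMoE()
print(moe(tokens, is_vision).shape)  # torch.Size([16, 64])
```

The point of the sketch is the separation the hosts emphasize: each modality gets its own router and expert pool, so vision tokens never compete with text tokens for the same experts, which is one way to reason about the training-stability benefits discussed in the episode.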