AI Podcast

By weedge

Latest podcasts about AI technology and papers.

Category: Technology

FAQs about AI Podcast:

How many episodes does AI Podcast have?

The podcast currently has 427 episodes available.

AI Podcast episodes:

January 05, 2025 ConvNeXt: A Modern ConvNet for the 2020s
A podcast discussing the architecture and performance of ConvNeXt, a modern ConvNet model that challenges the dominance of Vision Transformers.
7min
January 05, 2025 AI Vision Podcast: Masked Autoencoders for Scalable Vision Learning
A deep dive into Masked Autoencoders (MAE) and their impact on computer vision, discussing their architecture, training efficiency, and performance on ImageNet and downstream tasks.
6min
January 04, 2025 AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
A podcast discussing the auxiliary-loss-free load balancing strategy for mixture-of-experts models.
6min
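To illustrate the idea behind auxiliary-loss-free load balancing, here is a minimal sketch. It assumes each expert has a bias that is added to its routing score only when choosing the top-k experts, while the gate weights still come from the unbiased scores. After each step, the bias moves by a fixed amount toward balanced load. The function names, the sign-based update rule, and the update rate are hypothetical choices for illustration, not the episode's or any library's exact method.

```python
import random


def route_tokens(affinities, bias, k):
    """Select top-k experts per token by biased score; gate with unbiased scores.

    affinities: per-token lists of non-negative expert affinity scores
                (e.g. sigmoid outputs).
    bias:       per-expert bias that only influences which experts are chosen.
    Returns a list of (chosen_experts, gate_weights) pairs, one per token.
    """
    routes = []
    for scores in affinities:
        # Rank experts by score + bias; the bias steers selection only.
        ranked = sorted(range(len(scores)), key=lambda e: scores[e] + bias[e], reverse=True)
        chosen = ranked[:k]
        # Gate weights use the original scores, normalized over the chosen experts.
        total = sum(scores[e] for e in chosen)
        weights = [scores[e] / total if total > 0 else 1.0 / k for e in chosen]
        routes.append((chosen, weights))
    return routes


def update_bias(bias, routes, update_rate):
    """Nudge each expert's bias toward balanced load (no auxiliary loss term).

    Hypothetical sign-based rule: overloaded experts have their bias lowered,
    underloaded experts have it raised, both by a fixed step.
    """
    num_experts = len(bias)
    load = [0] * num_experts
    for chosen, _ in routes:
        for e in chosen:
            load[e] += 1
    mean_load = sum(load) / num_experts
    new_bias = []
    for e in range(num_experts):
        if load[e] > mean_load:
            new_bias.append(bias[e] - update_rate)
        elif load[e] < mean_load:
            new_bias.append(bias[e] + update_rate)
        else:
            new_bias.append(bias[e])
    return new_bias, load


if __name__ == "__main__":
    rng = random.Random(0)
    # Random affinities in [0, 1], with expert 0 skewed toward high scores.
    affinities = []
    for _ in range(64):
        scores = [rng.random() for _ in range(4)]
        scores[0] = 0.5 + 0.5 * scores[0]
        affinities.append(scores)
    bias = [0.0] * 4
    for step in range(10):
        routes = route_tokens(affinities, bias, k=1)
        bias, load = update_bias(bias, routes, update_rate=0.05)
        print(step, load, [round(b, 2) for b in bias])
```

Because the bias changes only which experts are selected and never the gate weights, balancing pressure does not appear as an extra gradient term in the training loss.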
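January 04, 2025 A Survey of Mixture-of-Experts (MoE) Techniques
This episode explores recent advances in mixture-of-experts (MoE) models, covering algorithm design, system implementation, and practical applications. Starting from background on sparse and dense MoE, we present a new MoE taxonomy and examine the complexities of gating functions, expert networks, training schemes, and system design, giving a comprehensive picture of MoE.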
6min
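January 04, 2025 Zero Bubble Pipeline Parallelism
This episode examines zero-bubble pipeline parallelism, an innovative approach to making large-scale distributed training more efficient. We analyze the bubble problem in traditional pipeline-parallel methods and explain how fine-grained scheduling and a technique for bypassing optimizer synchronization eliminate the bubbles. We also discuss the automatic scheduling algorithm, memory optimization strategies, and experimental results, aiming to give listeners a thorough technical analysis.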
7min
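To put numbers on the "bubble" problem: in a synchronous schedule such as GPipe or 1F1B with p stages and m micro-batches, and assuming every stage takes the same time per micro-batch, each device is busy for m time slots out of m + p − 1 and idle for the remaining p − 1. This is the standard textbook estimate, not the zero-bubble method itself. The sketch below computes that idle fraction.

```python
from fractions import Fraction


def bubble_fraction(num_stages: int, num_microbatches: int) -> Fraction:
    """Idle share of each device in a synchronous pipeline (GPipe/1F1B style).

    Assumes equal per-stage compute time: each device works m slots out of
    m + p - 1 and idles for p - 1 of them.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("need at least one stage and one micro-batch")
    return Fraction(num_stages - 1, num_microbatches + num_stages - 1)


if __name__ == "__main__":
    for m in (4, 8, 32):
        f = bubble_fraction(4, m)
        print(f"p=4, m={m}: bubble = {f} ({float(f):.1%})")
```

With 4 stages, the idle share falls from 3/7 at 4 micro-batches to 3/35 at 32. Adding more micro-batches shrinks the bubble but never removes it. That limitation is what zero-bubble scheduling is meant to address.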
January 04, 2025 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
A podcast discussion about GShard, a module for scaling neural networks using conditional computation and automatic sharding, focusing on its application to multilingual machine translation.
7min
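GShard's conditional computation routes each token to a small number of experts, and each expert accepts only a bounded number of tokens. The simplified sketch below shows top-2 gating with a per-expert capacity. It assumes a capacity of roughly 2N/E tokens for N tokens and E experts (scaled by a hypothetical `capacity_factor`) and serves all first choices before any second choice. It leaves out GShard's random dispatch of the second expert and its auxiliary balancing loss, so treat it as an illustration rather than the actual algorithm.

```python
import math


def top2_dispatch(gate_probs, capacity_factor=1.0):
    """Assign tokens to their top-2 experts, dropping assignments past capacity.

    gate_probs: per-token lists of softmax gate probabilities over E >= 2 experts.
    Returns (assignments, capacity), where assignments[t] lists the
    (expert, weight) pairs that fit within capacity for token t.
    """
    num_tokens = len(gate_probs)
    num_experts = len(gate_probs[0])
    # Each expert accepts at most about 2N/E tokens, scaled by capacity_factor.
    capacity = math.ceil(capacity_factor * 2 * num_tokens / num_experts)
    counts = [0] * num_experts
    tops = [sorted(range(num_experts), key=lambda e: p[e], reverse=True)[:2] for p in gate_probs]
    assignments = [[] for _ in gate_probs]
    for rank in (0, 1):  # fill every token's first choice before any second choice
        for t, (probs, top) in enumerate(zip(gate_probs, tops)):
            e = top[rank]
            if counts[e] < capacity:
                counts[e] += 1
                norm = probs[top[0]] + probs[top[1]]
                assignments[t].append((e, probs[e] / norm))
            # Otherwise this slot is dropped; the token still passes through
            # the layer's residual connection.
    return assignments, capacity


if __name__ == "__main__":
    probs = [[0.5, 0.3, 0.1, 0.1]] * 4  # every token prefers experts 0 and 1
    assignments, cap = top2_dispatch(probs)
    print("capacity:", cap)
    for t, a in enumerate(assignments):
        print(t, [(e, round(w, 3)) for e, w in a])
```

In this example, only the first two tokens fit on experts 0 and 1, and the other two are dropped. This overflow is what load balancing in such systems tries to keep rare.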
January 04, 2025 AI Radio FM - Technology Channel: GShard and Giant Models
A deep dive into GShard, a module for scaling giant neural networks, focusing on its application to multilingual machine translation and its impact on training efficiency and model quality.
9min
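January 04, 2025 Optimizing Mixture-of-Experts Training with Hybrid Tensor-Expert-Data Parallelism
An in-depth look at DeepSpeed-TED, a novel three-dimensional hybrid parallelism framework for training mixture-of-experts models built on large base models. We discuss its memory optimizations, its communication optimizations, and how its performance compares with existing approaches.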
6min
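January 04, 2025 Unified Sequence Parallelism: Empowering Long-Context Generative AI
This episode examines Unified Sequence Parallelism (USP), an advanced technique for training generative AI models with extremely long contexts. We analyze existing sequence-parallel methods such as DeepSpeed-Ulysses and Ring-Attention, and present a unified framework that combines the strengths of both while overcoming their limitations. We then look at how USP combines with existing parallelism techniques, including data parallelism, tensor parallelism, ZeRO, and pipeline parallelism, to establish best practices for 4D hybrid parallel systems. We also share experimental results that highlight USP's performance across a range of hardware configurations and show its potential for extending model context length and improving training efficiency.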
8min
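One way to picture how USP combines DeepSpeed-Ulysses and Ring-Attention is as a 2D grid of sequence-parallel ranks. Ulysses groups exchange data with all-to-all, and ring groups pass key/value blocks point-to-point. The hypothetical helper below builds such a grid. It assumes that Ulysses groups are blocks of contiguous ranks (for example, within a node) and that ring groups stride across those blocks. It also encodes the constraint that Ulysses scatters attention heads, so the head count must be divisible by the Ulysses degree. This is not USP's actual API.

```python
def usp_process_groups(world_size, ulysses_degree, num_heads):
    """Split ranks into a 2D grid of Ulysses (all-to-all) and Ring (P2P) groups.

    Hypothetical layout: Ulysses groups are contiguous rank blocks; ring
    groups take one rank from each block.
    """
    if world_size % ulysses_degree != 0:
        raise ValueError("world_size must be divisible by ulysses_degree")
    if num_heads % ulysses_degree != 0:
        # Ulysses scatters attention heads across its group, so they must divide evenly.
        raise ValueError("num_heads must be divisible by ulysses_degree")
    ring_degree = world_size // ulysses_degree
    ulysses_groups = [
        list(range(r * ulysses_degree, (r + 1) * ulysses_degree))
        for r in range(ring_degree)
    ]
    ring_groups = [
        list(range(u, world_size, ulysses_degree)) for u in range(ulysses_degree)
    ]
    return ulysses_groups, ring_groups


if __name__ == "__main__":
    uly, ring = usp_process_groups(world_size=8, ulysses_degree=4, num_heads=32)
    print("Ulysses groups:", uly)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print("Ring groups:", ring)    # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Setting `ulysses_degree` to `world_size` gives pure Ulysses, and setting it to 1 gives pure Ring-Attention. The grid lets the two be mixed when the head count would otherwise cap the Ulysses degree.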
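January 04, 2025 LoongTrain: Efficient Training of Long-Sequence Large Language Models
This episode explores LoongTrain, an efficient training framework designed for long-sequence large language models. We discuss its core 2D-Attention mechanism and how it combines head parallelism and context parallelism to overcome scalability limits while maintaining efficiency. We also analyze the Double-Ring-Attention mechanism and how device placement strategies affect training speed.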
8min
