Embodied AI 101

One Learning Rate Doesn't Fit All: Layerwise Spectral Scheduling for Transformers



Shows that modern transformers are highly heterogeneous across layers and proposes layerwise learning rates based on weight spectrum shape, yielding up to 1.5× training speedup on LLaMA/GPT-style models.

Embodied AI 101 by Shaoqing Tan