March 25, 2026 · FlashAttention-3: Fast & Accurate Attention with Asynchrony & Low-Precision · 17 minutes · By Shaoqing Tan

Major efficiency leap for Transformer attention mechanisms, enabling faster training/inference on long sequences with low-precision compute.