Misreading Chat

#131: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness


Morita went down swinging against a PyTorch kernel written in CUDA. Please send comments and feedback via Reddit or the listener mailbox. Reviews and star ratings on iTunes are appreciated, too.

  • [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  • GitHub – Dao-AILab/flash-attention: Fast and memory-efficient exact attention
  • GitHub – NVIDIA/apex: A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
  • [2307.08691] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  • [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory
  • GitHub – tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only)
  • ...more
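For listeners who want a feel for the algorithm before wrestling with the CUDA itself, here is a rough NumPy sketch of the tiled, online-softmax forward pass that FlashAttention builds on. It is illustrative only: the function name, tile size, and single-head shapes are simplifications made up for this sketch, not the flash-attention library's API.

    import numpy as np

    def flash_attention_forward(Q, K, V, tile=64):
        """Tiled attention forward pass with an online softmax (illustrative sketch)."""
        N, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        O = np.zeros_like(Q)               # running, not-yet-normalized output
        m = np.full(N, -np.inf)            # running row-wise max of the scores
        l = np.zeros(N)                    # running softmax denominator

        for j in range(0, N, tile):
            Kj, Vj = K[j:j + tile], V[j:j + tile]      # one K/V tile
            S = (Q @ Kj.T) * scale                     # scores for this tile only
            m_new = np.maximum(m, S.max(axis=1))       # updated row max
            alpha = np.exp(m - m_new)                  # rescale factor for the old state
            P = np.exp(S - m_new[:, None])             # tile softmax numerator
            l = alpha * l + P.sum(axis=1)
            O = alpha[:, None] * O + P @ Vj
            m = m_new

        return O / l[:, None]                          # exact softmax(Q K^T / sqrt(d)) V

    # Sanity check against the naive implementation that materializes the full score matrix.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
    S = (Q @ K.T) / np.sqrt(32)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    reference = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(flash_attention_forward(Q, K, V), reference, atol=1e-6)

The point of the tiling is that each K/V tile can stay in fast on-chip memory while the running max m and normalizer l keep the softmax exact, so the full N x N score matrix never has to be written out to HBM.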

Misreading Chat, by Hajime Morrita and Jun Mukai
