Misreading Chat

#131: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness


Morita went down swinging against a PyTorch kernel written in CUDA. Please send comments and feedback via Reddit or the listener mailbox. Reviews and star ratings on iTunes are appreciated, too.

  • [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  • GitHub – Dao-AILab/flash-attention: Fast and memory-efficient exact attention
  • GitHub – NVIDIA/apex: A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
  • [2307.08691] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  • [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory
  • GitHub – tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only)
  • ...more
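For listeners who want a feel for the algorithm before wrestling with the CUDA itself, here is a rough NumPy sketch of the tiled, online-softmax forward pass that FlashAttention builds on. It is illustrative only: the function name, tile size, and single-head shapes are simplifications made up for this sketch, not the flash-attention library's API.

    import numpy as np

    def flash_attention_forward(Q, K, V, tile=64):
        """Tiled attention forward pass with an online softmax (illustrative sketch)."""
        N, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        O = np.zeros_like(Q)               # running, not-yet-normalized output
        m = np.full(N, -np.inf)            # running row-wise max of the scores
        l = np.zeros(N)                    # running softmax denominator

        for j in range(0, N, tile):
            Kj, Vj = K[j:j + tile], V[j:j + tile]      # one K/V tile
            S = (Q @ Kj.T) * scale                     # scores for this tile only
            m_new = np.maximum(m, S.max(axis=1))       # updated row max
            alpha = np.exp(m - m_new)                  # rescale factor for the old state
            P = np.exp(S - m_new[:, None])             # tile softmax numerator
            l = alpha * l + P.sum(axis=1)
            O = alpha[:, None] * O + P @ Vj
            m = m_new

        return O / l[:, None]                          # exact softmax(Q K^T / sqrt(d)) V

    # Sanity check against the naive implementation that materializes the full score matrix.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
    S = (Q @ K.T) / np.sqrt(32)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    reference = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(flash_attention_forward(Q, K, V), reference, atol=1e-6)

The point of the tiling is that each K/V tile can stay in fast on-chip memory while the running max m and normalizer l keep the softmax exact, so the full N x N score matrix never has to be written out to HBM.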

Misreading Chat, by Hajime Morrita and Jun Mukai
