GitHub Daily Trend

[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton



https://github.com/triton-lang/triton/pull/7298
Rewrites the attention kernel to be persistent. This gives better performance at low context lengths. However, fp16 performance at large context lengths has regressed slightly due to a ptxas instruction scheduling issue in the so...

GitHub Daily Trend, by VoiceFeed