https://github.com/triton-lang/triton/pull/7298 

Rewrite the attention kernel to be persistent. This gives better performance at small context lengths. However, fp16 at large context lengths has suffered a bit due to a ptxas instruction scheduling issue in the so...

[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton

GitHub trends delivered to you daily.
This podcast features popular GitHub repositories in an audio format, presented in a radio style.
Stay updated on the latest trending technologies with ease.

This is an unofficial channel, and we are not affiliated with the original media sources.
The content is curated and produced independently by a Japanese software engineer.

Powered by VoiceFeed. https://voicefeed.web.app
