March 24, 2025

Transformers Without Normalization: Dynamic Tanh Achieves Strong Performance

11 minutes

This podcast episode examines the paper "Transformers without Normalization," which introduces Dynamic Tanh (DyT) as a potential replacement for normalization layers in Transformers. DyT is a simple element-wise operation, tanh(αx), where α is a learnable scalar. It aims to replicate the effect of Layer Norm without computing activation statistics such as the mean and variance. Could DyT match or exceed the performance of normalized models while improving efficiency, and so challenge the assumption that normalization is necessary in modern neural networks?
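The operation described above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration, not the paper's code. It assumes the full form γ·tanh(αx) + β, where γ and β are per-channel parameters playing the role of Layer Norm's weight and bias, and it assumes α starts at 0.5. The class name `DyT` and the argument `alpha_init` are illustrative choices. Training, meaning the gradient updates to α, γ and β, is omitted.

```python
import math
from typing import List, Sequence


class DyT:
    """Sketch of Dynamic Tanh: y = gamma * tanh(alpha * x) + beta.

    Assumptions (labelled, not taken from the episode text):
      * alpha is one learnable scalar shared across channels, init 0.5;
      * gamma/beta are per-channel affine parameters, like LayerNorm's.
    Unlike LayerNorm, no mean or variance of the input is computed.
    """

    def __init__(self, num_channels: int, alpha_init: float = 0.5) -> None:
        self.alpha = alpha_init             # learnable scalar (assumed init)
        self.gamma = [1.0] * num_channels   # per-channel scale
        self.beta = [0.0] * num_channels    # per-channel shift

    def __call__(self, x: Sequence[float]) -> List[float]:
        if len(x) != len(self.gamma):
            raise ValueError("input width must match num_channels")
        # Purely element-wise: each value is squashed independently.
        return [
            g * math.tanh(self.alpha * xi) + b
            for xi, g, b in zip(x, self.gamma, self.beta)
        ]


dyt = DyT(num_channels=4)
print(dyt([0.0, 1.0, -1.0, 100.0]))
```

In this example, small inputs pass through almost linearly (tanh(αx) ≈ αx near zero), while the extreme value 100.0 is squashed to about 1.0. That squashing is the outlier-limiting behavior DyT is meant to borrow from Layer Norm, obtained here without a reduction over the channel dimension.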


By Build Wiz AI
