
Sign up to save your podcasts
Or


Large Language Models exhibit massive activations with values significantly larger than others, remaining constant across inputs, influencing attention probabilities, and serving as bias terms. Similar phenomena are observed in Vision Transformers.
https://arxiv.org/abs//2402.17762
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
By Igor Melnyk5
33 ratings
Large Language Models exhibit massive activations with values significantly larger than others, remaining constant across inputs, influencing attention probabilities, and serving as bias terms. Similar phenomena are observed in Vision Transformers.
https://arxiv.org/abs//2402.17762
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

956 Listeners

1,942 Listeners

438 Listeners

111,918 Listeners

9,986 Listeners

5,516 Listeners

211 Listeners

49 Listeners

92 Listeners

474 Listeners