May 10, 2026

“Neural Networks learn Bloom Filters” by Alex Gibson

20 minutes

Overview:

We train a tiny ReLU network to output sparse top- distributions over a vocabulary much larger than its residual dimension. The trained network seems to converge to a mechanism closely resembling a Bloom filter: tokens are assigned sparse binary hashes, the hidden layer computes an approximate union indicator, and the output logits are linearly read from this union.
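As a rough illustration of the mechanism described above, here is a minimal NumPy sketch of a Bloom-filter-style readout. It is not the author's construction or the trained network. The sizes (vocab_size, dim, bits, k), the random hashing scheme and the helper names are all assumptions chosen for illustration. Each token gets a sparse binary hash, the hidden layer clips the summed hashes to a 0/1 union indicator using two ReLUs, and each token's score is a linear function of that union.

```python
import numpy as np

def make_hashes(vocab_size, dim, bits_per_token, rng):
    """Assign each token a sparse binary hash with exactly bits_per_token ones."""
    hashes = np.zeros((vocab_size, dim))
    for t in range(vocab_size):
        idx = rng.choice(dim, size=bits_per_token, replace=False)
        hashes[t, idx] = 1.0
    return hashes

def union_indicator(hashes, token_set):
    """Hidden layer: 1 where any token in the set has a hash bit, else 0."""
    counts = hashes[token_set].sum(axis=0)  # number of set members hitting each bit
    relu = lambda z: np.maximum(z, 0.0)
    # min(c, 1) for c >= 0, written with ReLUs only
    return relu(counts) - relu(counts - 1.0)

def membership_scores(hashes, union, bits_per_token):
    """Linear readout: 0 if all of a token's hash bits are in the union, negative otherwise."""
    return hashes @ union - bits_per_token

# Hypothetical sizes, chosen only for this sketch.
rng = np.random.default_rng(0)
vocab_size, dim, bits, k = 1000, 64, 4, 5

hashes = make_hashes(vocab_size, dim, bits, rng)
token_set = rng.choice(vocab_size, size=k, replace=False)

union = union_indicator(hashes, token_set)
scores = membership_scores(hashes, union, bits)
predicted = np.flatnonzero(scores == 0)

print("true set:     ", sorted(token_set.tolist()))
print("predicted set:", predicted.tolist())
```

As with a classical Bloom filter, every token in the true set is always recovered. A token outside the set appears in the prediction only if all of its hash bits happen to be covered by the union, which is the false-positive case.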

Here's what a small network trained on a toy version of the sparse top- distribution task learns to use:

Weight matrix of a 1-layer ReLU network trained via gradient descent on the toy -sparse distribution task below, for , , . Truncated at first tokens for visualisation purposes.

Plot of the range of values of , which forms a bimodal distribution.

That's the input weight matrix of the trained network. Every entry is either or . The network has effectively encoded a binary hash for each token - and as we'll show, this seems to enable the network to approximately simulate a Bloom filter, and so output the correct set of top- tokens with high probability.
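One way to check a claim like this on a weight matrix is to threshold the entries into two clusters and measure how far each entry sits from its cluster's mean. The sketch below assumes a synthetic stand-in matrix, not the trained network's weights. The two levels (0 and 1), the noise scale, the matrix shape and the helper name binarise are all made up for illustration.

```python
import numpy as np

def binarise(weights):
    """Split entries into two clusters at the midpoint of the value range.

    Returns the 0/1 cluster labels, the mean of each cluster, and the largest
    distance of any entry from its own cluster mean.
    """
    threshold = (weights.min() + weights.max()) / 2.0
    bits = (weights > threshold).astype(int)
    low_level = weights[bits == 0].mean()
    high_level = weights[bits == 1].mean()
    snapped = np.where(bits == 1, high_level, low_level)
    max_dev = np.abs(weights - snapped).max()
    return bits, (low_level, high_level), max_dev

# Synthetic stand-in for a trained input weight matrix (assumption, not real data):
# sparse binary entries plus a little Gaussian noise.
rng = np.random.default_rng(0)
true_bits = rng.random((64, 200)) < 0.1
weights = np.where(true_bits, 1.0, 0.0) + rng.normal(0.0, 0.02, size=true_bits.shape)

bits, levels, max_dev = binarise(weights)
print("cluster means:", levels)
print("largest deviation from a cluster mean:", max_dev)
print("fraction of entries in the high cluster:", bits.mean())
```

A small maximum deviation relative to the gap between the two cluster means suggests the matrix is well described by a binary hash per token. A large one suggests the bimodal picture is only approximate.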

We provide a theoretical construction showing how to set the weights to exactly implement a Bloom filter. The real network [...]

---

Outline:

(00:10) Overview:

(02:02) The Task:

(03:27) Construction:

(04:17) Formal construction:

(04:47) Analysis of a single forward pass:

(06:13) Training:

(07:04) Behavioural analysis of the trained network:

(10:14) Mechanistic analysis of the trained network:

(16:21) Conclusion / Reflections:

(18:24) Related work:

(19:25) Further work:

---

First published:

May 9th, 2026

Source:

https://www.lesswrong.com/posts/buxBdp8NtHGgBwabv/neural-networks-learn-bloom-filters

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.


By LessWrong

