Episode 12.21: Interpretability, sparseness, and vector manipulation. Mono- and polysemanticity.
More ‘caveat emptor’ than usual in this speculative episode, which is based on three very important papers from Anthropic on interpretability, i.e. making sense of neural nets.