


This post officially announces the sae-vis library, which was designed to create feature dashboards like those from Anthropic's research.
Summary. The library supports two types of visualisation: feature-centric and prompt-centric.
The feature-centric vis is the standard one from Anthropic's post; it looks like the image below. A dropdown in the top left lets you navigate between features.
You can see the interactive version at the GitHub repo, at _feature_vis_demo.html.
The prompt-centric vis is centred on a single user-supplied prompt rather than a single feature. It shows the features that score highest on that prompt according to a variety of metrics. It looks like the image below. A dropdown in the top left lets you switch between metrics and between tokens in your prompt.
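As a rough illustration of what "features which score highest on a prompt" can mean, here is a minimal sketch in plain Python. It does not use the sae-vis API. The function names, the two metric names, and the toy activation values are all hypothetical, and the metrics the library actually offers may differ.

```python
from typing import List, Tuple

# acts[t][f] = activation of feature f on token t of the prompt (toy data).
Acts = List[List[float]]


def score_features(acts: Acts, token_idx: int, metric: str) -> List[float]:
    """Return one score per feature under a (hypothetical) metric."""
    n_features = len(acts[0])
    if metric == "act_at_token":
        # Activation of each feature on the chosen token only.
        return list(acts[token_idx])
    if metric == "max_over_prompt":
        # Largest activation of each feature anywhere in the prompt.
        return [max(row[f] for row in acts) for f in range(n_features)]
    raise ValueError(f"unknown metric: {metric}")


def top_k_features(
    acts: Acts, token_idx: int, metric: str, k: int
) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (feature index, score) pairs."""
    scores = score_features(acts, token_idx, metric)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return [(f, scores[f]) for f in ranked[:k]]


# Two tokens, three features.
toy_acts = [
    [0.0, 2.0, 0.5],
    [1.5, 0.0, 0.1],
]
print(top_k_features(toy_acts, token_idx=0, metric="act_at_token", k=2))
```

Changing the metric or the token index changes the ranking, which is the role the dropdown plays in the prompt-centric view.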
You can see the interactive version at [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
