Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Hessian rank bounds the learning coefficient, published by Lucius Bushnaq on August 9, 2024 on LessWrong.
TL;DR: In a neural network with d parameters, the (local) learning coefficient λ can be upper and lower bounded in terms of the rank of the network's Hessian, $d_1$:
$$\frac{d_1}{2} \;\le\; \lambda \;\le\; \frac{d_1}{2} + \frac{d - d_1}{3}.$$
The lower bound is a known result. The upper bound is a claim by me, and this post contains the proof for it.[1] If you find any problems, do point them out.
Introduction
The learning coefficient λ is a measure of loss basin volume and network complexity. You can think of it sort of like an effective parameter count of the model. Simpler models that do less stuff have smaller λ.
Calculating λ for real networks people actually use is a pain. My hope is that these bounds help make estimating it a bit easier.
In a network with d parameters, the learning coefficient is always a number
$$0 \;\le\; \lambda \;\le\; \frac{d}{2}.$$
An existing result in the literature says that if you've calculated the rank of the network's Hessian, $d_1$,[2] you get a tighter lower bound
$$\frac{d_1}{2} \;\le\; \lambda.$$
I claim that we can also get a tighter upper bound
$$\lambda \;\le\; \frac{d_1}{2} + \frac{d - d_1}{3},$$
where $d - d_1$ is the dimension of the Hessian kernel, meaning the number of zero eigenvalues it has.[3]
This is neat because it means we can get some idea of how large λ is just with linear algebra. All we need to know is how many zero eigenvalues the Hessian has.[4] Singular Learning Theory introductions often stress that just looking at the Hessian isn't enough to measure basin volume correctly. But here we see that if you do it right, the Hessian eigenspectrum can give you a pretty good idea of how large λ is. Especially if there aren't that many zero eigenvalues.
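To make that concrete, here's a minimal sketch of how you could turn a computed Hessian into these bounds with plain linear algebra. This is my own illustration rather than anything from the post: the function name, the dense-matrix setup, and the `tol` cutoff for calling an eigenvalue "zero" are all assumptions for the sake of the example, and for a real network you'd presumably estimate the spectrum with an iterative method instead of materialising the full Hessian.

```python
import numpy as np

def learning_coefficient_bounds(hessian: np.ndarray, tol: float = 1e-8):
    """Bound the local learning coefficient via the Hessian rank.

    Returns (d1/2, d1/2 + (d - d1)/3), where d1 is the numerical rank of
    the (symmetric) Hessian, i.e. the number of eigenvalues with
    magnitude above `tol`.
    """
    d = hessian.shape[0]
    eigenvalues = np.linalg.eigvalsh(hessian)    # symmetric eigensolve
    d1 = int(np.sum(np.abs(eigenvalues) > tol))  # numerical rank
    return d1 / 2, d1 / 2 + (d - d1) / 3

# Toy check: K(w) = w1^2 + w2^4 has Hessian diag(2, 0) at the origin,
# so d = 2, d1 = 1, and the bounds are (0.5, 0.8333...).
print(learning_coefficient_bounds(np.diag([2.0, 0.0])))
```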
Intuitively, the lower bound works because a direction in the parameters w that isn't free to vary to second order in the Taylor expansion won't become any more free to vary if you pile on a bunch of higher-order terms. The second-order term strictly dominates the higher-order ones; they can't cancel it out.
Qualitatively speaking, the upper bound works for the same reason. The higher-order terms in the Taylor expansion of the loss can only matter so much. The Hessian is the leading term, so it can impact λ the most, adding $\frac{1}{2}$ to it per Hessian rank. The remaining $O(|w|^3)$ terms can only add up to $\frac{1}{3}$ each for the remaining $d - d_1$ directions.
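A toy example of my own (not from the post) to make this concrete: take the two-parameter loss $K(w_1, w_2) = w_1^2 + w_2^4$ with a smooth, positive prior that factorises over the coordinates. A standard computation (the partition function factorises over the two coordinates, and each factor $\int e^{-n w^{2k}}\varphi\, dw$ scales as $n^{-\frac{1}{2k}}$) gives
$$\lambda = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}.$$
The Hessian at the origin is $\mathrm{diag}(2, 0)$, so $d = 2$ and $d_1 = 1$, and the bounds read $\frac{1}{2} \le \lambda \le \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$: the quadratic direction contributes its full $\frac{1}{2}$, the quartic one contributes $\frac{1}{4} \le \frac{1}{3}$, and the upper bound holds without being tight.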
The proof for the upper bound will just be a small modification of the proof for theorem 7.2 on pages 220 and 221 of Algebraic Geometry and Statistical Learning Theory. Maybe read that first if you want more technical context.
Some words on notation
In the following, I'll mostly stick to the notation and conventions of the book Algebraic Geometry and Statistical Learning Theory. You can read about all the definitions there. I'm too lazy to reproduce them all.
To give some very rough context, K(w) is sort of like the 'loss' at parameter configuration w, φ(w) is our prior over parameters, and Z(n) is the partition function after updating on n data points.[5]
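One more piece of context the proof relies on (paraphrasing the book rather than quoting it, so treat the details as my gloss): when the set of true parameters is non-empty, the partition function decays polynomially, roughly
$$Z(n) \;\asymp\; n^{-\lambda}\, (\log n)^{m-1} \quad \text{as } n \to \infty,$$
where m is the multiplicity. So a lower bound of the form $Z(n) \ge c\, n^{-a}$ with $c > 0$ independent of n forces $\lambda \le a$; that is how the final inequality of the proof below turns into an upper bound on λ.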
Theorem:
Let $W \subseteq \mathbb{R}^d$ be the set of parameters of the model. If there exists an open set $U \subset W$ such that
$$\{w \in U : K(w) = 0,\ \varphi(w) > 0\}$$
is not an empty set, and we define $d_1 = \operatorname{rank}(H)$ as the rank of the Hessian $H$ at a point $w_0 \in U$,
$$H_{i,j} = \left.\frac{\partial^2 K(w)}{\partial w_i\, \partial w_j}\right|_{w = w_0},$$
with $w_i, w_j$ elements of some orthonormal basis $\{w_1, \dots, w_d\}$ of $\mathbb{R}^d$, then
$$\lambda \;\le\; \frac{d_1}{2} + \frac{d - d_1}{3}.$$
Proof:
We can assume $w_0 = 0$ without loss of generality. If $\epsilon_1, \epsilon_2$ are sufficiently small constants,
$$Z(n) = \int \exp(-nK(w))\,\varphi(w)\, dw \;\ge\; \int_{|w^{(1)}| \le \epsilon_1,\, |w^{(2)}| \le \epsilon_2} \exp(-nK(w))\,\varphi(w)\, dw.$$
Here, $w^{(1)} \in W/\ker(H)$ and $w^{(2)} \in \ker(H)$.
If we pick $\{w_1, \dots, w_d\}$ to be the Hessian eigenbasis, then for sufficiently small $|w| > 0$,
$$K(w) = \frac{1}{2}\sum_{i=1}^{d_1} H_{i,i}\, \big(w^{(1)}_i\big)^2 + O(|w|^3)\,.$$
Hence
$$Z(n) \;\ge\; \int_{|w^{(1)}| \le \epsilon_1,\, |w^{(2)}| \le \epsilon_2} \exp\left\{-\frac{n}{2}\sum_{i=1}^{d_1} H_{i,i}\, \big(w^{(1)}_i\big)^2 - n\, O(|w|^3)\right\} \varphi(w)\, dw.$$
Transforming $w'^{(1)} = n^{\frac{1}{2}}\, w^{(1)}$ and $w'^{(2)} = n^{\frac{1}{3}}\, w^{(2)}$, we obtain (for sufficiently large n)
$$Z(n) \;\ge\; n^{-\frac{d_1}{2}}\, n^{-\frac{d - d_1}{3}} \int_{|w'^{(1)}| \le 1,\, |w'^{(2)}| \le 1} \exp\left\{-\frac{1}{2}\sum_{i=1}^{d_1} H_{i,i}\, \big(w'^{(1)}_i\big)^2 + O(|w'|^3)\right\} \varphi\!\left(w'^{(1)} n^{-\frac{1}{2}},\, w'^{(2)} n^{-\frac{1}{3}}\right) dw'^{(1)}\, dw'^{(2)}.$$
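To spell out what this step buys us (my gloss on the standard argument): the prefactor is just the Jacobian of the change of variables, $dw^{(1)} = n^{-\frac{d_1}{2}}\, dw'^{(1)}$ and $dw^{(2)} = n^{-\frac{d - d_1}{3}}\, dw'^{(2)}$, and the cube-root scaling on the kernel directions is exactly what keeps the remainder from growing with n: since $|w| \le n^{-\frac{1}{3}} |w'|$ for $n \ge 1$,
$$n\, O(|w|^3) \;=\; n\, O\!\left(n^{-1} |w'|^3\right) \;=\; O(|w'|^3).$$
Scaling the kernel directions by $n^{-\frac{1}{2}}$ instead would also keep the remainder bounded, but would shrink the prefactor to $n^{-\frac{d}{2}}$ and so only recover the trivial $\lambda \le \frac{d}{2}$ bound, while any scaling slower than $n^{-\frac{1}{3}}$ would let the $n\, O(|w|^3)$ term blow up. $n^{-\frac{1}{3}}$ is the sweet spot.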
Rearranging gives
$$Z(n) \;\ge\; n^{-\left(\frac{d_1}{2} + \frac{d - d_1}{3}\right)} \int_{|w'| \le 1} \exp\left\{-\frac{1}{2}\sum_{i=1}^{d_1} H_{i,i}\, \big(w'^{(1)}_i\big)^2 + O(|w'|^3)\right\} \varphi\!\left(w'^{(1)} n^{-\frac{1}{2}},\, w'^{(2)} n^{-\frac{1}{3}}\right) dw'.$$