Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How does GPT-3 spend its 175B parameters?, published by Robert AIZI on January 13, 2023 on LessWrong.
[Target audience: Me from a week ago, and people who have some understanding of ML but want to understand transformers better on a technical level.]
Free advice for people learning new skills: ask yourself random questions. In answering them, you’ll strengthen your understanding and find out what you really understand and what’s actually useful. And some day, if you ask yourself a question that no one has asked before, that’s a publication waiting to happen!
So as I was reading up on transformers, I got fixated on this question: where are the 175 billion parameters in the architecture? Not in the literal sense (the parameters are in the computer), but how are they “spent” between various parts of the architecture - the attention heads vs feed-forward networks, for instance. And how can one calculate the number of parameters from the architecture’s “size hyperparameters” like dimensionality and number of layers?
The goal of this post is to answer those questions, and make sense of the nice table of model sizes from the GPT-3 paper, deriving its $n_{params}$ column from the other columns.
Primary Sources
Lots of resources about transformers conjure information from thin air, and I want to avoid that, so I’m showing all my work here. The relevant parts of the sources we'll draw from are: the equations for the overall GPT architecture (Exhibit A), the basic transformer block (Exhibit B), the definition of multi-head attention (Exhibit C), and the definition of the feed-forward network (Exhibit D).
Three more details we’ll use, all from Section 2.1 of the GPT-3 paper:
The vocabulary size is $n_{vocab} = 50257$ tokens (via a reference to Section 2.3 of the GPT-2 paper)
The feed-forward networks are all a single layer which is “four times the size of the bottleneck layer”, so $d_{ff} = 4 d_{model}$
“All models use a context window of $n_{ctx} = 2048$ tokens.”
Variable abbreviations
I’ll use shorthand for the model size variables to increase legibility:
$n_{layers} = x$
$d_{model} = y$
$n_{heads} = z$
$d_{head} = w$
$n_{vocab} = v$
$n_{ctx} = u$
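For concreteness in the worked examples below, here are the size values for the largest model, GPT-3 175B, from the paper's table, together with the two facts quoted above; the derivation itself stays symbolic, and these specific values are just my plug-ins:

```python
# Size hyperparameters for GPT-3 175B, using the shorthand letters above.
x = n_layers = 96      # number of transformer blocks
y = d_model = 12288    # dimensionality of the residual stream
z = n_heads = 96       # attention heads per layer
w = d_head = 128       # dimensionality of each attention head
v = n_vocab = 50257    # vocabulary size
u = n_ctx = 2048       # context window length
```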
Where are the Parameters?
From Exhibit A, we can see that the original 1-hot encoding of tokens $U$ is first converted to the initial “residual stream” $h_0$, then passed through transformer blocks (shown in Exhibits B-D), with $n_{layers}$ blocks total. We'll break down parameter usage by stage.
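First, as orientation, here is a minimal Python sketch of that data flow, with toy sizes so it actually runs and the transformer block left as a stub, since its parameters are exactly what we count below:

```python
import numpy as np

# Toy sizes so this runs instantly; GPT-3 175B uses
# n_ctx=2048, n_vocab=50257, d_model=12288, n_layers=96.
n_ctx, n_vocab, d_model, n_layers = 8, 100, 16, 2

W_e = np.random.randn(n_vocab, d_model)  # word embedding matrix (learned)
W_p = np.random.randn(n_ctx, d_model)    # position embedding matrix (learned)

def transformer_block(h):
    # Stub: attention sublayer + feed-forward sublayer; parameters counted below.
    return h

# One-hot token encoding U, shape (n_ctx, n_vocab)
U = np.eye(n_vocab)[np.random.randint(n_vocab, size=n_ctx)]

h = U @ W_e + W_p                  # initial residual stream h_0, shape (n_ctx, d_model)
for _ in range(n_layers):
    h = transformer_block(h)       # n_layers blocks in total
```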
Word Embedding Parameters
$W_e$ is the word embedding matrix.
It converts the $(n_{ctx}, n_{vocab})$ matrix $U$ into an $(n_{ctx}, d_{model})$ matrix, so $W_e$ has size $(n_{vocab}, d_{model})$, resulting in $vy = n_{vocab} d_{model}$ parameters.
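Plugging in the GPT-3 175B values from above as a quick sanity check:

```python
n_vocab, d_model = 50257, 12288          # v and y for GPT-3 175B
word_embedding_params = n_vocab * d_model
print(f"{word_embedding_params:,}")      # 617,558,016 -- about 0.6B of the 175B
```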
Position Embedding Parameters
$W_p$ is the position embedding matrix. Unlike the original transformer paper, which used fixed sinusoidal position encodings, GPT learns its position embeddings.
$W_p$ is the same size as the residual stream, $(n_{ctx}, d_{model})$, resulting in $uy = n_{ctx} d_{model}$ parameters.
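The corresponding count for GPT-3 175B turns out to be tiny:

```python
n_ctx, d_model = 2048, 12288                 # u and y for GPT-3 175B
position_embedding_params = n_ctx * d_model
print(f"{position_embedding_params:,}")      # 25,165,824 -- about 0.025B, essentially a rounding error
```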
Transformer Parameters - Attention
The attention sublayer of the transformer is one half of the basic transformer block (Exhibit B). As shown in Exhibit C, each attention head in each layer is parameterized by 3 matrices, $W^Q_i$, $W^K_i$, $W^V_i$, with one additional matrix $W^O$ per layer which combines the attention heads.
What Exhibit C calls $d_k$ and $d_v$ are both what GPT calls $d_{head}$, so $W^Q_i$, $W^K_i$, and $W^V_i$ are all size $(d_{model}, d_{head})$. Thus each attention head contributes $3 d_{model} d_{head}$ parameters.
What Exhibit C calls $h$ is what GPT calls $n_{heads}$, so $W^O$ is size $(n_{heads} \cdot d_{head}, d_{model})$ and therefore contributes $n_{heads} d_{head} d_{model}$ parameters.
Total parameters per layer: For a single layer, there are $n_{heads}$ attention heads, so the $W^Q_i$, $W^K_i$, and $W^V_i$ matrices contribute $3 d_{model} d_{head} n_{heads}$ parameters, plus an additional $n_{heads} d_{head} d_{model}$ parameters from $W^O$, for a total of $4 d_{model} d_{head} n_{heads}$ parameters per layer.
Total parameters: $4xyzw = 4 d_{model} d_{head} n_{heads} n_{layers}$
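Evaluating this for GPT-3 175B, following the count above (which ignores any bias terms on the projections):

```python
n_layers, d_model, n_heads, d_head = 96, 12288, 96, 128      # x, y, z, w for GPT-3 175B

per_head = 3 * d_model * d_head                              # W^Q_i, W^K_i, W^V_i for one head
per_layer = per_head * n_heads + n_heads * d_head * d_model  # all heads plus W^O
attention_params = per_layer * n_layers

print(f"{per_layer:,}")          # 603,979,776 parameters per layer
print(f"{attention_params:,}")   # 57,982,058,496 -- roughly 58B of the 175B
```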
Transformer Parameters - FFN
The “feed-forward network” (FFN) is the other half of the basic transformer block (Exhibit B). Exhibit D shows that it consists of a linear transform parameterized by $W_1$ and $b_1$, an activation function, and then another linear transform parameterized by $W_2$ and $b_2$, as one m...