Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Call for Research Assistants in Developmental Interpretability, published by Jesse Hoogland on August 30, 2023 on LessWrong.
We are excited to announce multiple positions for Research Assistants to join our six-month research project assessing the viability of Developmental Interpretability (DevInterp).
This is a chance to gain expertise in interpretability, develop your skills as a researcher, build out a network of collaborators and mentors, publish in major conferences, and open a path towards future opportunities, including potential permanent roles, recommendations, and successive collaborations.
Background
Developmental interpretability is a research agenda aiming to build tools for detecting, locating, and understanding phase transitions in learning dynamics of neural networks. It draws on techniques from singular learning theory, mechanistic interpretability, statistical physics, and developmental biology.
Position Details
General info:
Title: Research Assistant / Research Engineer.
Location: Remote, with hubs in Melbourne and London.
Duration: Until March 2024 (at minimum).
Compensation: USD $35k per year base salary, paid as an independent contractor at an hourly rate.
Timeline:
Application Deadline: September 15th, 2023
Ideal Start Date: October 2023
How to Apply: Complete the application form by the deadline. Further information on the application process will be provided in the form.
Who We Are
The developmental interpretability research team consists of experts across a number of areas of mathematics, physics, statistics, and AI safety. The principal researchers are:
Daniel Murfet, mathematician and SLT expert, University of Melbourne.
Susan Wei, statistician and SLT expert, University of Melbourne.
Jesse Hoogland, MSc. Physics, SERI MATS scholar, RA in the Krueger lab.
We have a range of projects currently underway, led by one of these principal researchers and involving a number of other PhD and MSc students from the University of Melbourne and collaborators from around the world. In an organizational capacity you would also interact with Alexander Oldenziel and Stan van Wingerden.
You can find us and the broader DevInterp research community on our Discord. Beyond the Developmental Interpretability research agenda, you can read our first preprint on scalable SLT invariants and check out the lectures from the SLT & Alignment summit.
Overview of Projects
Here's a selection of the projects underway, some of which you would be expected to contribute to. These tend to be on the more experimental side:
Developing scalable estimates for SLT invariants: Invariants like the (local) learning coefficient and (local) singular fluctuation can signal the presence of "hidden" phase transitions. Improving these techniques can help us better identify these transitions.
DevInterp of vision models: To what extent do the kinds of circuits studied in the original circuits thread emerge through phase transitions?
DevInterp of program synthesis: In examples where we know there is rich compositional structure, can we see it in the singularities? Practically, this means studying settings like modular arithmetic (grokking), multitask sparse parity, and more complex variants.
DevInterp of in-context learning & induction heads: Is the development of induction heads a proper phase transition in the language of SLT? More ambitiously, can we apply singular learning theory to study in-context learning and make sense of "in-context phase transitions"?
DevInterp of language models: Can we detect phase transitions in simple language models (like TinyStories)? Can we, from these transitions, discover circuit structure? Can we extend these techniques to larger models (e.g., in the Pythia suite)?
DevInterp of reinforcement learning models: To what extent are phase transitions inv...
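To give a flavor of the first project above: the (local) learning coefficient can be estimated with a WBIC-style formula, lambda_hat = n * beta * (E_beta[L(w)] - L(w*)), where the expectation is taken over an SGLD chain localized near the trained parameter w*. The sketch below is purely illustrative (not the team's actual implementation): it uses a toy quadratic loss, and every function name and hyperparameter is an assumption chosen for the example.

```python
import numpy as np

# Illustrative sketch: estimate the local learning coefficient (LLC) of a
# toy quadratic loss L(w) = 0.5 * ||w||^2 via SGLD, using the estimator
#   lambda_hat = n * beta * (E_beta[L(w)] - L(w*)),  beta = 1 / log n.
# Names and hyperparameters here are illustrative, not from the post.

rng = np.random.default_rng(0)

def loss(w):
    return 0.5 * float(w @ w)  # minimized at w* = 0, with L(w*) = 0

def grad_loss(w):
    return w

def estimate_llc(w_star, n=1000, steps=5000, eps=1e-3, gamma=1.0):
    """Run an SGLD chain localized around w_star; return the LLC estimate."""
    beta = 1.0 / np.log(n)  # inverse temperature beta* = 1 / log n
    w = w_star.copy()
    samples = []
    for _ in range(steps):
        # Drift = gradient of n*beta*L plus a localizing pull back to w*
        drift = n * beta * grad_loss(w) + gamma * (w - w_star)
        noise = rng.normal(size=w.shape) * np.sqrt(eps)
        w = w - 0.5 * eps * drift + noise
        samples.append(loss(w))
    expected_loss = np.mean(samples[steps // 2:])  # discard burn-in
    return n * beta * (expected_loss - loss(w_star))

w_star = np.zeros(4)
lam = estimate_llc(w_star)
# For a regular (non-singular) quadratic model in d dimensions, the true
# learning coefficient is d/2; here d = 4, so lam should be near 2.
```

In singular models the estimate falls below d/2, which is what makes the invariant informative: drops in the local learning coefficient can signal the "hidden" phase transitions the project aims to detect.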