The Nonlinear Library

LW - AI #23: Fundamental Problems with RLHF by Zvi



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #23: Fundamental Problems with RLHF, published by Zvi on August 3, 2023 on LessWrong.
After several jam-packed weeks, things slowed down to allow everyone to focus on the potential room-temperature superconductor. Check Polymarket to see how likely it is that we are so back and bet real money, or Manifold for chats, better graphs, and easier but much smaller trading.
The main things I would highlight this week are an excellent paper laying out many of the fundamental difficulties with RLHF, and a systematic new exploit of current LLMs that seems to reliably defeat RLHF.
I'd also note that GPT-4 fine-tuning is confirmed to be coming. That should be fun.
Table of Contents
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Here's what you're going to do.
Language Models Don't Offer Mundane Utility. Universal attacks on LLMs.
Fun With Image Generation. Videos might be a while.
Deepfaketown and Botpocalypse Soon. An example of doing it right.
They Took Our Jobs. What, me worry?
Get Involved. If you share more opportunities in the comments, I'll include them next week.
Introducing. A bill. Also an AI medical generalist.
In Other AI News. Fine-tuning is coming to GPT-4. Teach LLMs arithmetic.
Quiet Speculations. Various degrees of skepticism.
China. Do not get overexcited.
The Quest for Sane Regulation. Liability and other proposed interventions.
The Week in Audio. I go back to The Cognitive Revolution.
Rhetorical Innovation. Bill Burr is concerned and might go off on a rant.
No One Would Be So Stupid As To. Robotics and AI souls.
Aligning a Smarter Than Human Intelligence is Difficult. RLHF deep dive.
Other People Are Not As Worried About AI Killing Everyone. It'll be fine.
The Wit and Wisdom of Sam Altman. Don't sleep on this.
The Lighter Side. Pivot!
Language Models Offer Mundane Utility
Make a ransom call, no jailbreak needed. Follows the traditional phone-calls-you-make-are-your-problem-sir legal principle. This has now been (at least narrowly, for this particular application) fixed or broken, depending on your perspective.
See the FAQ:
Back in AI #3 we were first introduced to keeper.ai, the site that claims to use AI to hook you up with a perfect match that meets all criteria for both parties, so you can get married and start a family. If you sign up for the Legacy plan, they only get paid when you tie the knot. They claim 1 in 3 dates from Keeper lead to a long-term relationship. Aella has now signed up, so we will get to see it put to the test.
Word on Twitter is that the default cost for the Keeper service is $50k. If it actually works, that is a bargain. If it doesn't, it depends on whether you have to deposit in advance: most such startups fail, and that is a lot to put into escrow without full confidence you'll get it back.
I continue to think that this service is a great idea if you can get critical mass and make reasonable decisions, while also not seeing it as all that AI. From what I can tell, the AI is used to identify potential matches for humans to look at, but it is not clear to me (A) how any match can ever truly be 100% (I have never seen one, and not everything is a must-have), or (B) how useful an AI is here when you need full reliability. You still need the humans to examine everything, and I'd still instinctively want to mostly abstract things into databases? Some negative selection should be useful in saving time, but that seems like about it?
An early experiment using LLMs as part of a recommendation algorithm. LLMs seem useful for generating more informative and accurate descriptor labels for potential content, and could potentially help give recommendations that match arbitrary user criteria. One could also potentially use this to infer user criteria, including from text-based feedback. I am confident LLMs are the key to the future of recommendation engines, b...

The Nonlinear Library, by The Nonlinear Fund

Rated 4.6 (8 ratings)