Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo, published by Zvi on September 21, 2023 on LessWrong.
We are about to see what looks like a substantial leap in image models. OpenAI will be integrating Dalle-3 into ChatGPT. The pictures we've seen look gorgeous and richly detailed, and the model can generate pictures to much more complex specifications than existing image models. Before, the rule of thumb was that you could get one of each magisterium, but good luck getting two things you want from a given magisterium. Now, perhaps, you can, if you are willing to give up on adult content and images of public figures, since OpenAI is (quite understandably) no fun.
We will find out in a few weeks, as it rolls out to ChatGPT+ users.
As usual, a bunch of other stuff also happened, including a model danger classification system from Anthropic, OpenAI announcing an outside red teaming squad, a study of AI's impact on consultant job performance, some incremental upgrades to Bard including an extension for Gmail, new abilities to diagnose medical conditions, and some rhetorical innovations.
Also, don't look now, but GPT-3.5-Turbo-Instruct plays chess at 1800 Elo, and thanks to its relative lack of destructive RLHF it offers relatively strong performance at very low cost and very high speed, although for most purposes its final output quality is still substantially behind GPT-4.
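For the curious, here is a minimal sketch of how that chess play is typically elicited: prompt the raw completions endpoint with a partial PGN transcript and read off the next move. This assumes the pre-1.0 openai Python SDK and an API key in the environment; the helper name and exact PGN framing are illustrative, not the precise setup of the cited experiment.

```python
# Minimal sketch: gpt-3.5-turbo-instruct is a completions model, so you hand
# it a partial PGN transcript and let it complete the next move.
# Assumes the pre-1.0 openai Python SDK and OPENAI_API_KEY in the environment.
import openai

def next_move(pgn_moves: str) -> str:
    """Given a PGN move list like '1. e4 e5 2. Nf3', return the model's next move."""
    # PGN-style headers cue the model that this is a game transcript.
    prompt = (
        '[Event "Casual Game"]\n'
        '[Result "*"]\n\n'
        + pgn_moves.strip()
    )
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=6,     # a single SAN move is only a few tokens
        temperature=0,    # play deterministically
        stop=["\n"],
    )
    # Take the first whitespace-delimited token of the completion as the move.
    return response.choices[0].text.strip().split()[0]

print(next_move("1. e4 e5 2. Nf3"))  # e.g. 'Nc6'
```

The notable part is that this works at all: no chat wrapper, no RLHF persona, just next-token prediction over game transcripts.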
Table of Contents
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. GPT-4 boosts consultant productivity.
Language Models Don't Offer Mundane Utility. Do we want to boost that?
Level Two Bard. Some improvements, I suppose. Still needs a lot of work.
Wouldn't You Prefer a Good Game of Chess? An LLM at 1800 Elo. World model.
GPT-4 Real This Time. GPT-3.5-Instruct-Turbo proves its practical use, perhaps.
Fun With Image Generation. Introducing Dalle-3.
Deepfaketown and Botpocalypse Soon. Amazon limits self-publishing to 3 a day.
Get Involved. OpenAI hiring for mundane safety, beware the double-edged sword.
Introducing. OpenAI red team network, Anthropic responsible scaling policy.
In Other AI News. UK government and AI CEO both change their minds.
Technical Details. One grok for grammar, another for understanding.
Quiet Speculations. Michael Nielsen offers extended thoughts on extinction risk.
The Quest for Sane Regulation. Everyone is joining the debate, it seems.
The Week in Audio. A lecture about copyright law.
Rhetorical Innovation. We keep trying.
No One Would Be So Stupid As To. Are we asking you to stop?
Aligning a Smarter Than Human Intelligence is Difficult. Asimov's laws? No.
I Didn't Do It, No One Saw Me Do It, You Can't Prove Anything. Can you?
People Are Worried About AI Killing Everyone. Yet another round of exactly how.
Other People Are Not As Worried About AI Killing Everyone. Tony Blair.
The Lighter Side. Jesus flip the tables.
Language Models Offer Mundane Utility
Diagnose eye diseases. This seems like a very safe application even with false positives, since humans can verify anything the AI finds.
Diagnose fetal growth restrictions early.
Use the 'reading mode' in Android or Chrome, built (in theory and technically) using graph neural networks, to strip the words out of a webpage and present them in an actually readable size and font, much more accurately than older attempts. It seems you have to turn it on under chrome://flags.
GPT-4 showing some solid theory of mind in a relatively easy situation. Always notice whether you are finding out it can do X consistently, can do X typically, or can do X once with bespoke prompting.
The same goes for failure to do X. What does it mean that a model would ever say ~X, versus that it says ~X much of the time, versus that it says ~X every time? Each is different.
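A minimal sketch of how one might operationalize that distinction, again assuming the pre-1.0 openai Python SDK; the grading function `passes` is a hypothetical stand-in for whatever check fits the task:

```python
# Sketch: distinguish "can do X once" from "does X typically" from "does X
# every time" by sampling the same prompt k times and grading each answer.
# `passes` is a hypothetical grader supplied by the evaluator.
import openai

def success_rate(prompt: str, passes, k: int = 20) -> float:
    """Fraction of k sampled completions that pass the grading check."""
    hits = 0
    for _ in range(k):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # sample, rather than taking the single argmax answer
        )
        if passes(response.choices[0].message["content"]):
            hits += 1
    return hits / k

# Roughly: 1.0 means "every time", 0.5 means "typically",
# and anything above 0 means "at least once".
```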
How to convince people who are unimpressed by code writing that LLMs are not simply parrots? Eliezer asked on Twitter, and said ...