Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #27: Portents of Gemini, published by Zvi on August 31, 2023 on LessWrong.
By all reports, and as one would expect, Google's Gemini looks to be substantially superior to GPT-4. We now have more details on that, along with word that Google plans to deploy it in December; Manifold gives it 82% to happen this year, and a similar probability of being superior to GPT-4 on release.
I indeed expect this to happen on both counts. December is not too long from now, but this is also AI #27 and Bard still sucks; Google has been taking its sweet time getting its act together. So we now have both the UK Summit and Gemini coming up within a few months, as well as a major acceleration of chip shipments. If you are preparing to try to impact how things go, now might be a good time to get ready and keep your powder dry. If you are looking to build cool new AI tech and capture mundane utility, be prepared on that front as well.
Table of Contents
Introduction.
Table of Contents. Bold sections seem most relatively important this week.
Language Models Offer Mundane Utility. Summarize, take a class, add it all up.
Language Models Don't Offer Mundane Utility. Not reliably or robustly, anyway.
GPT-4 Real This Time. History will never forget the name, Enterprise.
Fun With Image Generation. Watermarks and a faster SDXL.
Deepfaketown and Botpocalypse Soon. Wherever would we make deepfakes?
They Took Our Jobs. Hey, those jobs are only for our domestic robots.
Get Involved. Peter Wildeford is hiring. Send in your opportunities, folks!
Introducing. Sure, Graph of Thoughts, why not?
In Other AI News. AI gives paralyzed woman her voice back, Nvidia invests.
China. New blog about AI safety in China, which is perhaps a thing you say?
The Best Defense. How exactly would we defend against bad AI with good AI?
Portents of Gemini. It is coming in December. It is coming in December.
Quiet Speculations. A few other odds and ends.
The Quest for Sane Regulation. CEOs to meet with Schumer, EU's AI Act.
The Week in Audio. Christiano and Leahy give talks, Rohit makes his case.
Rhetorical Innovation. Some relatively promising attempts.
Llama No One Stopping This. Meta to open source all Llamas no matter what.
No One Would Be So Stupid As To. Bingo, sir.
Aligning a Smarter Than Human Intelligence is Difficult. Davidad has a plan.
People Are Worried About AI Killing Everyone. Roon, the better critic we need.
Other People Are Not As Worried About AI Killing Everyone. Consciousness?
The Wit and Wisdom of Sam Altman. Do you feel lucky? Well, do ya?
The Lighter Side. The big time.
Language Models Offer Mundane Utility
A class on the economics of ChatGPT, complete with podcast recording. More like this, please, whatever my quibbles. I especially don't think survey courses, in economics or elsewhere, are the way to go. Focus on what matters and do something meaningful rather than trying to maximize gesturing. If you let me teach students from other majors one economics class, teaching them the basics of micro and then using that to explore what matters sounds like a great plan. So does getting students good at using LLMs.
Use algorithmic instructions to let LLMs accurately do tasks like 19-digit addition.
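The linked approach works by spelling out the addition algorithm step by step in the prompt, so the model executes it rather than guessing the sum. As a sketch, here is the underlying digit-by-digit algorithm itself (in Python, not the actual prompt wording, which the original work specifies):

```python
def add_with_carry(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal strings,
    digit by digit from the right, carrying explicitly -- the same
    procedure the algorithmic instructions walk the model through."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad to equal length
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # write the ones digit
        carry = total // 10             # carry the tens digit
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))
```

Because each step only requires single-digit arithmetic plus a carry, a model following the instructions faithfully can handle sums (such as 19-digit ones) far beyond what it gets right when asked for the answer in one shot.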
Summarize writing. It seems GPT-4 summaries are potentially more accurate than human ones.
We encountered two practical problems:
Not following instructions. Bigger models were better at following instructions. We had to use another LLM to parse the outputs of the smaller LLMs and work out whether it said A or B was the answer.
Ordering bias. Given A and B, are you more likely to suggest A simply because it is first? One way to test this is to swap the ordering and see how many times you say A both times or B both times.
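The swap test described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `judge` stands in for an LLM call that returns which position ("A" or "B") it prefers, and we count how often the model sticks with the same position after the options are swapped.

```python
def position_bias_rate(judge, pairs):
    """judge(first, second) -> "A" or "B" (the position it picks).
    Returns the fraction of pairs where the model picks the same
    position both before and after swapping -- i.e. it is tracking
    order, not content."""
    biased = 0
    for left, right in pairs:
        original = judge(left, right)   # "A" means it chose `left`
        swapped = judge(right, left)    # same options, order reversed
        # A content-based preference flips letter when the order flips;
        # the same letter both times indicates ordering bias.
        if original == swapped:
            biased += 1
    return biased / len(pairs)
```

A judge with no ordering bias scores 0.0 here; one that always picks whichever option comes first scores 1.0.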
Once we dealt with these problems, we saw:
Human: 84% (from past research)
gpt-3.5-turbo: 67.0% correct (seemed to h...