Robots Talking

AI vs. The Arcade: How Human Games Are Redefining General Intelligence



Have you ever wondered why we, as humans, are so obsessed with games? From the strategic depth of Chess to the frantic tapping of Flappy Bird, we spend countless hours in digital and physical playgrounds. According to recent research, this isn't just about killing time—it’s actually a cornerstone of our General Intelligence.

Games are "structured microcosms" of the real world. When we play, we are actually practicing skills like resource management, social deduction, and physical navigation in a safe, fun environment. Now, researchers from institutions like MIT and Harvard are using this "Multiverse of Human Games" to see if artificial intelligence can finally keep up with us.

The AI GAMESTORE: A Never-Ending Test

Evaluating how "smart" an AI really is has become a massive challenge. Traditional tests often focus on narrow tasks like solving a specific math problem or writing code. But being good at one thing doesn't mean a machine has the versatility of a human adult.

To bridge this gap, researchers built the AI GAMESTORE. This platform uses LLMs (Large Language Models) to automatically source and adapt popular games from the Apple App Store and Steam into standardized tests for machines. By having artificial intelligence play 100 different games—ranging from Angry Birds clones to complex puzzles—the researchers could measure its ability to learn and adapt just like a human would.

The Scoreboard: Humans vs. Machines

The researchers pitted seven of the world's most advanced LLMs (including frontier models like GPT-5.2 and Gemini 2.5 Pro) against 106 human players. The goal was simple: play the first two minutes of a new game and see who scores higher.
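Comparing models to people this way amounts to normalizing each model's game score against the average human score on the same game. A minimal sketch of that idea, assuming hypothetical scores and function names (not the benchmark's actual API):

```python
# Hypothetical sketch of human-normalized scoring: a model's score on
# each game is divided by the average human score for that game, then
# averaged across games. All names and numbers here are illustrative.

def human_normalized(model_score: float, human_avg: float) -> float:
    """Model score as a fraction of the average human score
    (0.0 = no progress, 1.0 = human parity)."""
    if human_avg == 0:
        return 0.0
    return model_score / human_avg

# Toy results for three imaginary games.
games = [
    {"model": 12.0, "human": 150.0},  # model well below humans
    {"model": 0.0,  "human": 90.0},   # no meaningful progress
    {"model": 40.0, "human": 80.0},   # a rare strong showing
]

ratios = [human_normalized(g["model"], g["human"]) for g in games]
avg_ratio = sum(ratios) / len(ratios)
print(f"average human-normalized score: {avg_ratio:.2f}")
```

Under this kind of scheme, "less than 10% of the average human score" corresponds to an average ratio below 0.10.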

The results were a wake-up call for the tech world:

  • The Massive Gap: Even the best AI models achieved less than 10% of the average human score on the majority of the games.
  • Thinking Time: While humans reacted in real-time, the machines took 15 to 20 times longer to "think" about their next move.
  • Total Failure: In about 30-40% of the games, the models couldn't make any meaningful progress at all, scoring near zero.
Why Is the AI Struggling?

You might think a supercomputer could easily beat a human at a "casual" mobile game, but the AI GAMESTORE revealed three major "cognitive bottlenecks" where machines fail:

1. Memory: AI often "forgets" what happened just a few frames ago, making it hard to navigate maps or track changing goals.
2. Planning: Humans are great at thinking several steps ahead (e.g., "If I pour this liquid here, I can move that block later"). Current models struggle with this multi-step logic.
3. World-Model Learning: When you start a new game, you quickly "get" the rules: gravity makes things fall, and touching a spike is bad. AI still struggles to infer these hidden rules through active play.

What's Next for General Intelligence?

This research shows that while artificial intelligence is getting better at talking and coding, it still lacks the "cognitive versatility" of a typical human. The "Multiverse of Human Games" provides a way to track this progress through a "living" benchmark that can't be easily cheated or memorized.

The ultimate goal isn't just to build a better gamer. It's to develop AI that can interact with the world as flexibly, safely, and intuitively as we do. Until then, it looks like your high score on the App Store is safe!

Robots Talking, by mstraton8112