Tech Leader Pro

There is no Moore's Law for AI (TLP 2024w35)



I feel like we're in the vertical scaling phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.

Notes:

  • In my online conversations on X, I see a lot of technologists assuming that large language models like GPT will grow in size and power indefinitely.
  • It often reminds me of Moore's law, named after the late Intel co-founder Gordon Moore.
  • In relation to CPU release cycles, "Moore's law is the observation that the number of transistors in an integrated circuit (IC) doubles about every two years". Source: https://en.m.wikipedia.org/wiki/Moore%27s_law
  • In basic terms, that means a consumer could expect the CPUs in their devices to roughly double in power about every two years. This held true for a long time, until CPU manufacturers started to hit the limits of physics in terms of how small transistors can be manufactured (the doubling arithmetic is shown in the first sketch after these notes).
  • Now, in my opinion, we cannot expect the same doubling of power in large language models like GPT from OpenAI, because they too will eventually hit physical limits on:
  • The amount of quality training data available; I have described this as being like "the new oil" in a previous episode. Ref: https://techleader.pro/a/643-Tech-Leader-Pro-podcast-2024-week-18,-The-new-knowledge-acquisition-bottleneck
  • The amount of computing resources available to host these models: processing, memory, storage, networking, etc. are all finite, especially in a vertical hosting model.
  • Let's look at the previous major GPT releases, and see if my theory holds true:
    • GPT-1 was released in June 2018. It had 12 decoder layers, and 117 million parameters.
    • GPT-2 was released in February 2019 (an 8 month gap). It had 48 decoder layers (a 4x increase), and 1.5 billion parameters (a ~13x increase). So far, so good!
    • GPT-3 was released in May 2020 (a 1 year 3 month gap). It had 96 decoder layers (a 2x increase), and 175 billion parameters (a ~117x increase). Here the timeline and the growth in decoder layers still fall within a hypothetical Moore's law criterion, while the parameters increased dramatically. Great!
    • GPT-4 was released March 2023 (a 2 year and 10 month gap). It is believed to have 120 layers (a 1.25x increase), and 1.8 trillion parameters (a ~10x increase), but OpenAI have kept such details hidden and these figures are based on a leak. Reference: https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
    • GPT-5 has yet to be released, with no official date confirmed as of recording this.
    • Source for GPT-1 to GPT-3 data: https://360digitmg.com/blog/types-of-gpt-in-artificial-intelligence
    • So what are we to conclude from this? Well, clearly by GPT-4 the release timeline and the increase in layers violate a hypothetical Moore's law for AI language models, but the parameter increases remain impressive (the release gaps and growth multiples are recomputed in the second sketch after these notes).
    • It is because of that parameter growth, however, that I often claim they are "brute forcing" AI, and that it's not sustainable long-term: eventually they will start to hit practical limits, and you can already see the parameter growth slowing down between GPT-3 and GPT-4.
    • It will be very interesting to see what happens with GPT-5, and if we ever see GPT-6.
    • Frankly speaking, I have doubts about GPT-6 ever appearing without a major architectural change of direction.
    • Just like when the CPU industry pivoted from vertical scaling (clock speed measured in hertz) to horizontal scaling (the number of CPU cores for parallel processing), I fully expect the same pivot to happen with large language models in the years ahead.
    • In fact I believe that pivot is close.
    • All physical processes have upper limits. Eventually all data will be consumed, memory limits will be hit in the cloud, processing limits will be reached...
    • Vertical scaling has limits, while horizontal scaling distributes those limits across multiple environments, which helps to prolong growth (the third sketch after these notes shows the arithmetic of sharding a model across commodity nodes).
    • I feel like we're in the "vertical scaling" phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.
    • In addition, solutions should scale down as well as up.
    • We are in the mainframe phase of AI development.
    • I eagerly await small language models that can run on low-powered edge devices in the wild.
    • I want my talking door like in Ubik by PKD! If you get that reference, I salute you!
  • What I am working on this week:
    • Greppr is now at 6.3 million documents indexed.
  • Media I am enjoying this week:
    • Maelstrom by Peter Watts, which is part 2 of his Rifter series.
    • Back playing Escape from Tarkov with the release of patch 0.15.
  • Notes and subscription links are here: https://techleader.pro/a/658-There-is-no-Moore's-Law-for-AI-(TLP-2024w35)
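
Sketch 1: the Moore's law doubling arithmetic referenced above. This is a minimal illustration only; the starting transistor count and time horizon are arbitrary hypothetical values, not figures for any real CPU line.

```python
# Minimal sketch of Moore's-law-style doubling: a quantity doubling
# roughly every two years. Starting value and horizon are arbitrary.

DOUBLING_PERIOD_YEARS = 2.0

def doubling_projection(initial: float, years: float) -> float:
    """Projected value after `years`, doubling every DOUBLING_PERIOD_YEARS."""
    return initial * 2 ** (years / DOUBLING_PERIOD_YEARS)

start = 1_000_000  # hypothetical starting transistor count
for years in (2, 4, 6, 8, 10):
    print(f"after {years:>2} years: {doubling_projection(start, years):,.0f}")
# Doubles each step: 2,000,000; 4,000,000; 8,000,000; 16,000,000; 32,000,000
```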
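Sketch 2: recomputing the release gaps and growth multiples from the figures cited in the notes above. Release days are approximated to the first of each month, and the GPT-4 numbers come from the unverified leak referenced earlier, so treat the output as rough.

```python
# Recompute the release gaps and growth multiples quoted in the notes.
# GPT-4 figures are from an unverified leak; dates approximated to month start.
from datetime import date

releases = [
    # (model, release date, decoder layers, parameters)
    ("GPT-1", date(2018, 6, 1),  12, 117e6),
    ("GPT-2", date(2019, 2, 1),  48, 1.5e9),
    ("GPT-3", date(2020, 5, 1),  96, 175e9),
    ("GPT-4", date(2023, 3, 1), 120, 1.8e12),  # leaked, unconfirmed
]

for prev, curr in zip(releases, releases[1:]):
    gap_months = (curr[1].year - prev[1].year) * 12 + (curr[1].month - prev[1].month)
    print(f"{prev[0]} -> {curr[0]}: {gap_months:>2} months, "
          f"layers x{curr[2] / prev[2]:.2f}, parameters x{curr[3] / prev[3]:.1f}")

# GPT-1 -> GPT-2:  8 months, layers x4.00, parameters x12.8
# GPT-2 -> GPT-3: 15 months, layers x2.00, parameters x116.7
# GPT-3 -> GPT-4: 34 months, layers x1.25, parameters x10.3
```

The gaps stretch out while the layer and parameter multiples shrink, which is the slowdown described in the notes.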
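Sketch 3: the vertical-versus-horizontal scaling arithmetic. All figures here are hypothetical assumptions for illustration (fp16 weights, 128 GB of usable memory per commodity node, and the leaked 1.8 trillion parameter count), and it ignores activations, caches, redundancy and networking, so it is a back-of-the-envelope sketch rather than a deployment plan.

```python
# Back-of-the-envelope: why a very large model cannot fit on one commodity
# host (the vertical limit) but can be sharded across many (horizontal scaling).
# All figures are hypothetical assumptions for illustration only.
import math

PARAMS = 1.8e12        # leaked, unverified GPT-4-scale parameter count
BYTES_PER_PARAM = 2    # assume fp16 weights
NODE_MEMORY_GB = 128   # assumed usable memory per commodity node

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: ~{weights_gb:,.0f} GB")                  # ~3,600 GB
print("fits on one node:", weights_gb <= NODE_MEMORY_GB)        # False
print("nodes needed:", math.ceil(weights_gb / NODE_MEMORY_GB))  # 29
```

The single-host ceiling is hit almost immediately, while adding more cheap nodes keeps raising the combined ceiling, which is the sense in which horizontal scaling distributes the limits.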


Tech Leader Pro, by John Collins