Tech Leader Pro

There is no Moore's Law for AI (TLP 2024w35)



I feel like we're in the vertical scaling phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.

Notes:

  • In my online conversations on X, I see a lot of technologists assuming that large language models like GPT will grow in size and power indefinitely.
  • It often reminds me of Moore's law, named after the late Intel co-founder Gordon Moore.
  • In relation to CPU release cycles, "Moore's law is the observation that the number of transistors in an integrated circuit (IC) doubles about every two years". Source: https://en.m.wikipedia.org/wiki/Moore%27s_law
  • In basic terms, that means a consumer could expect the CPUs in their devices to roughly double in power about every two years. This held true for a long time, until CPU manufacturers started to hit the limits of physics in terms of how small transistors can be manufactured (the doubling arithmetic is shown in the first sketch after these notes).
  • Now, in my opinion, we cannot expect the same doubling of power in large language models like GPT from OpenAI, because they too will eventually hit physical limits on:
  • The amount of quality training data available; I have described this as being like "the new oil" in a previous episode. Ref: https://techleader.pro/a/643-Tech-Leader-Pro-podcast-2024-week-18,-The-new-knowledge-acquisition-bottleneck
  • The amount of computing resources available to host these models: processing, memory, storage, networking, etc. are all finite, especially in a vertical hosting model.
  • Let's look at the previous major GPT releases, and see if my theory holds true:
    • GPT-1 was released in June 2018. It had 12 decoder layers, and 117 million parameters.
    • GPT-2 was released in February 2019 (an 8 month gap). It had 48 decoder layers (a 4x increase), and 1.5 billion parameters (a ~13x increase). So far, so good!
    • GPT-3 was released in May 2020 (a 1 year 3 month gap). It had 96 decoder layers (a 2x increase), and 175 billion parameters (a ~117x increase). Here the timeline and the growth in decoder layers still fall within a hypothetical Moore's law criterion, while the parameters increased dramatically. Great!
    • GPT-4 was released March 2023 (a 2 year and 10 month gap). It is believed to have 120 layers (a 1.25x increase), and 1.8 trillion parameters (a ~10x increase), but OpenAI have kept such details hidden and these figures are based on a leak. Reference: https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
    • GPT-5 has yet to be released, with no official date confirmed as of recording this.
    • Source for GPT-1 to GPT-3 data: https://360digitmg.com/blog/types-of-gpt-in-artificial-intelligence
    • So what are we to conclude from this? Well, clearly by GPT-4 the release timeline and the increase in layers violate a hypothetical Moore's law for AI language models, but the parameter increases remain impressive (the release gaps and growth multiples are recomputed in the second sketch after these notes).
    • It is because of that parameter growth, however, that I often claim they are "brute forcing" AI, and that it's not sustainable long-term: eventually they will start to hit practical limits, and you can already see the parameter growth slowing down between GPT-3 and GPT-4.
    • It will be very interesting to see what happens with GPT-5, and if we ever see GPT-6.
    • Frankly speaking, I have doubts about GPT-6 ever appearing without a major architectural change of direction.
    • Just like when the CPU industry pivoted from vertical scaling (clock speed measured in hertz) to horizontal scaling (the number of CPU cores for parallel processing), I fully expect the same pivot to happen with large language models in the years ahead.
    • In fact I believe that pivot is close.
    • All physical processes have upper limits. Eventually all data will be consumed, memory limits will be hit in the cloud, processing limits will be reached...
    • Vertical scaling has limits, while horizontal scaling distributes those limits across multiple environments, which helps to prolong growth (the third sketch after these notes shows the arithmetic of sharding a model across commodity nodes).
    • I feel like we're in the "vertical scaling" phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.
    • In addition, solutions should scale down as well as up.
    • We are in the mainframe phase of AI development.
    • I eagerly await small language models that can run on low-powered edge devices in the wild.
    • I want my talking door like in Ubik by PKD! If you get that reference, I salute you!
  • What I am working on this week:
    • Greppr is now at 6.3 million documents indexed.
  • Media I am enjoying this week:
    • Maelstrom by Peter Watts, which is part 2 of his Rifter series.
    • Back playing Escape from Tarkov with the release of patch 0.15.
  • Notes and subscription links are here: https://techleader.pro/a/658-There-is-no-Moore's-Law-for-AI-(TLP-2024w35)
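
Sketch 1: the Moore's law doubling arithmetic referenced above. This is a minimal illustration only; the starting transistor count and time horizon are arbitrary hypothetical values, not figures for any real CPU line.

```python
# Minimal sketch of Moore's-law-style doubling: a quantity doubling
# roughly every two years. Starting value and horizon are arbitrary.

DOUBLING_PERIOD_YEARS = 2.0

def doubling_projection(initial: float, years: float) -> float:
    """Projected value after `years`, doubling every DOUBLING_PERIOD_YEARS."""
    return initial * 2 ** (years / DOUBLING_PERIOD_YEARS)

start = 1_000_000  # hypothetical starting transistor count
for years in (2, 4, 6, 8, 10):
    print(f"after {years:>2} years: {doubling_projection(start, years):,.0f}")
# Doubles each step: 2,000,000; 4,000,000; 8,000,000; 16,000,000; 32,000,000
```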
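Sketch 2: recomputing the release gaps and growth multiples from the figures cited in the notes above. Release days are approximated to the first of each month, and the GPT-4 numbers come from the unverified leak referenced earlier, so treat the output as rough.

```python
# Recompute the release gaps and growth multiples quoted in the notes.
# GPT-4 figures are from an unverified leak; dates approximated to month start.
from datetime import date

releases = [
    # (model, release date, decoder layers, parameters)
    ("GPT-1", date(2018, 6, 1),  12, 117e6),
    ("GPT-2", date(2019, 2, 1),  48, 1.5e9),
    ("GPT-3", date(2020, 5, 1),  96, 175e9),
    ("GPT-4", date(2023, 3, 1), 120, 1.8e12),  # leaked, unconfirmed
]

for prev, curr in zip(releases, releases[1:]):
    gap_months = (curr[1].year - prev[1].year) * 12 + (curr[1].month - prev[1].month)
    print(f"{prev[0]} -> {curr[0]}: {gap_months:>2} months, "
          f"layers x{curr[2] / prev[2]:.2f}, parameters x{curr[3] / prev[3]:.1f}")

# GPT-1 -> GPT-2:  8 months, layers x4.00, parameters x12.8
# GPT-2 -> GPT-3: 15 months, layers x2.00, parameters x116.7
# GPT-3 -> GPT-4: 34 months, layers x1.25, parameters x10.3
```

The gaps stretch out while the layer and parameter multiples shrink, which is the slowdown described in the notes.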
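Sketch 3: the vertical-versus-horizontal scaling arithmetic. All figures here are hypothetical assumptions for illustration (fp16 weights, 128 GB of usable memory per commodity node, and the leaked 1.8 trillion parameter count), and it ignores activations, caches, redundancy and networking, so it is a back-of-the-envelope sketch rather than a deployment plan.

```python
# Back-of-the-envelope: why a very large model cannot fit on one commodity
# host (the vertical limit) but can be sharded across many (horizontal scaling).
# All figures are hypothetical assumptions for illustration only.
import math

PARAMS = 1.8e12        # leaked, unverified GPT-4-scale parameter count
BYTES_PER_PARAM = 2    # assume fp16 weights
NODE_MEMORY_GB = 128   # assumed usable memory per commodity node

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: ~{weights_gb:,.0f} GB")                  # ~3,600 GB
print("fits on one node:", weights_gb <= NODE_MEMORY_GB)        # False
print("nodes needed:", math.ceil(weights_gb / NODE_MEMORY_GB))  # 29
```

The single-host ceiling is hit almost immediately, while adding more cheap nodes keeps raising the combined ceiling, which is the sense in which horizontal scaling distributes the limits.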


Tech Leader Pro, by John Collins