
The Hidden Trap of GPU-Based AI: The Von Neumann Bottleneck

At the risk of sounding provocative, I will begin by stating for the record that GPU-based AI systems are doomed to failure.


I’ve already explained why this is true in “AI Boom to Doom: The Dark Side of Data Center Proliferation.”  It boils down to the finite amounts of power and water available, simple as that.  Data centers are notoriously expensive in terms of power and water consumption, and GPU-based AI systems consume 5-10 times more for a given footprint.  Worse, the power densities that GPU-based AI requires (the amount of power consumed in a single computer rack) jump from a traditional maximum of 20 kilowatts (kW) per rack up to 100 kW per rack.


Put in perspective, the average American household draws about 1.2 kW.  A computer rack has roughly the same footprint as a double-door refrigerator that goes all the way up to the ceiling, and at 20 kW it consumes the power of roughly 17 homes.  The waste byproduct of all that computing power is heat.  From experience I can tell you that when a computer rack is loaded with 20 kW of IT equipment, you’re at the ragged edge of being able to adequately cool that equipment so it won’t overheat and die.


GPU-based AI systems are at least five times worse than typical IT equipment, due to how they’re made and how they operate.  In this case, a single computer rack consumes more power than 80 American homes, and (again) the waste byproduct of that power usage is heat.  FAR more heat than air cooling systems can remove, necessitating new cooling strategies built around direct liquid cooling, or DLC.  These systems have a variety of problems of their own, and as this brilliant analysis explains, their effectiveness in real-world applications is far from certain.
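As a quick back-of-the-envelope check on those household comparisons, here is the arithmetic spelled out (using the ~1.2 kW average household figure above):

```python
# Back-of-the-envelope check on the household comparisons above,
# using the ~1.2 kW average American household figure.

avg_household_kw = 1.2

for rack_kw in (20, 100):
    homes = rack_kw / avg_household_kw
    print(f"A {rack_kw} kW rack draws as much power as ~{homes:.0f} average homes")

# A 20 kW rack draws as much power as ~17 average homes
# A 100 kW rack draws as much power as ~83 average homes
```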


If you recognize the absurdity of this situation, at a time when grid reliability is no longer a certainty, then let me say “congratulations”: common sense is still alive and well.


The Von Neumann Architecture


Now that we know that IT assets (and AI assets in particular) are energy hogs, we need to ask: WHY?


The glib answer would be “physics,” but there’s far more to it than that.  WHY is physics the limiting factor?  The answer to that question goes all the way back to 1945, when John Von Neumann laid out the protocols for how computer processors would handle data.


Forgive me for copying and pasting from the Wikipedia description, but it explains the situation overall:



The von Neumann architecture—also known as the von Neumann model or Princeton architecture—is a computer architecture based on a 1945 description by John von Neumann, and by others, in the First Draft of a Report on the EDVAC.[1] The document describes a design architecture for an electronic digital computer with these components:

  • A processing unit with both an arithmetic logic unit and processor registers

  • A control unit that includes an instruction register and a program counter

  • Memory that stores data and instructions

  • External mass storage

  • Input and output mechanisms[1][2]


The term "von Neumann architecture" has evolved to refer to any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time (since they share a common bus). This is referred to as the von Neumann bottleneck, which often limits the performance of the corresponding system.[3]


That sounds all well and good, but what does it really mean?  The Von Neumann architecture was designed to make it easier for programmers to write code, at the expense of efficiency in the processor.  At that time, a processor was predicted to be able to perform up to 20,000 operations per second, a phenomenal speed for the era!  So there was no worry about the efficiency of the processor; it was so fast it wouldn’t matter, especially for the limited amount of data being fed to it.  But now…?  Now the sheer volume of data is choking the whole system.


To use an analogy, imagine you’re at an intersection in a small town, with minimal traffic controlled by a police officer holding a stop sign, signaling drivers to stop or go as circumstances require.  It’s not ideal by any means, but it’s doable.


Now apply that method in a large city, say Chicago, for example.  The intersection now has multiple lanes going in each direction, there are LOTS of cars, and the traffic is controlled by the same police officer with a stop sign.  And he’s not terribly bright, so he applies the same rules he used in the small town, allowing only one car to proceed through the intersection at a time.  What would be the result?  A massive traffic jam.


Similarly, every cycle of the processor is that policeman letting through a single piece of information: fetch an instruction, then pull data from the memory unit, then send a single result to the output device (such as your monitor), then fetch another instruction, pull more data, send another result, and so on, one item at a time over the same shared bus.
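To make the policeman analogy concrete, here is a deliberately simplified toy model (my own illustration, not a description of any real chip) of why the shared bus, rather than the arithmetic unit, ends up setting the pace:

```python
# A toy model of the Von Neumann bottleneck (illustrative only; real hardware
# is far more complex). With a single shared bus, instruction fetches and data
# transfers must take turns, one per bus cycle.

def run_program(num_instructions, bus_cycles_per_transfer=1):
    """Count bus cycles when every fetch, read and write crosses the same
    single-lane bus, one item at a time."""
    cycles = 0
    for _ in range(num_instructions):
        cycles += bus_cycles_per_transfer  # fetch the instruction
        cycles += bus_cycles_per_transfer  # read the operand from memory
        cycles += bus_cycles_per_transfer  # write the result back
    return cycles

if __name__ == "__main__":
    print(run_program(1_000_000))  # 3,000,000 bus cycles for a million instructions
```

No matter how fast the arithmetic unit becomes, every instruction in this model still pays three trips across the same single-lane bus.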


Moore’s Law Is Indeed Dead, But NOT For the Reasons You Think


Up until recently, the limitations of the Von Neumann architecture hadn’t been a significant problem.  We could keep increasing the speed of processors, as described by Moore’s Law, which states that the number of transistors in an integrated circuit doubles roughly every two years, effectively doubling processor speed on the same cadence.  But Moore’s Law is now dead, as (ironically) pronounced by Jensen Huang, cofounder and CEO of Nvidia, in September 2022.
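As a rough sanity check on what that doubling cadence implies, here is a back-of-the-envelope projection (my own figures, using the Intel 4004’s roughly 2,300 transistors in 1971 as the starting point):

```python
# Rough check of Moore's Law compounding, using approximate public figures:
# the Intel 4004 (1971) had roughly 2,300 transistors.

transistors_1971 = 2_300
years = 2023 - 1971
doublings = years / 2                      # one doubling every two years
projected = transistors_1971 * 2 ** doublings

print(f"{projected:,.0f}")                 # ~150 billion transistors, the same order
                                           # of magnitude as today's largest chips
```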


That is to say, doubling the number of transistors every two years is no longer doubling effective speed, and it is not keeping up with computing demands.

Moore’s Law is indeed dead, but not because of the physical limitations of integrated circuits; it is dead because of the timing constraints inherent in the Von Neumann bottleneck.


The bottleneck is the constraint that killed Moore’s Law; it’s really that simple.

Let me restate this:  You can keep increasing the speed of the processor, but fetch commands, input commands, output commands and individual pieces of data still have to be handled one at a time.  The time required to move those individual pieces of data around on a common bus becomes a physical limitation that cannot be overcome using the Von Neumann architecture.


Ironically, it is Nvidia who now leads the race to increase processing speed, by simply cramming multiple Graphics Processing Units (GPUs) onto circuit boards and dividing the computation load between them across multiple buses: a ham-fisted solution to deliver enough processing power for Artificial Intelligence (AI).


In essence, what Nvidia couldn’t smash through with a single sledgehammer (the limitations imposed by the Von Neumann architecture), they decided to smash with EIGHT sledgehammers instead.  The result is massive power consumption, massive heat generation and, correspondingly, massive cooling requirements.  It’s a crude approach that maximizes short-term monetary gains, but it’s doomed to failure for exactly those reasons.
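Conceptually, the multi-GPU workaround looks something like the sketch below (a generic data-parallel split of my own, not Nvidia’s actual scheduling code): the work is divided across devices, but every device still pays the Von Neumann toll of moving its slice of data across a bus.

```python
import numpy as np

# A generic data-parallel split (illustrative only, not Nvidia's code):
# eight "devices" each process one slice of the input, and the partial
# results are gathered back together. The work is divided, but each slice
# still has to be shuttled to and from its device over a bus.

NUM_DEVICES = 8

def split_across_devices(x, weights):
    slices = np.array_split(x, NUM_DEVICES)   # carve the batch into 8 pieces
    partials = [s @ weights for s in slices]  # each "device" computes its share
    return np.concatenate(partials)           # gather the results back

if __name__ == "__main__":
    x = np.random.rand(8_000, 512)            # a batch of 8,000 input vectors
    w = np.random.rand(512, 256)              # a shared weight matrix
    print(split_across_devices(x, w).shape)   # (8000, 256)
```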


Rethinking Antiquated Conventions


EVERY manufacturer, every builder, every creator of anything meaningful, abides by certain conventions of what’s acceptable within their particular space. 


  • The maker of car engines accepts that the grade of gasoline to be used will be what’s readily available in the retail market, from 87 to 91 octane. 

  • A maker of coffee pots assumes that 120 volts and 60Hz will be the power source, in the USA. 

  • A home building company assumes that 2x4 wood will be the most readily available source for wall studs.


In the same way, computer chip manufacturers assume the Von Neumann protocol is the standard (if they are even aware of the protocol), and work within other industry standards for size, power source voltage, maximum temperature, and so on.  They’re not in the business of changing the Von Neumann protocol. 


Similarly, software engineers operate within established confines for their industry, IT end-users operate within their established confines, etc., etc.


In other words, nobody inside the industry has seriously considered, much less worked on, challenging the Von Neumann architecture.  There are attempts at quantum computing, where the environmental requirements are even more absurd than those of the Nvidia solutions...  And then there is neuromorphic computing.


Neuromorphic computing attempts to mimic how the organic brain functions, instead of the one-bit-at-a-time approach of the Von Neumann architecture.  And this is where the real solution to viable AI lies.
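For readers who want a feel for what “mimicking the brain” means computationally, here is a minimal, generic sketch of a leaky integrate-and-fire neuron, a textbook neuromorphic building block (this is purely illustrative and is not I/ONX’s design): the neuron’s memory lives with the computation, and output happens only when accumulated input crosses a threshold.

```python
# A minimal leaky integrate-and-fire neuron (a textbook neuromorphic building
# block; generic illustration, NOT I/ONX's design). The neuron's "memory"
# (its membrane potential) lives right where the computation happens, and it
# only emits output when its accumulated input crosses a threshold.

class LIFNeuron:
    def __init__(self, threshold=1.0, leak=0.9):
        self.potential = 0.0          # state stored locally, not fetched over a bus
        self.threshold = threshold
        self.leak = leak

    def step(self, input_weight):
        self.potential = self.potential * self.leak + input_weight
        if self.potential >= self.threshold:
            self.potential = 0.0      # reset after firing
            return 1                  # emit a spike
        return 0

if __name__ == "__main__":
    neuron = LIFNeuron()
    inputs = [0.4, 0.0, 0.5, 0.6, 0.0, 0.0, 0.9]
    print([neuron.step(w) for w in inputs])  # [0, 0, 0, 1, 0, 0, 0]
```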


Enter I/ONX


I/ONX is a company that reached out to me for my input. 

Their approach was to throw out the Von Neumann architecture and build a completely new instruction set for computer processors called Kore, which eliminates the time limitations imposed by the one-bit-at-a-time approach.


To go back to the analogy of the traffic officer in Chicago, they’ve replaced the single stop sign with a normal traffic-light setup.  With each change of the light, tens or hundreds of cars can move through the intersection, and there’s no more traffic jam.  Similarly, each cycle of the processor now lets through a whole stream of data, the next cycle a batch of fetch commands, the next cycle a stream of output data, and so on.
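Extending the earlier toy model (again, purely my own illustration of the general idea, not the Kore instruction set itself): if each cycle can move a whole batch of items instead of a single one, the bus stops being the pacing item.

```python
# Extending the earlier toy model (my own illustration of the general idea,
# NOT the Kore instruction set): if each bus cycle carries a whole batch of
# items instead of one, the cycle count collapses.

def run_program_batched(num_instructions, batch_size):
    """Bus cycles when fetches, reads and writes move in batches."""
    batches = -(-num_instructions // batch_size)   # ceiling division
    return batches * 3                             # fetch batch + read batch + write batch

if __name__ == "__main__":
    n = 1_000_000
    print(run_program_batched(n, batch_size=1))    # 3,000,000 cycles: the stop-sign cop
    print(run_program_batched(n, batch_size=128))  # 23,439 cycles: the traffic light
```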


THIS is the solution to the Von Neumann bottleneck.


I/ONX then combines the Kore software with hardware specifically designed to capitalize on the efficiencies the software creates, resulting in an AI computing solution that matches the best in the industry for speed, while consuming ~10% of the power and generating around 2-3% of the waste byproduct, heat.


The end result is that while Nvidia can stuff 40 cores into a computer rack weighing some 1,500 pounds, drawing 50 kW of power and requiring direct liquid cooling, I/ONX appears poised to deliver an alternative that can match the same compute capability on two cards the size of HP blades, consuming a mere 200 watts of power and generating essentially no heat.


Put another way, a fully populated computer rack with I/ONX hardware will be able to deliver 20x more compute capability than a fully populated Nvidia rack while consuming <6 kW of power and creating (again) essentially no heat.  In other words, the efficiency of their approach seems so good that they’ve eliminated the need for any cooling system.  And it would weigh in around 1200 pounds, 20% less!
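Taking the figures above at face value (these are the claims as relayed to me, not independently verified numbers), the compute-per-watt comparison works out roughly like this:

```python
# Rough compute-per-watt comparison using the rack-level figures quoted above
# (vendor claims as relayed to me, not independently verified numbers).

nvidia_rack = {"relative_compute": 1.0,  "power_kw": 50.0}
ionx_rack   = {"relative_compute": 20.0, "power_kw": 6.0}

nvidia_perf_per_kw = nvidia_rack["relative_compute"] / nvidia_rack["power_kw"]
ionx_perf_per_kw   = ionx_rack["relative_compute"] / ionx_rack["power_kw"]

print(f"Nvidia rack: {nvidia_perf_per_kw:.3f} compute units per kW")
print(f"I/ONX rack:  {ionx_perf_per_kw:.3f} compute units per kW")
print(f"Ratio: ~{ionx_perf_per_kw / nvidia_perf_per_kw:.0f}x more compute per watt")
```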


Can This Be Real???


Now, at this point, I know many others in the industry will say this is complete nonsense; such things as I describe are impossible! 

Umm... Yeah. 


That’s EXACTLY what I thought, when I/ONX first approached me.  In effect, the things they were claiming simply weren’t credible, couldn’t POSSIBLY be true... 


What I’ve learned over the past few months has completely upended everything I’ve seen in the critical facilities space and forced me to re-evaluate every rule we’ve always accepted as fact.  I’ve done the calculations myself and examined their approach every way I can…  and the numbers track.


The implications of this cannot be overstated: a sustainable approach to AI, to enterprise IT computing, even to consumer electronics such as cell phones and laptops, all made possible by throwing out the antiquated standard that has held back the industry.


Let me say it plainly: GPU-based AI is already a zombie ecosystem; it’s dead, it just doesn't know it yet.


What’s coming, I believe, is as big a leap forward for IT as going from horses and buggy whips to modern cars.




My Predictions for the Data Center Industry


As this plays out, I predict the following:


  1. The pushback from local communities, who are expected to sacrifice their power and groundwater for a new data center built by Leviathan companies, will help deflate the bubble economy that is data center construction.  People are waking up to the costs, in terms of resources, that the ballooning data center industry represents, and they’re not going to keep getting steamrolled.  Watch and see.

  2. Data centers as an industry will eventually go by the wayside, as dedicated infrastructure will simply be unnecessary.  Empty office buildings, a home garage, old mothballed data centers that were deemed too inefficient: anywhere with a decent power distribution system will be able to house AI.

  3. AI will become far cheaper and readily available to anyone who wants it; large corporations and government agencies will no longer be able to exploit their monopoly in this space.

  4. Nvidia, while the first to bring an AI solution to the marketplace, will be the next Atari.  Atari was first to market with video games, rested on the success of its first-mover advantage, and was completely outclassed and consigned to history within a few years.  Nvidia’s continued survival is likewise based on a technology that has reached the practical limits of power, water availability and time itself: GPUs.


Nobody can put the AI genie back in the bottle; it’s here, and in its current form it’s big, bad, UGLY.  And GPU-based AI is not sustainable.

The only viable alternative I have seen so far is the technology being developed by I/ONX.

Keep an eye on them!




