Performance Speed Limits

travisdowns.github.io
7 min read
fairly easy
A laundry list of speed limits that your code can't exceed.
How fast can it go?

Sometimes you just want to know how fast your code can go, without benchmarking it. Sometimes you have benchmarked it and want to know how close you are to the maximum speed. Often you just need to know what the current limiting factor is, to guide your optimization decisions.

Well this post is about that determining that speed limit. It's not a comprehensive performance evaluation methodology, but for many small pieces of code it will work very well.

Table of Contents

This post is intended to be read from top to bottom, but if it's not your first time here or you just want to skip to a part you find interesting, here you go:

The Limits

There are many possible limits that apply to code executing on a CPU, and in principle the achieved speed will simply be determined by the lowest of all the limits that apply to the code in question. That is, the code will execute only as fast as its narrowest bottleneck.

So I'll just list some of the known bottlenecks here, starting with common factors first, down through some fairly obscure and rarely discussed ones. The real-world numbers come mostly from Intel x86 CPUs, because that's what I know off of top of my head, but the concepts mostly apply in general as well, although often different limit values.

Where possible I've included specific figures for modern Intel chips and sometimes AMD CPUs. I'm happy to add numbers for other non-x86 CPUs if anyone out there is interested in providing them.

Big List of Caveats

First, lets start with this list of very important caveats.

The limits discussed below generally apply to loops of any size, and also straight line or branchy code with no loops in sight. Unfortunately, it is only really possible to apply the simple analyses described to loops, since a steady state will be reached and the limit values will apply. For most straight line code, however, no steady state is reached and the actual behavior depends on many details of the architecture such as…
Travis Downs
Read full article