Second-guessing the Compiler

1 Sep 2003

Paul R. Potts

So, did you hear the one about the programmer who decided to rewrite all the logical tests of the form

if ((x) || (y))

as

if (!((!(x)) && (!(y))))

on the grounds that the PowerPC uses “NAND gates,” whereas the Pentium uses “AND gates,” so the second expression would run faster on PowerPC hardware?

When I heard this, it caused me to utter some kind of sound… I don’t remember the details, but I think it involved spraying coffee all over my monitor.

Of course, the two are logically equivalent; this is just De Morgan's law (work out the truth table for yourself if you don’t believe me). I have no idea whether there really is any difference in the performance of logical OR operations on the PowerPC. I sincerely doubt it. Keep in mind that even the assembly instructions are an abstraction as far as the hardware is concerned, and I have no way of knowing what is really happening in the hardware when a simple OR test is executed. If the hardware chose to execute the OR comparison using NOT and AND, I’d never know, and wouldn’t care. But the second form certainly looks a lot more obscure, and that was the programmer’s real point. (The “baffle ’em with bullshit” defense: if it looks complex, it must be complex; it will be harder for anyone else to maintain; perhaps it will ensure job security.) (Don’t bet on it. If anyone working under my supervision wrote this without a very good reason, he or she would be out on his or her ass.)

It gets better. CodeWarrior is a pretty good compiler. It looked at this code and determined, pretty much as a human could with a little thought, that it made more sense to reduce the code to a simple logical OR. So that’s what it did. Even if the original programmer had been right, the processor wasn’t executing the logic he wrote; he just hadn’t looked at the generated code. So there was, for yet another reason, no point in writing it that way.

Now, the people who wrote CodeWarrior’s optimizer aren’t dumb. That optimizer has undergone years of tweaking by very smart people. If there were some great optimization to be gained by rewriting logical operations to exploit the PowerPC’s “NAND gates,” they would have implemented it; it would be described in the PowerPC documentation, to guide compiler writers; and programmers would be griping about it. IBM and Motorola have a vested interest in getting optimization advice out there, to make their chips appear more competitive. There is no reason to rewrite the logic like this, so they didn’t.

If you’ve been living under a rock and haven’t heard: optimize after you get it working. Optimize what you can measure. But the best initial optimization you can do on your code is to design it well and express it clearly. After you’ve tested it, crank up the compiler optimizations and test it some more. Measure its performance. Profile the hot spots. Optimize those parts. It doesn’t make sense to waste effort optimizing instructions that execute only once, during a program startup that isn’t noticeably slow anyway. If your program is slow on a modern CPU, it is far more likely that you are doing something wrong algorithmically: for example, looking up some piece of information by traversing a long linked list every time a function is called, instead of using a more efficient structure such as a tree or a hash table.

Don’t get me wrong: there’s a place for serious hand-optimization. I’ve worked hard to hand-optimize DSP assembly code in order to reduce the number of cycles needed to restart a disconnected data transfer across a PCI bus. I’ve tweaked interrupt routines to block for as few instructions as possible. I’ve also worked to determine why a program that draws animated meters was using thousands of times more CPU time than I expected. (It was drawing far too much, far too often, due to a bug that was easy to find by single-stepping the code with a source-level debugger.) But in these cases I had some way, even if it was an imperfect way, of measuring the results. And you can bet your ass I was carefully commenting the code to explain why the implementation no longer appeared to be as simple and straightforward as possible. Not just to benefit some abstract future maintenance programmer; that maintenance programmer could be someone I know and love: myself.
