When Theo de Raadt went public last week with comments about bugs in Intel's Core 2 Duo, I believed everything he said.
Many commenters on Slashdot, a site that attracts lots of Linux and BSD users, shared my reaction. In fact, of 356 comments on the story as of early last Sunday morning, an easy majority were in some way supportive of de Raadt's credentials and/or position, while only about 30 engaged in out-and-out "Theo bashing."
For obvious reasons the Wintel fraternity had pretty much the opposite reaction. Dan Goodin's story on The Register had only 20 comments at the time I checked it, but those 20 included several ill-informed ad hominem attacks like this one:
This guy, along with the whole "most famous hackers" toplist, is a media socialite. His OpenBSD operating system is poorly coded, himself citing "page file colouring is broken". How can we trust that Theo at the worst isn't trying to blame Intel because his poor coding acts erratically on their hardware. It's completely plausible!
If so that cements him in the hacker hall of fame for "publicly blaming hardware manufacturer for programming errors". He'll be on Larry King tomorrow night. This guy's boat sailed long ago, sadly for us he wasn't on it.
The general Wintel industry strategy, however, seems so far to be one of burying the issue: focusing attention on a misrepresentation of what de Raadt actually said about coding for the translation lookaside buffer (TLB) changes Intel made for the Core 2 Duo, while simultaneously arguing that other processors have bugs too and maintaining a loud silence on the key issue - specifically, that most of the problems de Raadt cites seem to stem from Intel's decision to shoe-horn two or more cores designed to operate independently into multi-core packaging without doing a fundamental redesign first.
The canonical version of both halves of what is being said comes from Linus Torvalds - a man who, like de Raadt, ought to know and should have an objective opinion. Here's the dismissal half:
> How significant were the TLB handling changes?
I'd say: "Totally insignificant".
The biggest problem is that Intel should just have documented the TLB behavior better. The Core 2 changes are kind of gray area, and the old documentation simply didn't talk about the higher-level page table structures and the caching rules for them.
So that part is just a good clarification, and while it could be called a "bug" just because older CPU's didn't do that caching, I don't think it's an errata per se.
Of course, if you depended on it not happening (and a lot of people did), it's painful. But it really does make the architecture definition better and clearer.
That should be convincing - except that it doesn't respond to what de Raadt actually said about the MMU/TLB issue:
Note that some errata like AI65, AI79, AI43, AI39, AI90, AI99 scare the hell out of us. Some of these are things that cannot be fixed in running code, and some are things that every operating system will do until about mid-2008, because that is how the MMU has always been managed on all generations of Intel/AMD/whoeverelse hardware. Now Intel is telling people to manage the MMU's TLB flushes in a new and different way. Yet even if we do so, some of the errata listed are unaffected by doing so.
Notice that Torvalds refutes something de Raadt didn't say - that the changes in TLB flushing reflect errors - while leaving untouched what de Raadt did say: that the changes break backwards compatibility and thereby create opportunities for things to go wrong.
Something similar happens with the other half: the "other people are wrong too" defence. Here's Torvalds' formulation, responding to a comment by Rob Thorpe:
>Whether other higher-end CPUs have more errata than x86s I don't know.
They tend to fix the bugs that are user-visible, and then not fix the bugs that can be worked around on an OS level.
Also, boutique vendors tend to not talk about them, because it's all internal to their own stuff. Of course, if they don't catch it in time, they'll have to release OS upgrades, but if they find an errata early, they can just work around it and need never tell anybody, exactly like the random embedded ones.
It's really simple: when you count your CPU's in thousands rather than millions, you generally don't want to do a whole new mask set that costs you months and a few megabucks. It's much cheaper to just ship the buggy crud.
Yeah, x86 errata get more attention. But those things are pretty damn well tested. Better than most. And since the OS is outside the control of the vendors, they get fixed too.
Now, obviously, this defence wouldn't work for a five-year-old but, more importantly, how you read it depends on the assumptions you make - and on how much you know about the industry. Most people will, I think, assume it refers to the PPC and SPARC architectures; in reality, however, PPC outsells x86 by a considerable margin, and SPARC is an open specification with open documentation - indeed, in the case of the T1 CMT/SMP CPU, Sun has released everything down to the design source (in Verilog).
I found a lot of other material like this too, but all of it amounted to defence by denigration and denial - and not a single word about the underlying multi-core design problem. As a result my review left me no further ahead, and therefore still in the grip of my initial bias: the default belief that if de Raadt says something bad about x86, it's bound to be true.
But what I don't know is whether that's right - and here's the bottom line: if you, like me, lack the tools, expertise, and time to find out for yourself, then you don't know either, because the tech press isn't telling you and the obvious experts seem committed mainly to grinding personal axes.