If you go to the TPC.org website and select the top ten TPC-H results by price-performance, you'll find something amazing: the 100GB, 300GB, and 1000GB records are all held by ParAccel Analytic running Red Hat Linux 4.4 on Sun x86 gear - at an average 42% lower price per unit of performance than the nearest competitor.
Another Sun x86 system running DB2 under Solaris holds the 3TB record - but only by a 10% margin over its nearest competitor (a consequence, incidentally, of the DB2 license cost and nothing to do with hardware cost: $706K at list for the licenses vs. only $370K for the hardware).
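To put those two list prices in proportion, here is a quick back-of-the-envelope sketch. It uses only the two figures quoted above; the function name is made up for illustration, and a full TPC price sheet includes other line items that are ignored here.

```python
def license_share(license_cost, hardware_cost):
    """Return the license cost as a fraction of license + hardware combined."""
    return license_cost / (license_cost + hardware_cost)

# Figures quoted in the text (list prices, in dollars).
DB2_LICENSES = 706_000
SUN_HARDWARE = 370_000

share = license_share(DB2_LICENSES, SUN_HARDWARE)
ratio = DB2_LICENSES / SUN_HARDWARE

print(f"Licenses as a share of licenses + hardware: {share:.1%}")
print(f"Licenses cost {ratio:.2f}x the hardware")
```

On these two numbers alone, the software accounts for roughly two thirds of the combined bill.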
So what's going on? Low-cost, highly parallel data storage and retrieval made possible by putting storage and smarts together - a model first proposed in the 1950s and finally made possible by today's denser storage, faster processors, and cheap 64-bit memory.
ParAccel, the company whose Linux products power the smaller scale results, makes a number of claims for its technologies including:
- only relevant columns are retrieved (a row-wise DBMS would pull all columns and typically discard 80-95% of them)
- all operations are done in parallel (a non-parallel DBMS must scan all of the data sequentially)
- adaptive compression makes disks effectively faster and reduces decompression overhead
- a memory-centric design maximizes in-memory processing
- additional, patent-pending innovations drive performance to unprecedented levels
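The first two claims are easy to picture with a toy example. What follows is a minimal sketch, not ParAccel's implementation: the table, column names, and function names are all invented for illustration, and Python threads stand in for the separate nodes a real parallel system would use (they show the partition-and-combine structure, not an actual speedup).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample table: 1000 rows, five attributes each.
ROWS = [
    {"order_id": i, "customer": f"c{i % 7}", "region": "EU" if i % 2 else "US",
     "price": float(i), "comment": "x" * 20}
    for i in range(1, 1001)
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def row_store_sum(rows, column):
    """Row-wise scan: every attribute of every row is fetched to use one value."""
    values_touched = 0
    total = 0.0
    for row in rows:
        values_touched += len(row)   # the whole row comes off disk
        total += row[column]
    return total, values_touched

def column_store_sum(columns, column, partitions=4):
    """Column-wise scan: only the requested column is read, split into slices
    that are summed independently and then combined."""
    data = columns[column]
    step = -(-len(data) // partitions)  # ceiling division
    slices = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials), len(data)

columns = to_columns(ROWS)
row_total, row_touched = row_store_sum(ROWS, "price")
col_total, col_touched = column_store_sum(columns, "price")

print(f"row store:    total={row_total}, values touched={row_touched}")
print(f"column store: total={col_total}, values touched={col_touched}")
print(f"discarded by the row store: {1 - col_touched / row_touched:.0%}")
```

With five attributes per row the row-wise scan throws away four values out of every five it reads, which is the low end of the 80-95% range quoted above; wider tables push that figure higher.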
The result is very cool - a data warehouse in a small rack built by using computers as smart disk drives.
As ParAccel moves up the performance scale it's going to collide with Sun's Thumper/ZFS technologies. These carry the same ideas further: moving more smarts closer to the data flows, and further simplifying the entire data management process.
I think we'll see both implementations go further - the 10TB crown should fall to Sun fairly easily if they decide to issue a T2-based "thumper", and there's no obvious reason to think ParAccel can't compete at the larger sizes. What both have already demonstrated, however, is the real bottom line: the complexities and compromises previously needed for a mid-range data warehouse or PC SAN are no longer necessary.
Like many of the exciting things happening in Unix today, this is another old idea made possible by new technologies and the enormous cost reductions that go with them - meaning that you can now set up what used to be a multi-million dollar data warehouse or network storage farm for significantly less than a couple of hundred thousand dollars.
And that's a good news/bad news story for smaller organizations, because technologies that were previously out of reach are about to become mandatory.