Draft Blog Entries

% fortune -ae paul murphy

Sun's ZFS/Flash initiative

Up to now, the coolest thing about ZFS, besides really making RAID cheap and easy to implement, has been its ability to silently correct the bit errors that creep in as data is stored, read, and written - a facility that's been particularly important to the raidz implementation.

In the near future, however, that's going to change and the coolest thing about ZFS is going to be its ability to make intelligent use of large amounts of flash memory to simultaneously speed disk I/O while letting you lower platter rotational speeds (and thus both wear and power use) to something on the order of 5400 RPM.

One of the core developers, Adam Leventhal, has an interesting article in the July ACM on how this is going to work - the technology is non-obvious, but the bottom line is simple: much faster, cheaper, and more reliable storage for big installations.

Properly configured systems using ZFS with flash in the storage hierarchy ahead of traditional disk should offer dramatic (order of magnitude) throughput gains on things like database transactions - and virtually eliminate some processing crises that I'd guess nearly all serious sysadmins have had to face.
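To see where an order of magnitude could come from, here is a back-of-the-envelope sketch in Python. The function name and both latency figures are my own illustrative assumptions, not measurements from Sun or from the ACM article; the only idea taken from the text is that reads served from flash skip the slow disk.

```python
def effective_latency_ms(hit_rate, flash_ms, disk_ms):
    """Average read latency when a fraction hit_rate of reads is served from flash."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average: hits pay the flash cost, misses pay the disk cost.
    return hit_rate * flash_ms + (1.0 - hit_rate) * disk_ms


# Assumed, illustrative figures only (not vendor numbers).
FLASH_MS = 0.1   # hypothetical flash read latency
DISK_MS = 10.0   # hypothetical slow-spindle random read latency

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    latency = effective_latency_ms(hit_rate, FLASH_MS, DISK_MS)
    print(f"hit rate {hit_rate:.2f}: {latency:.2f} ms, speedup {DISK_MS / latency:.1f}x")
```

With these assumed numbers a 90% flash hit rate works out to roughly a ninefold improvement in average read latency, and the gain climbs steeply as the hit rate approaches 100% - which is why the slower, cheaper platters behind the flash matter less than you might expect.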

Disk reconstruction delays and risks will, for example, essentially disappear - and if you mirror on two of the new JBOD arrays, layer in flash, and run something like Oracle or PostgreSQL, almost all of your backup and recovery delays will disappear too.

More interestingly, there are oddball RDBMS admin problems that will get easier to resolve: a lot of production systems, for example, get constrained when databases grow beyond the point that backup and PC-style table inversions (aka "cube" computation) can be done in the time available. In the past the right answer (switch to something more modern that queries the production system directly) has usually been administratively impossible, the fast answer (dump to text and use Perl) has usually produced howls from outraged PC people, and the wrong answer (recreate the database schema you need on /tmp and run the inversion in memory-mapped space) has often become the only one that both works and doesn't create extensive conflicts with the MCSE crowd.

No more - with the ZFS/Flash layer in your storage hierarchy you can flash freeze the database without stopping production, and then do both your off-line backups and inversions at production I/O rates using whatever cycles are available - whether the users are on-line or not.
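Assuming the "flash freeze" here works like a ZFS-style copy-on-write snapshot, the toy Python sketch below shows why production doesn't have to stop. The CowStore class and its methods are hypothetical names of mine; real ZFS shares a block tree rather than copying an address map, so treat this as an illustration of the idea only.

```python
class CowStore:
    """Toy copy-on-write block store (hypothetical, for illustration only)."""

    def __init__(self):
        self._blocks = {}    # block id -> immutable bytes
        self._live = {}      # logical address -> current block id
        self._next_id = 0

    def write(self, addr, data):
        # Never overwrite in place: allocate a new block and repoint the live map.
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)
        self._live[addr] = block_id

    def snapshot(self):
        # Freezes the address map only; no data blocks are copied.
        return dict(self._live)

    def read(self, addr, view=None):
        # Read through the live map, or through a frozen snapshot map.
        block_map = self._live if view is None else view
        return self._blocks[block_map[addr]]


store = CowStore()
store.write("row1", b"balance=100")
frozen = store.snapshot()              # the "freeze"
store.write("row1", b"balance=250")    # production keeps writing

print(store.read("row1"))              # live view
print(store.read("row1", frozen))      # what the backup or inversion job sees
```

Because old blocks are never modified, the backup or inversion job reads a consistent image while users carry on - the freeze costs almost nothing at the moment it's taken.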

Most sysadmins will have run into this problem at some point - but there's a special variation too: one that's rare in general business but important to Sun's military and telco MySQL markets. What happens in these cases is that business needs make any delay on some transaction data unacceptable; so you set things up to cache the critical stuff in memory while keeping only a few indexes to the collateral data there - and then the database grows faster than you can get more memory. As a result you find yourself trying to continually re-optimize your indexing and caching to keep up - and that activity itself then causes more problems (especially when your boss brings in consultants "to help"). With ZFS managing the storage hierarchy and flash at the front, however, the distinctions disappear from view - and your system self-adapts as volumes change.
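A minimal sketch of what "self-adapts" means in practice, in Python. It uses a plain least-recently-used cache, which I've swapped in for brevity; ZFS's own cache uses a more sophisticated adaptive policy. The LruCache class, the hit_rate helper, and the key names are all hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Tiny least-recently-used cache that only tracks which keys are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._items[key] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used key
        return False


def hit_rate(cache, keys):
    """Fraction of the given accesses that were served from the cache."""
    before = cache.hits
    for key in keys:
        cache.access(key)
    return (cache.hits - before) / len(keys)


cache = LruCache(capacity=4)
old_hot = ["a", "b", "c", "d"] * 5    # yesterday's critical rows
new_hot = ["w", "x", "y", "z"] * 5    # today's critical rows

print(hit_rate(cache, old_hot))
print(hit_rate(cache, new_hot))
```

Nobody re-tuned anything between the two runs: once the new hot rows have been touched, the old ones age out on their own and the hit rate recovers. That's the behaviour you currently pay consultants to approximate by hand.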

You can't buy ZFS/flash yet - and I'm guessing that when you can (early next year? - but note that it took Sun three years from the introduction of ZFS to its first new JBOD products) you'll pretty much need to run a workload measurement utility whose results determine the custom configuration you'll be ordering from Sun. Timing aside, however, I think that the bottom line on this for anyone now using Solaris for larger applications is that this stuff is going to be important - and that getting a running start by reading what you can and experimenting with the key ideas now (using ramdiskadm) will pay off for you.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.