Draft Blog Entries

% fortune -ae paul murphy

Database vs. Application

One of the interesting design challenges facing the people supporting the IEED research effort discussed yesterday involves how much of the key applications to embed in the database design.

In that particular case the answer will probably be "essentially none" because the instrument designers can tell us what the data stream will look like, but only how they expect it to be used at first, not how it will actually be used in the long term.

That's clear for this case, and generally accepted as true for science applications - but the logic applies equally well to business applications. That's why my general preference is always to put logic in the application, reserving database operations for data management - actively avoiding things like stored procedures and treating internal event triggers with particular caution.

In contrast most application designs I see these days try to do everything in the database, reserving the external logic mainly for user interface support.

At their roots these two methodological extremes reflect the technology their proponents either grew up with, like best, or both.

Thus virtually all of the paraphernalia of the modern SQL-Server style data model evolved from the many-single-users nature of the Microsoft client-server world. In this environment the data base engine provides the multi-user component needed to serialise transactions and therefore becomes the natural place to put application logic.

Thus stored procedures were first developed as an auto-serialising means of handling multi-user application logic in single user environments - and stuff like update able views can be seen as interfacing the traditional files used in most PC applications to the more flexible multi-user, multi-table, structure provided by the database engine.

As a Unix user, however, my automatic assumption is that I need none of this because the host OS is inherently multi-user. In other words, when I run Sybase ASE on Unix, what I see is the inefficiency of SQL processing coupled with the almost complete duplication of the code needed to interface many single users to a shared data storage facility.

What this means is that the Unix application designer can safely do what the PC user can't: keeping the database as simple, and correspondingly fast and robust, as possible while separating logic from data by putting virtually all of it in the application code.

Notice that this strategy enables an odd hybrid: one in which you use a very lightweight application interpreter that creates screens and their associated logic files on demand - from data stored in database tables.

One consequence of any approach externalizing logic is better reliability, but a more important one is vastly increased flexibility. Thus such applications can update data rows without triggering procedures whose existence demonstrates data duplication, whose usage is inefficient, and whose action effectively limits what some other application can do with the original or derived data.

This difference, incidently, explains why "business intelligence solutions" designed by PC people generally rely on data extracted and reformatted from the production system while analytical packages designed by Unix people almost always run directly against the production database - a case of differing system perceptions leading to different values and different ways of doing things.

In this context it's important to remember that Codd's original intent in inventing the relational database was to eliminate application stove piping by providing a minimal common data store equally accessible to all applications but corruptible by none -and that the use of stored procedures with views now lets many people use SQL-Server to implement pretty much the opposite: producing 60s style application stove piping via virtual files accessible only by their own applications, but corruptible (via source table change) by all.

Today you see application logic kept rigorously outside the database mainly in science applications because the technologies the RDBMS market demands are defined by the PC community view, but there have been attempts to introduce improved tools and logic into the market place - and we're about to see everything old become new again if/when Oracle releases its RDBMS appliance: featuring a modified Red Hat kernel in which most of the performance "bang" comes from not doing things the PC way.

(Sadly, I think that even if they get it out as designed, it will die again too: quickly because the market is pretty much clueless about the value of simplification and slowly because Linux kernel hacks are unmaintainable against external change.)

So in terms of our hypothetical research company, the thing to remember is that hard wiring assumptions about the future uses of data into the data model is an obvious no no - and I maintain that this is in fact generally true for business applications as well.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.