% fortune -ae paul murphy

BI: where 90% of everything is ... crosstabs

Business intelligence has been a hot area for years, but more than 90% of everything actually being done in it still consists of cross tabulation - and a lot of people don't even understand that.

In one of my first "professional" jobs I did an analysis of grain transportation in Western Canada. As part of that I got the data for about 1,800 points of lading, along with best-guess rail and sea freight costs by grain type from those points to each major Canadian port and thence to our historical export markets, and ran the problem as a constrained transshipment model to see where the grain would go if rail rates were set competitively. The results showed that grain flow would first max out the port of Churchill, then Vancouver, and finally New Orleans, with next to nothing going down the St. Lawrence Seaway from Thunder Bay - the route most of Canada's grain actually does take.

I could not get the client to understand the method or the result - and neither could a professor of Agricultural Economics from the University of Alberta. In the end, the study was used as supporting documentation for a Government of Alberta investment in the port of Prince Rupert - and I learned an important lesson: stick to crosstabs and simple bar charts when dealing with government.

Sadly, it didn't take - nearly two decades later I ran the systems for one of Canada's bigger technical recruiters and, as part of that, decided to see if I could tell from the intake data (initial recruiter notes, resume, references) which candidates would make the most money for the company - i.e. stay longest in the jobs they were sold into.

The data showed that contract duration predictions made on the basis of six simple characteristics measurable during the intake interview were significantly better than those made by either the recruiters or the employers.

What I discovered next, however, was that the senior people were all comfortable with the idea that their customers would lie to them about the available budget and thus job durations, but absolutely refused to believe that the candidate's pre-placement profile could be used to predict customer behavior in terms of contract extensions.

I had the numbers, I had beautiful overheads with color graphs and output tables, and I had their attention. What I did not get was comprehension.

From this, I learned an important lesson: when dealing with business managers, canonical correlations and factor rotations are way over the top - if you can't show it as an A by B crosstab with a few row and column percentages, don't bother showing up.
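For anyone unsure what an A by B crosstab with row and column percentages actually involves, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the categories ("referred"/"cold", "extended"/"ended"), the counts, and the helper name crosstab are invented and are not the recruiting data described above.

```python
from collections import Counter

def crosstab(pairs):
    """Count (a, b) pairs; return labels, cell counts, row % and column %."""
    counts = Counter(pairs)
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    # Marginal totals: one per row label and one per column label.
    row_totals = {a: sum(counts[(a, b)] for b in cols) for a in rows}
    col_totals = {b: sum(counts[(a, b)] for a in rows) for b in cols}
    # Row % = cell / row total; column % = cell / column total.
    row_pct = {(a, b): 100.0 * counts[(a, b)] / row_totals[a]
               for a in rows for b in cols}
    col_pct = {(a, b): 100.0 * counts[(a, b)] / col_totals[b]
               for a in rows for b in cols}
    return rows, cols, counts, row_pct, col_pct

# Hypothetical data: (how the candidate arrived, what happened to the contract).
pairs = ([("referred", "extended")] * 6 + [("referred", "ended")] * 2
         + [("cold", "extended")] * 3 + [("cold", "ended")] * 9)

rows, cols, counts, row_pct, col_pct = crosstab(pairs)

# Print each cell as: count (row %, column %).
print("".ljust(10) + "".join(c.rjust(22) for c in cols))
for a in rows:
    cells = "".join(
        f"{counts[(a, b)]:3d} ({row_pct[(a, b)]:5.1f}%, {col_pct[(a, b)]:5.1f}%)".rjust(22)
        for b in cols
    )
    print(a.ljust(10) + cells)
```

Read across a row and the first percentage tells you what share of that row landed in each column; read down a column and the second percentage tells you what share of that column came from each row. That is the whole technique.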

Just recently I had occasion to look at an "operational cockpit" for "business intelligence" built as a PC client to an Oracle data warehouse "solution." Know what it does? It draws pretty dial-like pictures of A by B crosstabs or single-variable time series and then exports the data file to Excel - where proud users are as likely to show you their "complex linear projections" (of row percentages) as anything else.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.