Feedback so far

So far I've not received much feedback on the key business issues in the cocoon article. The next article discusses the active prototype, and boy do I need feedback before writing that!

This is Cocoon Wars: Episode 3, The users byte back!

I had a prof once who claimed he decided to teach computer science because it was the only way he could get to play with the toys without having to deal with users. He's dead now, but every time I find myself in a preliminary project specification meeting with local management's user nominees I get closer to agreeing with him. That's certainly what happened last week when I met with several groups of Nichievo's users, mainly from the sales and analyst communities.

On the positive side I did get to wander around a couple of working offices and had a chance to meet with several customers; that got me a few nuggets that make the technology choice easier. But the rest of it? What a waste of time.

A self-deceiving prophecy?

You get a self-deceiving prophecy when someone describes how they do their work in a meeting with their colleagues and bosses present - because when you go and observe them at work, they'll try to mold what they do to what they said; and what they said typically describes what they think they ought to be doing, not what they really do.

For example, I once did a requirements review as part of an audit on a forty million dollar child welfare systems development effort that had gone off the rails. The documentation was precise and surprisingly clear. The JAD session records showed a well-defined workflow with clear opportunities for information support - precisely as implemented.

Workflow diagrams showed case workers coming in at about 8 (the official start time is 8:15), reviewing files, verifying appointments, checking out departmental vehicles, and heading out, departmental laptops in hand, to make their visits. According to the documentation, they'd start to straggle back around 3:00 to prepare their reports, update files, and go home around 5:00 (the official end-of-day is 4:30).

Reality was nothing like this, something that became obvious when I noticed that the departmental parking garage was empty at night. Most case workers worked a split shift, coming in mid-morning to talk to colleagues and file reports and doing home visits during the late afternoon and early evening.

Their information needs didn't resemble what they told the interviewers either. What had happened was simple: these were outgoing, socially empathic people who had given the interviewers exactly what they wanted - clearly defined, factually oriented specifications that unfortunately bore only an idealized resemblance to the relationship information they really worked with.

Their primary question before going on a home visit didn't have anything to do with the mother's record or where the kid got sent to school; it boiled down to: has she got a new boyfriend whose presence means I should take a cop along? They didn't want laptops stuffed with forms or wireless access to service agency websites, they wanted handhelds with integrated phones and a panic button that let them check out the new boyfriend's criminal and mental health records.

It's sad but true: early stage requirements meetings (more professionally known as JAD - Joint Application Design - sessions) don't just waste everyone's time but often get projects started off in the wrong direction. Not only do you mostly get the wrong people, but the few attendees who do contribute to getting the organization's work done tend to let their perceptions of you, or their concerns about how others at the meeting see them, color their comments to the point of blacking out the value. At best what you get are self-deceiving prophecies - descriptions of what they think they ought to be doing. At worst you get stuff that's filtered through technology perceptions, in effect prescriptions for changing the job to fit the person's chosen technology.

What's driving this is the absence of a sense of crisis to make local management care about the project. Without that, they tend to assign their losers to the requirements team simply because these are the people most easily spared from the real work. In white collar environments that means you often get the people who've long since stopped doing their jobs and become high-priced Windows support people instead. Unfortunately, these people not only don't know the job anymore, but they're perfectly willing to waste the whole meeting trying to get you to acknowledge their positions as local computer gurus.

Some of the things they say are just appalling. One solemnly intoned that "the solution needs to be web based, that would be the ideal", and got nods all round - although neither he nor anyone else could tell me what he meant. Another wanted to share his deep concern that using Linux would make the firm look unprofessional, and a third carefully shunted me outside during a break to tell me that I could just download the entire solution from xmethods.org (it's a dot.hype site pushing SOAP).

Nevertheless senior management has been trained to believe that requirements meetings are a necessary component of the process, so I go; but I spend as much time as possible wandering around annoying people who don't go to the meetings. What I'm looking for are the behavioral patterns that constitute the way an organization gets its work done; then I can figure out what the information support needs are and build a prototype for people to respond to. That works because people generally don't know what they need - but they can tell you with certainty when you get it wrong.

I didn't find out, for example, what the real primary requirement was by asking people in meetings; I found out by asking to go along on a few sales calls so I could understand the overall process better.

That was a mistake. First, they don't see themselves as making sales calls: "we're not underwriters" (sniff :-) ); and secondly, they didn't think the customers wanted to see a strange face. Many of Nichievo's customers are deeply concerned about confidentiality; for them Nichievo is a hole card they keep hidden from the people they deal with.

Luckily there are always a few special relationships, so I got a look at the customer's view of this thing and learned something important right away - most of the larger organizations have traditional EDI setups, and they're years away from moving to web based e-commerce.

Even more interesting, however, was that during our travels the analysts and executives involved would start to talk about processes and control issues but end up expressing their fears about the security and reliability of the system. In some cases I wondered if their expression of fear, or my perception of it, was actually being inspired by the cab ride, but overall their focus on the potential for fraud and/or business interruption seemed real enough.

Aberdeen serves it hot and steamy
A recent report by Aberdeen Group claims that open source now has more security problems than Windows does.

The argument is that since 16 (55%) of the 29 high priority CERT alerts issued in the first nine months of 2002 don't pertain to Windows, Windows must be more secure than open source.

Yes, Microsoft apparently denies sponsoring this gem - but the company itself issued 61 high priority alerts during the period, which brings Windows to a more realistic 85% of the total.

As they see it, computerization removes paper and duplication from the process and thereby makes the firm increasingly dependent on electronic records. If those go away or get compromised, the firm's in deep trouble.

In many cases the concern is overrated; it's simply not true that the average teenage nephew can hack into DOD systems. But, whatever the reality of the threat, many of Nichievo's users are deeply worried by the risks implicit in this system. Fraud is the least of their worries; as one put it: "so we get ripped off, well, it happens? it doesn't mean the firm goes down." Even having the system go dead for a few days or scramble records would be survivable; but publication of confidential customer records wouldn't be. That's a threat, not just to the firm, but to the personal relationships its people have with each other, with customers, and within their professions.

From a requirements perspective the bottom line on this is simple: if I can't get the risk of information leakage down to the point of being negligible, this project should not proceed.

Right up front that's the kiss of death for the dot.net approach. I may, or may not, be able to proceduralize security using a central Linux server and Cocoon, but I can guarantee you that it can't be done using a Windows server with Microsoft tools.

Using SOAP to just slip right in
To use SOAP to break past network security without being detected you first need someone on your victim's internal network to access your website.

If there's real money at stake, that's not hard to arrange for any company large enough to have a few hundred employees. Just set up a honeypot - your own porno site - and do a little social engineering by having the girls involved visit the right bars or clubs and pass out some cards with private access IDs.

Nothing spreads like sexcess and some nit will soon be firing up his personal wi-fi account from an office laptop to give you everything you want while nicely bypassing all those corporate controls and IT's lamentable tendency to log things.

There's a simple and direct reason for this that transcends my usual cynicism about products that have fifty million lines of unrefereed code: SOAP. The Simple Object Access Protocol is nominally just a message envelope, a way of encapsulating and forwarding information. Couple it, however, with an http binding and you have an RPC (remote procedure call) tool for bypassing firewalls and internal controls. This is great for putting spam directly on user desktops and not too shabby as a replacement for cookies either, but my guess is that it really shines brightest as a tool for getting network access information from unsuspecting users.
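
To make the mechanics concrete, here's a rough sketch - in Python, with a made-up host, endpoint, and method name - of what an RPC-style SOAP 1.1 call looks like on the wire. The point is that it's nothing but an XML envelope inside an ordinary HTTP POST on port 80, so a firewall that passes web traffic passes this too.

    # Minimal sketch of a SOAP 1.1 RPC call riding a plain HTTP POST.
    # The host, path, namespace, and method name are all made up for
    # illustration; to a packet filter this is indistinguishable from
    # ordinary web traffic.
    import http.client

    envelope = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <getRecord xmlns="urn:example-service">
          <recordId>12345</recordId>
        </getRecord>
      </soap:Body>
    </soap:Envelope>"""

    conn = http.client.HTTPConnection("service.example.com", 80)
    conn.request("POST", "/soap/endpoint", envelope, {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": "urn:example-service#getRecord",
    })
    response = conn.getresponse()
    print(response.status, response.read()[:200])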

Combine this with the stateless nature of an HTTP connection and web services become obviously and irremediably insecure. You can, I think, take palliative actions such as:

  1. using Microsoft's Windows Message Queuing instead of an http connection to get logging and transaction continuity. But: this means building and supporting a Windows client to be imposed on customers;

  2. using IBM's MQSeries instead would avoid part of this. But: using that probably pushes the project scale up a notch or two too far;

  3. you could try adding a SOAP content analyzer to the firewall box since SOAP messages are just text strings (a rough sketch of the idea follows this list). But: the failure rate of anti-spam filters suggests that this may be a lot harder than it sounds; or,

  4. if you had trustworthy hard-wired machine IDs (like Palladium, for example) to work with you could register correspondent machines and control everything that way. This may be the way of the future but isn't practical yet.
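
To show what option 3 would actually amount to, here's a rough sketch of the sort of filter the firewall box would have to run - nothing more than pattern matching on request headers and bodies, which is exactly why I expect it to fare about as well as anti-spam filters do. The marker patterns are illustrative, not a complete list.

    # Naive sketch of the "SOAP content analyzer" idea from option 3.
    # It looks for SOAP markers in an HTTP request and returns a verdict;
    # a determined sender can trivially obfuscate around checks like this.
    import re

    SOAP_MARKERS = (
        re.compile(rb"<\s*(?:[A-Za-z0-9_]+:)?Envelope\b", re.IGNORECASE),
        re.compile(rb"schemas\.xmlsoap\.org/soap/envelope", re.IGNORECASE),
    )

    def looks_like_soap(headers, body):
        """Return True if an HTTP request appears to carry a SOAP envelope."""
        if "soapaction" in {k.lower() for k in headers}:
            return True
        return any(marker.search(body) for marker in SOAP_MARKERS)

    def filter_request(headers, body):
        # In a real proxy this verdict would feed back into the firewall.
        return "drop" if looks_like_soap(headers, body) else "allow"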

You could also try to strip all the stuff needed for SOAP to work off the box. That's how you beat the issue on the Linux/Cocoon server: just say no at load time. But: this may only barely be possible on Windows 2000 Server. A November 21st article on theregister.co.uk by Thomas Greene reports on a paper ostensibly written by some Microsoft people involved with switching hotmail from FreeBSD to Windows 2000 and recently liberated from an insecure Microsoft server. I don't know if the document is authentic or not, but it contains an absolutely wonderful paragraph in the "Advantages of Unix" section:

Image size. The team was unable to reduce the size of the image below 900MB; Windows contains many complex relationships between pieces, and the team was not able to determine with safety how much could be left out of the image. Although disk space on each server was not an issue, the time taken to image thousands of servers across the internal network was significant. By comparison, the equivalent FreeBSD image size is a few tens of MB.

Right, exactly. With Linux and the Apache/Cocoon tool set I can, with certainty, sidestep SOAP on the server and put in place simple safeguards against someone accidentally adding the components. With Windows 2000 Server someone may be able to disable it, but that will last only until the next systems administrator comes along to install the latest patch kit or new OS release. This is version control run backwards, of course, because I'm predicting future exposure to a click and drool sysadmin - but automated updates across rackmounts full of little boxes make that a gimme.
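
By "simple safeguards" I mean something on the order of the sketch below: a nightly cron job that compares the installed package list against a baseline taken at build time and screams if anything SOAP- or web-services-flavored shows up. This assumes an RPM-based distribution, and the suspect-name patterns are guesses, not a definitive list - the point is the discipline, not the script.

    # Sketch of a baseline check run nightly from cron. Assumes an
    # RPM-based Linux box; the baseline file and suspect patterns are
    # illustrative placeholders.
    import subprocess

    BASELINE_FILE = "/var/local/pkg-baseline.txt"   # saved at build time
    SUSPECT = ("soap", "axis", "xmlrpc", "wsdl")    # illustrative patterns

    def installed_packages():
        out = subprocess.run(["rpm", "-qa"], capture_output=True,
                             text=True, check=True)
        return set(out.stdout.split())

    def main():
        with open(BASELINE_FILE) as f:
            baseline = set(f.read().split())
        added = installed_packages() - baseline
        for pkg in added:
            if any(s in pkg.lower() for s in SUSPECT):
                print("WARNING: possible SOAP/web-services component:", pkg)

    if __name__ == "__main__":
        main()

Of course, nothing stops a later administrator from deleting the check along with everything else it guards.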

This is a problem with all projects that push the envelope; if it doesn't end up in the mainstream, some future sysadmin or local IT manager will silently cause it to fail and later describe that failure as your fault.

For now, however, the bottom line is simple: it's no SOAP. Using Windows for data this sensitive would be utterly irresponsible.

That doesn't mean using a Linux server makes sense either. It may be that the risks involved outweigh the benefits no matter what - unless, I suppose, we dump the whole project back into a traditional EDI framework based on the X.4xx standards. That's classic security through obscurity, but it would actually appeal to a lot of customers and is something I've had to ask senior management to consider.

Since some of the data is on-line now, albeit mainly as Word documents on various file servers, I've also suggested they have their CIO install something like SilentRunner.

This is a Windows based packet analyzer that will let them know when their security has been compromised - rather like an automated way to spot the open barn door, but better than finding out via the front page of the Wall Street Journal.

So how secure and reliable can I make a Linux/Cocoon setup? I can do pretty well on the server side:

  1. put in two systems, both physically secured but in separate Nichievo offices;

  2. set up parallel but independent management for each machine;

  3. automate the process of keeping the two databases in sync (a rough sketch of the kind of cross-check I mean follows this list);

  4. use routing to automate failover;

  5. spend a few extra dollars for higher reliability gear and RAID 0+1 (mirroring striped disk arrays) at each end;

  6. use the Cocoon authentication framework under Tomcat/Jakarta;

  7. log everything;

  8. impose stringent verification processes before issuing logins; and,

  9. institute a series of random failure drills to ensure that the administrators practice the steps needed for recovery from various levels and kinds of system or network failure.
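
On point 3, here's a rough sketch of the cross-check I have in mind: compare the two databases and raise an alarm on any divergence. The host names, table names, and credentials are placeholders, it assumes the psycopg2 PostgreSQL driver, and a real version would compare per-table checksums rather than bare row counts.

    # Sketch of a periodic sync check between the two servers.
    # Hosts, tables, and credentials are placeholders; assumes psycopg2.
    import psycopg2

    SERVERS = ("db1.nichievo.example", "db2.nichievo.example")
    TABLES = ("clients", "engagements", "documents")

    def table_count(host, table):
        conn = psycopg2.connect(host=host, dbname="nichievo", user="synccheck")
        try:
            with conn.cursor() as cur:
                # table names come from our own constant list above
                cur.execute("SELECT count(*) FROM " + table)
                return cur.fetchone()[0]
        finally:
            conn.close()

    def main():
        for table in TABLES:
            counts = [table_count(host, table) for host in SERVERS]
            if counts[0] != counts[1]:
                print("ALARM:", table, "differs between servers:", counts)

    if __name__ == "__main__":
        main()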

With all that in place I can feel confident about getting at least five nines on uptime - no more than about five minutes of unscheduled downtime a year - and having a close to zero probability of significant data loss due either to system failure or to external attack.

What I can't do, however, is affect security at the client end, either inside the firm or among customers. The best I can do - as far as I know - is verify the integrity of the access process and use end to end encryption to render most interceptions harmless. If someone, customer or internal user, who is authorized to access the system abuses the information, there's nothing I can do about it.

Part of what I have to be concerned about here is the database replication strategy. Leave that open to someone who can spoof a router to run a man in the middle attack (in which both databases think they're connected to each other, but they're really both connected to a third party system) and everything else is a waste of effort.

PostgreSQL doesn't yet have replication, but there's a serious effort under way (see http://gborg.postgresql.org/) and they're considering using encryption at the database level. That'd be nice, but it's not ready yet - it may be ready by deployment time, a lot happens in six months in the open source community - but I can't count on it. I could wimp out and put Solaris on that server so I can use interprocess PKI, but that's not trivial to do and, besides, I want this to be a Linux solution.

One way to fake it is to hang those two servers on their own T1 links to UUnet or some other large ISP and then use router based packet encryption. That's expensive but effective and offers the side benefit that I can detect a man in the middle attack rather easily by monitoring packet travel time.
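
Here's a rough sketch of the travel-time check, with a placeholder peer address and an arbitrary threshold: measure the TCP connect round trip at intervals, establish a baseline, and flag any sustained jump, since a man in the middle adds latency it can't entirely hide.

    # Sketch of a round-trip-time monitor between the two servers.
    # Peer address, sample count, and threshold are placeholders.
    import socket
    import statistics
    import time

    PEER = ("db2.nichievo.example", 5432)
    SAMPLES = 20
    THRESHOLD = 1.5   # alarm if RTT exceeds 1.5x the baseline

    def connect_rtt(peer):
        start = time.monotonic()
        with socket.create_connection(peer, timeout=5):
            pass
        return time.monotonic() - start

    def main():
        baseline = statistics.median(connect_rtt(PEER) for _ in range(SAMPLES))
        while True:
            rtt = connect_rtt(PEER)
            if rtt > baseline * THRESHOLD:
                print("ALARM: RTT %.4fs vs baseline %.4fs" % (rtt, baseline))
            time.sleep(60)

    if __name__ == "__main__":
        main()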

From a commercial software perspective the slickest replication service is offered by Informix but I'm uncomfortable about their future with IBM. Oracle and DB2 have good solutions too, but they're expensive and too often invite DBA parameter twiddling of the kind that eventually brings the system to a stop.

So this comes down to Sybase, PostgreSQL, and MySQL. The latter doesn't have replication but there are apparently some Perl tools around to do the job and, of course, that might have to be done with PostgreSQL too.

On the other hand, MySQL doesn't have stored procedure capability internally. That's critical in the client-server world because the database engine is often the only piece of multi-user software in the system and so, by default, the place where transaction serialization gets handled. Here, however, that's a non-issue since I can safely externalize procedures and leave MySQL to do nothing more than store and retrieve data - something it's quite good at.
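
In practice "externalizing" the procedures means something like the sketch below: a single data-access layer owns the connection, serializes writes behind a lock, and does the validation a stored procedure would otherwise do, leaving the database to store and retrieve rows. It assumes the MySQLdb driver, and the table and column names are placeholders.

    # Sketch of application-level serialization and validation replacing
    # stored procedures. Assumes the MySQLdb driver; schema is a placeholder.
    import threading
    import MySQLdb

    class EngagementStore:
        def __init__(self):
            self._lock = threading.Lock()
            self._conn = MySQLdb.connect(host="db1", user="app",
                                         passwd="secret", db="nichievo")

        def record_engagement(self, client_id, analyst, notes):
            # Validation that would otherwise live in a stored procedure.
            if not analyst or len(notes) > 4000:
                raise ValueError("bad engagement record")
            with self._lock:   # application-level write serialization
                cur = self._conn.cursor()
                cur.execute(
                    "INSERT INTO engagements (client_id, analyst, notes)"
                    " VALUES (%s, %s, %s)",
                    (client_id, analyst, notes),
                )
                self._conn.commit()
                cur.close()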

This is an argument for using MySQL; after all, what isn't there can't be subverted. That's paranoia, I know, but I've been paranoid ever since I had to trace an information leak to a guy who had simply set up logging on his boss's PostScript printer and was happily downloading everything the guy's secretary printed for him.

This is not an obvious decision but next week, when I attempt a working prototype, I'm going to use PostgreSQL rather than Sybase or MySQL. But: I'm not using any stored procedures or placing any bets on PostgreSQL getting replication services. Come deployment time we'll revisit this, and meanwhile I'm going to see what it would take to break MySQL as the most likely long term choice.