Feedback so far

So far I've not received much feedback on the key business issues in the cocoon article, although quite a lot of comment has come in on two cost-related issues: the impact Microsoft licensing has on hardware choices, and the need to use Biztalk.

No one questioned the notion that you'd put all of the applications on the same box on the Unix side, but several people wanted to know why I put everything on one box instead of using an N-tier architecture for the Windows side.

There are two answers to that. First, I didn't want to load the comparison against Windows by using Windows SMP, aka the rackmount, to get inter-application isolation. Doing that would, I thought, drive Windows side costs up quite dramatically (particularly when we look at the complexities of recoverability and synchronization in a two site environment) and so be considered unfair.

More importantly, however, there's a technical management reason that starts with a negative. The applications don't need separate CPU and memory resources because they generally operate as a pipeline, meaning that first one is busy, then the next, and so on rather than all of them competing for CPU and memory resources at the same time. Since there's no resource requirement for separate computers I can get significant managerial simplification by putting everything on one machine.
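The pipeline argument can be made concrete with a minimal Python sketch. The stage names and memory figures below are hypothetical, chosen only for illustration; the point is that when the applications run one after another, the machine only has to cover the largest single demand rather than the sum.

```python
# Hypothetical per-application peak memory demands in MB (illustrative only,
# not measurements from the Nichievo project).
stage_demand_mb = {"intake": 300, "transform": 500, "store": 400}


def peak_if_pipelined(demands):
    """Stages are busy one at a time, so the peak is the largest single demand."""
    return max(demands.values())


def peak_if_concurrent(demands):
    """All stages are busy at once, so the demands add up."""
    return sum(demands.values())


print(peak_if_pipelined(stage_demand_mb))   # 500
print(peak_if_concurrent(stage_demand_mb))  # 1200
```

With these made-up numbers a single machine sized for the busiest stage is enough, which is the reasoning behind keeping everything on one box.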

The complexity I'm trying to avoid comes mainly from the need to maintain a very high level of confidence in the integrity of the data stored. Since most security-related problems arise because of internal action, the ability to avoid the additional points of vulnerability (things like second network cards) that go with the rackmount seems highly desirable.

On the other hand, Microsoft's licensing policies might make an n-tier approach more attractive financially than I had expected. Several people told me that I would need the enterprise license for SQL-Server and cannot run a single processor license on a dual CPU machine.

Since the enterprise license for SQL-Server is $19,999 per CPU, it would be quite a lot cheaper to put SQL-Server on a single CPU machine by itself than to buy a second license.

Similarly, Biztalk enterprise edition is $24,999 per CPU; if I need that, buying a separate uni-processor to run it will save about $16K up front and thus be worthwhile despite the need to buy a rack, a switch, four more net cards, and three more $999 Windows 2000 Server licenses.

That's a new idea to me - the notion that Microsoft licensing drives de-consolidation - and will take some thinking about.
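The licensing arithmetic above can be sketched in a few lines of Python. The license prices are the ones quoted in the text; the hardware figure is an assumption, picked only so that the result lands near the "about $16K" mentioned, since the actual cost of the extra box, rack, switch, and net cards isn't given.

```python
# License prices quoted in the article (US dollars).
BIZTALK_ENTERPRISE_PER_CPU = 24_999
WINDOWS_2000_SERVER = 999

# ASSUMPTION: combined cost of the extra uniprocessor, rack, switch, and
# net cards. This number is hypothetical and not taken from the article.
ASSUMED_EXTRA_HARDWARE = 6_000


def deconsolidation_saving(license_per_cpu, extra_hardware, extra_os_licenses):
    """Up-front saving from giving a per-CPU product its own single-CPU box.

    On a dual-CPU machine the product needs two per-CPU licenses; on its own
    uniprocessor it needs one, so one license is avoided. Against that we
    set the extra hardware and the extra Windows server licenses.
    """
    extra_costs = extra_hardware + extra_os_licenses * WINDOWS_2000_SERVER
    return license_per_cpu - extra_costs


saving = deconsolidation_saving(BIZTALK_ENTERPRISE_PER_CPU, ASSUMED_EXTRA_HARDWARE, 3)
print(saving)  # 16002
```

Under these assumptions the separate box pays for itself as long as the extra hardware and operating system licenses together cost less than one per-CPU license.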

Even more people questioned the use of Biztalk for this application.

Bruce Hutfless, who seemed to know what he was talking about, said I didn't need either Biztalk or ISA, which I described as a caching server:

That's the Microsoft line, it is really just a cheap firewall. Which is why I have never used it.

You can do XML/XSL translation without Biz-Talk. All you are getting is pre-canned SOAP and some XML/XSL templates and translations. An ActiveX DLL server-side object using the MSXML4 SDK gives you everything you need. What Biz-Talk gives you is some server-side objects and a pipeline. Just as easy to implement your own pipeline.

The problem is you have to hunt to find the MSXML4 SDK, which is the XML/XSL parser used in Windows 2000, Internet Explorer 6.0 and .Net. Microsoft hasn't done much to promote this little jewel. For obvious reasons!

In point of fact, ISA and Biz-Talk are going by the wayside in .Net server. Right now, a la Windows 2K, there are 13 different server product offerings of the OS. MS claims they will limit this number in the next release of .Net.

I'm still not sure about this - although if I needed the enterprise license, I'd sure want to prove him right. Microsoft included Biztalk in their promise to eventually bring forth a cocoon-like bundle (their Jupiter announcement), but no one with significant Microsoft experience offered a clearly better idea tied to the Nichievo application.

So far no one's come forward with manpower utilization information although one person with a .co.uk address did ask me, in polite language, what I'd been smoking to pick cocoon for this job; suggesting that ordinary people could deliver this application with more widely understood tools.

He's right, cocoon isn't necessary to do the Nichievo job, but I picked it in order to be able to offer downstream benefits the other guys can't match. Using it isn't critical to doing the job, it is critical to winning the job.

Several people told me that they can't use stuff like cocoon on Linux because they can only find Windows programmers to hire.

Aside from the instinctive response that real programmers don't do Windows, this is economic nonsense. The market evolves to meet demand: if you can only get Windows people, it's because that's who you hire. Demand Linux expertise and you'll quickly get people willing to try - which is, in my experience, the best you can say about nine out of ten Windows people: that they're willing to try.

This ties to one of those odd things you see all the time in systems consulting: people have learned about the value of experience in using systems technologies, so you see formal RFPs, particularly government and large data center ones, requiring five years of experience with just released products - and big name consultants solemnly signing off as having it. It's a nonsensical response to a nonsensical requirement, but it illustrates the market at work. Make Linux expertise a condition of employment and people will make the learning investment needed to meet your requirements.

I recently had a lunchtime conversation with a client that went something like this:

Client: Linux came up in the discussion on our new web documents server but we're going with Windows.

Me: why?

Client: IT won't support Linux; they say that if we use it, we're on our own.

Me: So what happened with your departmental email last week?

Client: We eventually got Greg (a contractor) in, those guys at Kingsway don't want to be bothered coming out, talking to them is, is, I mean what the hell does "registry corruption" mean and why do they always make it seem like it's our fault and never happens to anyone else?

Me: but you're staying with Windows for the web server because they'll support it?

Client: what do you think of this Washington sniper business?

The bottom line is simple: you get what you pay for. If you don't get the support you pay for, whether that's Linux or Windows, you've got a management problem that calls for defenestration, not resigned acquiescence.

Help! We need feedback!
This little series about doing something fairly hard in both a Linux and a Windows environment depends for its relevance and facts on reader contributions. If you've worked with, or are thinking of working with, either toolset, please contact the author. If you know someone with experience to share, particularly on the Windows side, please ask them to read this article.

Getting the infrastructure in place

As the first step in doing the work, I installed the tools - on Solaris instead of Linux because that's what's on my desk.

The documentation says to install the servlet engine, apache tomcat, first. So I downloaded that and discovered that I needed the Sun java sdk 1.4 installed for tomcat to work.

The SDK installed itself into /tmp via a self-extracting shell file. I moved it to /opt, used setenv to set JAVA_HOME, untarred tomcat in place, and used setenv again for CATALINA_HOME. One minute and 47 seconds later my 8080 port was showing the tomcat welcome page.

Getting cocoon installed was trickier - no Solaris pkg or binaries - so I downloaded the source file for 2.0.3, gunzip'ed it, and tried to untar it.

Tar bombed out with a directory checksum error so I promptly fired off a bug report about the tar problem.

Then I downloaded the previous release, 2.0.2, and tried again with the same result. Including the -i parameter (ignore directory checksum errors) seemed to work, so I set the environment variables needed and fired off build.sh.

That failed with lots of helpful messages like:

../work/lw/cocoon/code/cocoon-2.0.3/build/cocoon/src/org/apache/cocoon/serialization/POIFSSerializer.java:163: cannot resolve symbol
symbol : class POIFSElementProcessor
location: class org.apache.cocoon.serialization.POIFSSerializer
( ( POIFSElementProcessor ) processor ).setFilesystem( _filesystem );

Some fresh coffee helped me remember that standard tar has a problem with absurdly long paths and file names. GNU tar unpacked the file with no problems - so then I had to go file a "sorry, there's a bug, but it's not in your code .." report.
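For readers wondering what "absurdly long" means here, the sketch below shows the classic tar header limits in Python. It is an illustration under assumptions: the text doesn't say which limit the Solaris tar actually hit, and the sample paths are made up. The historical tar header keeps the file name in a 100-byte field, while the POSIX ustar format adds a 155-byte prefix field that can hold the leading directories, split at a slash.

```python
NAME_FIELD = 100    # bytes available for the name in a tar header
PREFIX_FIELD = 155  # extra bytes ustar provides for leading directories


def fits_old_tar(path: str) -> bool:
    """True if the whole path fits in the 100-byte name field."""
    return len(path.encode()) <= NAME_FIELD


def fits_ustar(path: str) -> bool:
    """True if the path fits as-is or can be split at a '/' into prefix + name."""
    raw = path.encode()
    if len(raw) <= NAME_FIELD:
        return True
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:i], raw[i + 1:]
        if 0 < len(prefix) <= PREFIX_FIELD and 0 < len(name) <= NAME_FIELD:
            return True
    return False


# Hypothetical paths, not taken from the Cocoon archive.
deep_path = "/".join("directory%02d" % i for i in range(12)) + "/File.java"
long_name = "x" * 120 + ".java"

print(fits_old_tar(deep_path), fits_ustar(deep_path))  # False True
print(fits_old_tar(long_name), fits_ustar(long_name))  # False False
```

A deeply nested Java source tree can exceed the 100-byte field quite easily, which is why an archive written with GNU tar's long-name handling can trip up an older tar that doesn't understand it.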

One minute and nine seconds after starting build.sh a second time the cocoon page was up at 8080/cocoon. Start to finish the whole thing had taken just under ninety minutes of elapsed time, most of it waiting for a total of about 107MB of downloads to finish.

Joe Barr, who's been writing for Linuxworld on comparing Linux and Windows install processes, should check this out. Download, unzip, use the tools provided to build or install, and the whole complicated structure fires up and runs. No licenses, no reboots, no media swaps, and for those who use gnutar by default (i.e. Linux users), no errors.

I don't have a Win2K box or the licenses necessary to try this project with, but everything I've done with Windows says it's nowhere near as slick and effective as this for loading new application toolsets.

Getting the system to switch to https so the material being sent back and forth to the user would be encrypted was almost equally trivial. Once I found the documentation on running keytool to create my own certificate (needed to start encrypted sessions, not for authentication), it took about two minutes to copy the connector definitions into server.xml and restart tomcat to get a functioning https server connection on port 8443.

Everything worked out of the box; well, except for the box - Java may be the slickest tool yet for turning an UltraSPARC II into an i80386. Loaded but idle, Tomcat/cocoon uses 19.1MB of ram and 0.19% of one cpu; but after I read some of the on-line cocoon documentation and played with the sample webapp provided, that had zoomed to 188MB of ram while some page requests took 100% of a cpu and measurable time to fill.

Astonishingly, it's actually faster to read documentation directly from the apache site than from my own test machine - provided I don't read anything twice. Caching helps a lot: the second time you process a document, response seems normal even if you use a different browser to avoid browser caching effects.

Apache provides lots of documentation but much of it assumes that the reader is comfortable with both Java and XML operational concepts and terminology. The Cocoon Overview document does, however, illustrate how complex subjects can be clearly and simply introduced and should be required reading for anyone looking at working with this product set.

The core idea is the separation of management, logic, content, and style, meaning that you can change any one of these without affecting the other three. Since it's aimed mainly at web publishing, the classic example, from somewhere in one of the documents I read about it, involves setting up a style to match a particular holiday and then switching the entire appearance of the site for one day simply by exchanging one definitions file for another.
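The holiday example can be mimicked in a few lines of Python. This is only an analogy under stated assumptions: Cocoon itself works with XML content and stylesheets rather than Python templates, and the content, style names, and markup below are all hypothetical.

```python
from string import Template

# Content lives in one place and knows nothing about presentation.
content = {"title": "Quarterly report", "body": "Numbers are up."}

# Two interchangeable style definitions (hypothetical markup).
everyday_style = Template("<h1>$title</h1><p>$body</p>")
holiday_style = Template('<h1 class="festive">$title</h1><p class="festive">$body</p>')


def render(page_content, style):
    """Apply a style definition to content without modifying either."""
    return style.substitute(page_content)


print(render(content, everyday_style))
print(render(content, holiday_style))
```

Switching the site's look for a day amounts to passing a different style to `render`; the content dictionary and the rendering logic stay untouched.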

That's neat, but not what Nichievo needs. On the other hand the technology looks like a near perfect fit here. The separation idea, for example, is carried forward to the basic web site structure. A sitemap file sets the general rules for the site (what's included, how it's processed, and so on), but you can have sub-site maps that apply different rules to different hierarchies or projects.

This meshes perfectly with what we want to achieve at Nichievo: store the data once (actually twice, but the document copy exists for legal, rather than functional, reasons), write the logic once, get the users to learn one set of addresses, passwords, and behaviors, and yet maintain several very different production environments or webapps.

In playing with the samples provided two things become very clear:

  1. overall, this stuff is way beyond cool, it'll not only do what I want for Nichievo, it really may make a lot of the advanced stuff they're dreaming of deliverable within this project's lifetime; and,

  2. it looks like almost everything I want to do at Nichievo qualifies as "easy"; meaning that it's a variation on what comes with the system. Of course, going from perception to reality requires expertise; expertise I don't have.

    Realistically I believe that it would take an expert 2 to 3 days to get this project to the working prototype stage and about the same to debug to deliverable status, but it will take me several weeks - most of it spent learning stuff that an expert would consider pretty basic.

Reader feedback is particularly important now because the next article in this series looks at code development and related language or database choices.

From a programming perspective Cocoon does several things I hadn't known about. There's an entire authentication framework under development that will let me, even in its current state, avoid having to invent that wheel. There's enough of a webDAV inclusion capability already available that I'm considering using it to avoid having to worry about managing file transfers while giving users greater freedom (illusory, of course, since everything gets logged and nothing, ever, gets deleted) to add or modify things.

Equally importantly the data handling for forms validation, database access, and process flow management is more advanced than I expected. It may be that not everything can be handled through these facilities, but I don't now see any major gaps and that means I can reduce my expected time to finish on those tasks while reducing downstream vulnerabilities to code changes by my successors.

It also means that next time we'll be discussing database, not language, issues. On the Linux side at least, language may be a non-issue if I can find a way to make getting signoffs a point and click process that doesn't require back end coding. On the Windows side, unfortunately, I have no idea - and boy do I need feedback on this!