Draft Blog Entries

% fortune -ae paul murphy

Plagiarism, Gibberish, IEEE, and the textbook business

About two weeks ago a regular Register contributor rejoicing in the name of "Verity Stob" published a story called "Educating Verity" about enrolling for an Open University course on object-oriented design - only to find that one of the key papers offered for discussion was largely irrelevant to the topic, filled with gibberish, and at least partially hacked together, without attribution or acknowledgment, from work done by others.

Another Register correspondent later reported on his efforts to get the Open University and the IEEE, which had published the offending paper in IEEE Software (a prestigious, peer-reviewed academic journal), to either defend or modify their actions. He was essentially stonewalled on the gibberish and relevancy issues - and could get the IEEE to move no further than a weak acknowledgment that:

... this paper has been found to be in violation of IEEE's Publication Principles.
This paper contains portions of original text from the sources cited below. Text from Paper 1) was reused without attribution. Text from Paper 2) was reused with attribution but without being clearly delineated from the above authors' own text.

With that in mind, I'd like to focus on the content quality issue. Here's the example from Stob's first article on this:

The following is one of the passages we were instructed to discuss in our homework. It purports to explain how companies choose open source software.

Open source projects that are too platform-specific aren't good either. For example, many open source content management system developers have based their spawning, multiple (often competing), and derivative projects on a single platform. To develop them into useful applications requires excessive, code-based customization. While extensibility is important, customers expect to see the inclusion of core features, together with the ability to configure key settings. Open source components with low code volatility, high platform heterogeneity, and high configuration and optimization space are the best choices. Robust test cases and user credibility are other dimensions developers must consider to identify the right components.

I assure you that context improves the sense of the above not one pica-jot.

I spoke to a completely credible liberal academic (my wife) on this and was told that the paragraph quoted may not exemplify the clearest use of English but makes perfect sense - in the sense that a reader of goodwill can easily make sense of it.

I can't. It must be the lack of goodwill toward idiots that comes from actually knowing something about the subject - but, as Stob illustrates, the key sentences here do make sense if you put back the words that were deleted during the plagiarizing process:

I can't remember, now, the exact phrase that led me to Tony Byrne's article Open-Source CMS: Prohibitively Fractured? on the CMS Watch site. But something, perhaps 'spawning, multiple (often competing)', led me to this:

Many leading open-source CMS projects have resigned themselves to becoming development "platforms," spawning multiple (often competing) derivative projects to undertake the difficult work of actually fashioning products that will appeal to real business users. To be sure, building a good platform is hard, too. It takes a lot of architectural savvy, trial and error, and constant refactoring. (Certain Apache projects fulfill important lower-level functions and properly remain platforms rather than polished products.) But a platform doth not a CMS application make.
This trend is ironic, because much of the criticism of bloated or failed CMS projects has centered around the commercial products involved being too "platform-oriented" and therefore requiring excessive, code-based customization to convert into practical CMS applications. Extensibility is important, but savvy customers expect to see the inclusion of core features, together with the ability to configure key settings via simple browser interfaces.

It seems familiar, does it not? Except this time, the words are arranged in such a way as to convey meaning.
For example, in Madanmohan and De', the phrase 'to configure key settings' is one of those dangling, sounds-as-though-it-means-something-but-not-quite-sure-what phrases with which their paper abounds. Tony Byrne's version makes things clear: you must be able to configure the damn software with your browser, rather than faff around with a text editor, or write more code. By deleting the key words 'via simple browser interfaces', Madanmohan and De' convert a straightforward observation into a non-committal abstraction.

So what's going on with people who think the Madanmohan and De' version is perhaps poorly written but clear enough for people of goodwill to understand? The answer, I think, is that those who believe this are nodding at the buzz phrases without knowing enough about the material to see that the order in which these are put together here reduces the agglomeration to nonsense.

The real issue, in other words, isn't how the plagiarism got past the IEEE's peer reviewers, but how they could accept this gibberish as worthy of publication.

My answer to that comes from work I did a few years ago reviewing some introductory IT textbooks (full version here).

These books account for more than 90% of their markets and are filled with errors of fact, errors of omission, and deeply rooted structural and anachronistic errors that together leave the student unable to form any coherent mental map of the IT industry - and my bet is that IEEE reviewers are both graduates of, and teachers of, courses in which books like these form the instructional basis.

IT teaching, in other words, has been going steadily from bad to worse - with the quality of instructional materials driving down the quality of instruction and that, in turn, driving down the quality of published "research" and the value of the peer review stage going into that process.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.