% fortune -ae paul murphy

Relearning something about perl

I've liked Perl since he was a useful little guy from Stanford's computing center hanging around UCB hoping to learn something about computing - or, more practically, since Larry Wall drew up a chart featuring the major Unix tools and their capabilities in a row/column format and set out to provide all the elements near the diagonal in one consistent tool.

Releases one through four did this increasingly well - Perl5, however, got a little complicated. On the positive (if not exactly karma-building) side, Perl5 made it possible to include lines like this:

for my $role (keys %$self) {my $type= shift && (print "$role=$self->{$role}");}

in scripts I did for COBOL-blinkered colleagues who told their equally Dilbertian bosses that they wrote, and could maintain, the stuff.

I know, you're thinking "&& .say what?" but that's actually perfectly legal syntax in Perl6 - now due to arrive only eight years after the first Apocalypse and a mere six after the 2003 Perl6 book by Allison Randal, Dan Sugalski, and Leopold Tötsch.

Now the reason this suddenly matters to me is that last week a person of evil intent pointed me at the Netflix challenge.

If you're one of the lucky ones whose wives don't know about this contest, let me spoil your day - by quoting the Netflix introduction to it:

We're quite curious, really. To the tune of one million dollars.

Netflix is all about connecting people to the movies they love. To help customers find those movies, we've developed our world-class movie recommendation system: Cinematch(SM). Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer's unique tastes. And while Cinematch is doing pretty well, it can always be made better.

Now there are a lot of interesting alternative approaches to how Cinematch works that we haven't tried. Some are described in the literature, some aren't. We're curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business.

So, we thought we'd make a contest out of finding the answer. It's "easy" really. We provide you with a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.) If you develop a system that we judge most beats that bar on the qualifying test set we provide, you get serious money and the bragging rights. But (and you knew there would be a catch, right?) only if you share your method with us and describe to the world how you did it and why it works.

I'm not dumb enough to think I can sensibly go after this, but the problem is interesting because it's really about squeezing intelligence from large arrays of sparse information - and therefore perfectly suited to expression in my favorite and most natural programming language: APL.
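To make "sparse" and "accuracy" a little more concrete, here is a minimal sketch in Python (not APL, and not Perl6). The quoted introduction doesn't name its accuracy measure; the contest rules scored entries by root mean squared error (RMSE), so that is what the sketch uses. The toy ratings, the function names, and the two predictors are all invented for illustration, and the sketch scores against the same ratings it learned from, which the real contest did not do.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (customer_id, movie_id) -> rating on a 1-5 scale.
# Only observed pairs are stored, which is what makes the array "sparse".
ratings = {
    (1, "A"): 5, (1, "B"): 3,
    (2, "A"): 4, (2, "C"): 1,
    (3, "B"): 2, (3, "C"): 2,
}

def rmse(predicted, actual):
    """Root mean squared error over the keys of `actual`."""
    squared = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return math.sqrt(sum(squared) / len(squared))

def movie_means(observed):
    """Average rating per movie, computed from observed pairs only."""
    totals = defaultdict(lambda: [0.0, 0])
    for (_, movie), rating in observed.items():
        totals[movie][0] += rating
        totals[movie][1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}

def improvement(baseline_rmse, candidate_rmse):
    """Fractional RMSE reduction relative to a baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

# Baseline predictor: guess the global mean rating for every pair.
global_mean = sum(ratings.values()) / len(ratings)
baseline = {key: global_mean for key in ratings}

# Candidate predictor: guess each movie's own mean rating.
means = movie_means(ratings)
candidate = {key: means[key[1]] for key in ratings}

baseline_rmse = rmse(baseline, ratings)
candidate_rmse = rmse(candidate, ratings)
gain = improvement(baseline_rmse, candidate_rmse)

print(f"baseline RMSE:  {baseline_rmse:.4f}")
print(f"candidate RMSE: {candidate_rmse:.4f}")
print(f"improvement:    {gain:.1%}")
```

The only point here is the shape of the problem: most customer/movie pairs are simply absent, and "10% better" means a 10% reduction in an error figure like this one, measured against ratings the predictor hasn't seen.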

As Netflix points out, however, any actual working solution has to go into production environments run largely by people like my former Dilbertian colleagues who would suffer debilitating colon spasms just at the thought of letting a language like APL or Fortress, both of which are appropriate to the problem, anywhere near their hallowed precincts.

Perl, however, they'd have to nod at - and Perl6, despite being weird and most definitely not Perl5+, looks like a perfect fit for both the experimentation and development to production stages of any process aimed at addressing this problem.

Why? Conceptually because it models programming as speech, and practically because it lets programmers do some really quite difficult things with ease. It's possible, for example, to pretty much elide the difference between an index and the data it points at; to manipulate very large arrays with the joie de vivre we normally associate with operations on scalars; or to recursively pull vector matches from an array while recalculating the weights assigned to each element in that array.

So what's the problem? Two, really: I don't know how to solve the Netflix problem, and not only don't I know Perl6, but what little I know about using Perl5 seems likely to get in the way of learning it properly.

On the other hand, even a cursory Perl6 review suggests that there's serious value in learning it - that it incorporates, in other words, some real breakthroughs in thinking about programming - and therefore that any time you or I spend learning to think in terms appropriate to Perl6 expression will be time well spent.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.