% fortune -ae paul murphy

The opportunities in search

A couple of weeks ago I got the cuil.com press release touting their company as offering the latest and greatest in search.

After a few tries I didn't find their front page cool, and didn't think their search was better than Google's either - and that's too bad, because Google is the most overpriced company in history and a little real competition would be good for them.

When you look closely at Google's search product it turns out to be basically just 1950s boolean text search ideas (cf. COLEX (1958), SDC Dialog (1967) and BRS Search on BSD (1984)) implemented in a grid file framework with a simple front end and an automated data collection back end. Cool, but a breakthrough in business and presentation, not technology. And Cuil? It looks like less of the same to me - at least so far.
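For readers who haven't met the idea, boolean text search in its simplest form can be sketched as below. This is a toy illustration only, not how Google or any of the systems named above actually work; the function names `build_index` and `boolean_search` and their parameters are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_search(index, universe, all_of=(), any_of=(), none_of=()):
    """Return ids matching every all_of term, at least one any_of term
    (if any are given), and no none_of term."""
    hits = set(universe)
    for term in all_of:          # AND: intersect with each term's postings
        hits &= index.get(term, set())
    if any_of:                   # OR: intersect with the union of postings
        hits &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:         # NOT: subtract each term's postings
        hits -= index.get(term, set())
    return hits


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "bronze statuette signed alfred",
    2: "ceramic statuette",
    3: "bronze coin",
}
index = build_index(docs)
result = boolean_search(index, docs, all_of=["statuette"], none_of=["ceramic"])
```

Here `result` holds the ids of documents that mention "statuette" but not "ceramic". Everything beyond this (ranking, crawling, scale) is where the money goes.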

Three things about this situation bother me:

  1. the technology needed is well understood. It takes money, not genius, to go after Google's market using a comparable technology - and yet Yahoo suicides, Microsoft dawdles, and Cuil, isn't.

  2. search seems to have lots of room for genius - for someone to improve it to the point that the improvements can replace money in the drive to business success. The person with the idea exists somewhere - so why haven't we heard about it?

  3. the technology underlying the current financial model for search is extremely weak. Google makes most of its money selling eyeballs on ads, but an effective ad should provide value to the viewer, and Google's ad matching almost never does that.

Back in the mid-nineties, when Illustra got spun out of the UCB Postgres group, it came with text and image datablades, and one of the options was a video indexing and retrieval blade (plugin) out of Stanford that let you highlight an image and then search a video for occurrences of that image - something you'd think YouTube would kill for.

It was neither efficient nor 100% accurate, but it was a mid-nineties, out-of-the-box solution for the two kinds of search Google does best and one it doesn't currently do at all: image matching. More subtly, the datablade technology it came with would make it fairly easy to incorporate much more information about the user into ad selection and placement - thus offering greater value to both users and advertisers.

The main problem with image matching is that you have to convert the input images to a form that's independent of both perspective and scale while not trusting color as the primary discriminator - something you can do by computing the minimal set of normal vectors for every contiguous color surface, because that produces a unique description of the object's shape whose determinant happily functions as your primary hash key. Since both Cell and T2 are now fast enough to do this in real time for video input, it would now be possible to resurrect that technology, improve it a little, and enable someone like me to upload a single image for network search.
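To make the "shape description as hash key" idea concrete, here is a minimal sketch. It is not the normal-vector scheme described above: it swaps in a much simpler signature (sorted squared distances from the centroid, divided by the largest one) that ignores position, uniform scale, and in-plane rotation, but not perspective. It assumes the shape has already been reduced to a list of integer outline points, and the name `shape_key` is invented for the example. Exact fractions keep the toy deterministic; a real system would need tolerance for noise.

```python
from fractions import Fraction
import hashlib


def shape_key(points):
    """Hash a 2-D point set so that moving, uniformly scaling, or rotating
    the whole set leaves the key unchanged."""
    pts = list(points)
    n = len(pts)
    # Centroid as exact rationals: subtracting it removes translation.
    cx = Fraction(sum(x for x, _ in pts), n)
    cy = Fraction(sum(y for _, y in pts), n)
    # Squared distances from the centroid do not change under rotation.
    sq = sorted((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts)
    largest = sq[-1]
    # Dividing by the largest distance removes uniform scale.
    signature = tuple(d / largest for d in sq) if largest else tuple(sq)
    return hashlib.sha1(repr(signature).encode()).hexdigest()


# An L-shaped outline, and the same outline rotated 90 degrees,
# scaled by 3, and shifted by (10, -4).
l_shape = [(0, 0), (1, 0), (2, 0), (0, 1)]
moved = [(-3 * y + 10, 3 * x - 4) for x, y in l_shape]
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
```

In this toy, `shape_key(l_shape)` and `shape_key(moved)` agree while `shape_key(square)` differs, which is the property a primary hash key for image lookup needs; the hard parts (segmenting surfaces, handling perspective, tolerating noise) are exactly what this sketch leaves out.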

A couple of weeks ago, for example, I bought this little statuette at a garage sale for $5.00.

It's signed, apparently as "Alfred", appears to be a thin, somewhat patina-ed, bronze coating over some light ceramic like terra-cotta, and is numbered X18 - where the "X" is unreadable but could be almost anything from a dash to an A, F, or 8.

What I'd like to do is upload that image to a search engine and have it return as hits sites that have photos of very similar things - and therefore might have the information I need to see whether I overpaid for junk with spray on patina or got something others see as valuable.

But I can't: Google's image search is based on text labels, Cuil doesn't seem to have gotten there at all - and neither Microsoft nor Yahoo is in the game.

And that, I think, is the bottom line on Cuil and the state of search: to the extent that it's about money, it's very 1950ish - and to the extent that it should be about making use of technology it's a whole bunch of technical and financial niches waiting for someone to fill them.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.