% fortune -ae paul murphy

Making AI work

Last week's blog wondering what a consumer would use a petascale handheld for led to some interesting comments - both on the record and off.

Here's part of one comment by frequent contributor Roger Ramjet:

Maybe with petascale computing, AI could become more than the joke that it is today. I doubt it. In order to model the human mind, you need to do more than the "poke it and see what it does" mentality of today's "soft" science researchers.

I doubt it too, and for the same reason: you can't build an artificial intelligence without knowing what intelligence is.

What you can build, however, is an expert system - specifically, a classifier application built to operate along the lines of the 20 questions game, in which you sequentially reduce the number of possible answers by asking simple questions until only one possibility is left, and then give it.

For example:

It was, or is, bright red with navy blue highlighting. What is it?

  1. Is it wearable?

    Not really

  2. Is it fixed in one place?


  3. Is it a machine?


  4. Is it a 1971 Dodge Charger R/T with the California 426 hemi package and spoiler?

    How'd you guess?
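The narrowing process above can be sketched as a small classifier: each yes/no question partitions the remaining candidates until only one is left. The candidate objects, features, and scripted answers below are hypothetical stand-ins, not anything from a real system.

```python
# A 20-questions-style classifier: each yes/no answer halves (or
# otherwise partitions) the candidate set until one answer remains.

def play(candidates, questions, answer_fn):
    """Ask questions in order until a single candidate is left."""
    remaining = list(candidates)
    for question, test in questions:
        if len(remaining) <= 1:
            break
        if answer_fn(question):
            remaining = [c for c in remaining if test(c)]
        else:
            remaining = [c for c in remaining if not test(c)]
    return remaining

# Hypothetical candidate objects with yes/no features.
candidates = [
    {"name": "scarf", "wearable": True, "machine": False},
    {"name": "phone booth", "wearable": False, "machine": False},
    {"name": "1971 Dodge Charger R/T", "wearable": False, "machine": True},
]

questions = [
    ("Is it wearable?", lambda c: c["wearable"]),
    ("Is it a machine?", lambda c: c["machine"]),
]

# Scripted answers standing in for the human player.
script = {"Is it wearable?": False, "Is it a machine?": True}
result = play(candidates, questions, lambda q: script[q])
print(result[0]["name"])  # 1971 Dodge Charger R/T
```

The interesting engineering problem, of course, is not the elimination loop but choosing which question to ask next - ideally the one that splits the remaining candidates most evenly.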

The first and most obvious generalisations come in two forms: first, allow contextual information to influence the classifier (a car magazine on the coffee table) and, secondly, allow more than one right answer. For example:


commit(command() = set A17C to On)

voicerec() = Dave
empsched() = early morning
voice_stress() = low
image_mood() = not unhappy, not angry, not rushed
resp_indic() = affirm, greet

commit(vocalize() = "Good Morning, Dave.")

Wait(next input)
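As a sketch, the rule above maps directly onto a context classifier. The signal values here are stubbed constants; in a real system they would come from voice recognition, the employee schedule, and image analysis, and the signal names are simply borrowed from the example.

```python
# A context classifier matching the pseudocode above: when the
# contextual signals line up, commit the associated actions.

def classify(context):
    """Return the (kind, payload) actions committed for a context."""
    actions = []
    if (context.get("voicerec") == "Dave"
            and context.get("empsched") == "early morning"
            and context.get("voice_stress") == "low"
            and "affirm" in context.get("resp_indic", [])):
        actions.append(("command", "set A17C to On"))
        actions.append(("vocalize", "Good Morning, Dave."))
    return actions

# Stubbed sensor readings for one interaction.
context = {
    "voicerec": "Dave",
    "empsched": "early morning",
    "voice_stress": "low",
    "image_mood": ["not unhappy", "not angry", "not rushed"],
    "resp_indic": ["affirm", "greet"],
}

for kind, payload in classify(context):
    print(kind, payload)
```

A production version would obviously table-drive the rules rather than hard-code them, but the shape - contextual signals in, committed actions out - is the same.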

Making something like this that plays "20 questions" with conversational data in order to pass the Turing test is fairly straightforward - much more a matter of funding and implementation than invention, and a lot easier than building IBM's "Deep Blue" chess machine must have been.

But would a program that did this be intelligent? And how could we possibly know?

When I first looked at this question, back in the mid-seventies, the technology needed to address it didn't exist - now I think it does, and that the ideas we need to make use of it can be derived from the example above.

The big issues in trying to make a sort of super-classifier work are twofold: first, how to represent and use context information inside the classifier; and secondly, how to choose between equally appropriate responses.

Choosing randomly, as above, is simple but not right because that's not what the only intelligence we know of does: people develop characteristic patterns - I tend to say "Hi" a lot more often than "Good morning."

The obvious first step in improving on random is to weight the choices according to previous use by all of the people involved in the context - i.e. Dave's habitual use of "good morning" would add to the probability that the computer selects it over "Can you see me now?" in the context above.
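That first step amounts to frequency-weighted selection. A minimal sketch, with hypothetical usage counts:

```python
import random

# Weight candidate responses by how often each participant has
# used them before; counts here are made-up illustration data.

usage_counts = {"Good morning": 14, "Hi": 3, "Can you see me now?": 1}

def weighted_choice(counts, rng=random):
    """Pick a response with probability proportional to its count."""
    responses = list(counts)
    weights = [counts[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]

# With these counts, "Good morning" wins about 14 times in 18.
print(weighted_choice(usage_counts))
```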

A more sophisticated second step, however, is to continually reweight the choice library according to outcome evaluations for each action undertaken - making it more likely, but not certain, that actions resulting in positive consequences will be repeated when similar contexts arise.
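The reweighting step can be sketched as a simple multiplicative update: a positive evaluation nudges a response's weight up, a negative one nudges it down, and a floor keeps no response from vanishing entirely. The learning rate and floor values are arbitrary assumptions.

```python
# Outcome-driven reweighting of the choice library: actions with
# positive consequences become more likely, but never certain.

def reweight(weights, response, outcome, rate=0.2, floor=0.1):
    """Nudge a response's weight by the outcome score.

    outcome: +1 for a positive evaluation, -1 for a negative one.
    The floor keeps every response selectable, so behaviour stays
    probabilistic rather than deterministic.
    """
    weights[response] = max(floor, weights[response] * (1 + rate * outcome))
    return weights

weights = {"Good morning": 1.0, "Hi": 1.0}
reweight(weights, "Hi", +1)            # Dave smiled
reweight(weights, "Good morning", -1)  # no reaction
print(weights)
```

Feeding these weights into the frequency-weighted selection above closes the loop: choose, act, evaluate, reweight.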

This, of course, is exactly what babies do as they learn: every stimulation exists in a context; for every context they emit some action or group of actions; they evaluate the results of those actions in terms of some very basic normative drivers - food, warmth, comfort, and the immediacy of change - and then repeat the ones that produce pleasing outcomes.

If you think of these combinations of action, consequence, and evaluation as stimulus-response (S/R) sets, it should be obvious first that they occur very quickly and secondly that large numbers of them can be grouped to form patterns describing interaction processes.

Notice, however, that each time a common interaction process such as getting hugged and fed after wailing happens, some of the specifics in the S/R sets making up the process record change - but the pattern doesn't.

Patterns abstract the S/R data and can therefore be represented independently of it - meaning first that patterns can be recognised on partial data, secondly that patterns can be grouped to form other patterns, thirdly that new S/R data either creates new patterns or modifies existing ones, and finally that a decision process based on pattern recognition would not need access to the underlying S/R data.

In the human analogue pattern recognition on partial data means that commonalities dominate differences: mommy is mommy no matter what she's wearing, and most of us can spot a chick flick in ten seconds no matter who's in it or where we see it.

For our super-classifier, partial data pattern recognition means that the primary internal operation would be pattern comparison and reweighting, not data retrieval - meaning that it would not need to store most of the S/R sets and could, therefore, "learn" to play intuitive, strategic chess without needing a large book.
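Matching on partial data can be sketched as scoring stored patterns by how well the available features agree, ignoring whatever wasn't observed. The patterns and features below are hypothetical.

```python
# Pattern recognition on partial data: score each stored pattern
# by the fraction of shared features that agree with the (possibly
# incomplete) current observation.

def match_score(pattern, observation):
    """Fraction of jointly-known features on which the two agree."""
    shared = set(pattern) & set(observation)
    if not shared:
        return 0.0
    agree = sum(pattern[k] == observation[k] for k in shared)
    return agree / len(shared)

patterns = {
    "mommy": {"voice": "alto", "face": "round", "height": "tall"},
    "stranger": {"voice": "bass", "face": "square", "height": "tall"},
}

# Partial observation: height is simply unknown, and that's fine.
obs = {"voice": "alto", "face": "round"}
best = max(patterns, key=lambda name: match_score(patterns[name], obs))
print(best)  # mommy
```

Commonalities dominating differences falls out naturally: unknown or differing incidentals only dilute the score, they don't zero it.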

Could we build a machine to do this today? I think so; the critical issues right now aren't computing power or memory, but pattern representation and getting enough input for the thing to build itself - and that's where petascale computing could help, although hooking it up to the TV cable to get S/R input would, I assume, grow an insane AI.

Oh, and the implicit definition of intelligence? The time and information (e.g. number and weight of matching elements) needed to recognise that a current pattern matches all or parts of one or more previously stored patterns.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.