% fortune -ae paul murphy

The tattered history of OOP

This is a guest blog by frequent contributor Mark Miller, a follow-up to our earlier discussions of the effectiveness and value of the object-oriented programming idea.

OOP has been a mixed bag, but I say this only because, as with the PC, the creative vision of OOP was lost. What follows is largely based on Alan Kay's presentation "The Computer Revolution Hasn't Happened Yet", which he gave at the 1997 OOPSLA conference (available at Google Video), and a paper he wrote many years ago, available online, called "The Early History of Smalltalk", among other sources.

Smalltalk is the OO system Alan Kay and others developed at Xerox PARC in the Learning Research Group in the 1970s. It was inspired by Kay's ideas about the Dynabook that I talked about earlier. The language used on the system was also called Smalltalk. It was the first one to really flesh out OOP ideas. It was used as a way to construct programs and carry out activities on the system. Rather than a process model (like in Unix), you had an object model of a system. The versions of Smalltalk they worked with at PARC functioned more like an operating system and a development system all rolled up in one, on their Alto prototypes, rather than just a set of executable tools (editor, compiler, etc.). They booted the machine up with it. The entire system was OO. Not a process/executable in sight. Unix is not too different if you think about it (except that it's process-oriented). It's largely written in C (the primary development language on it). Each Unix system comes with its own C compiler, and supporting libraries. You can create your own processes that you write yourself as executables or shell scripts, just as you could create your own objects and then make them do something in a Smalltalk system.

The architecture was intentional. In a conversation Kay participated in on the Squeak development list (Squeak is a modern implementation of Smalltalk-80) he said that the operating system is just the part that's left out of the programming language. The more I thought about it, the more sense it made. A programming language and development environment is nothing without the resources that an operating system provides. It makes sense to have them together.

Kay's primary interest was in creating software systems that scaled well. ARPA, where he worked for a time, shared this interest in creating scalable systems in the 1960s, and it was the philosophy Xerox PARC was founded on in 1970, under Bob Taylor. Kay moved there when it was founded. The way they thought they could accomplish scalability was via a philosophy of "no centers". In other words, to riff off of Plato, "carve" systems at their "joints" and decompose them to essential elements. What came out of this were ideas about personal computing (decentralized, interactive computing instead of centralized mainframes); the ARPANet/internet, based on a decentralized cellular structure rather than a proprietary small-scale network using mainframes; Ethernet and TCP/IP; objects rather than monolithic procedural programs; "no OS"; "no applications"; etc. In a brief e-mail conversation I had with Kay last year he said of their philosophy, "[A] lot of the interest was how to get things done without having to concentrate the knowledge in one or a few places (because this would require these places to have to control 'ingredients')."

Kay's vision of OOP began as an attempt to improve on what Lisp was at the time, and indeed the early versions of Smalltalk used some of the same principles. Objects in Kay's scheme could be thought of as simple Lisp-like REPLs (without the screen prompt). Object primitives were used to parse messages sent to objects, and then the messages were matched with methods via pattern matching. This happened at run time using late binding. Message passing was not tightly coupled with method invocation. So it really was like Object A passing a message to Object B, not mere hand waving to make a procedure call look like something it's not. This scheme let the programmer easily create their own domain language, and it promoted the loose coupling of specialized functionality. One could write domain languages in Lisp as well, but OOP allowed for infix expressions to be used, rather than prefix expressions.
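To make the distinction between sending a message and calling a procedure concrete, here is a minimal sketch in Python. It is not how any Smalltalk was implemented: the class name MessageReceiver, the understand/receive methods and the selector strings are all made up for illustration, and a plain dictionary lookup stands in for the pattern matching described above. The point is only that the receiver decides, at the moment the message arrives, what (if anything) to do with it.

```python
class MessageReceiver:
    """Toy object that binds messages to behavior at run time (hypothetical)."""

    def __init__(self, **state):
        self._state = dict(state)   # private to the object by convention
        self._handlers = {}         # selector -> function, editable while running

    def understand(self, selector, handler):
        """Teach this object to respond to a new message selector."""
        self._handlers[selector] = handler

    def receive(self, selector, *args):
        """Look the selector up when the message arrives (late binding)."""
        handler = self._handlers.get(selector)
        if handler is None:
            return self.does_not_understand(selector, *args)
        return handler(self._state, *args)

    def does_not_understand(self, selector, *args):
        # The receiver, not the sender, decides what an unknown message means.
        return f"does not understand: {selector}"


def deposit(state, amount):
    state["balance"] += amount
    return state["balance"]


def withdraw(state, amount):
    state["balance"] -= amount
    return state["balance"]


account = MessageReceiver(balance=100)
account.understand("deposit:", deposit)

first = account.receive("deposit:", 50)      # handled
unknown = account.receive("withdraw:", 20)   # not understood yet, but no crash
account.understand("withdraw:", withdraw)    # extend the live object
later = account.receive("withdraw:", 20)     # the same message now works
```

The sender's code is identical before and after the object learns "withdraw:"; only the receiver changed. That is the loose coupling the paragraph above describes.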

Kay wanted objects to follow the example of nature, which shows that complex processes can be carried out by life forms and cells carrying out simple behaviors. Kay saw objects as "virtual computers", and made an analogy between objects and Unix processes in his 1997 speech:

"Unix had that sense about it [offering a virtual computer environment to each user and process], and the biggest problem with that scheme is that a Unix process had an overhead of about 2,000 bytes just to have a process. And so it's going to be difficult in Unix to let a Unix process just be the number 3. So you'd be going from 3 bits to a couple of thousand bytes, and you have this problem with scaling."

Kay comes at technology from a different place than most technologists. He didn't start his professional education in technology. He started with science. Kay's prior work in microbiology had a major influence on the development of OO computing at PARC (think objects = cells). The reason he believed in the object paradigm is that he saw how biology scaled from single-celled bacteria to life forms like us with trillions of cells, and he believed that objects could help give us software that scaled from small to very large problem domains. Your system could grow with you as the complexity of your systems grew, without major disruptions, in no small part because of late binding. The folks at PARC created a "live" system, one where objects existed all the time in a system environment.

Kay borrowed an idea from Ivan Sutherland's Sketchpad project, that of "masters" and "instances". What few people know is that Sutherland invented OO computing in 1963 (though not OOP as we've come to know it). In Kay's scheme of things classes and methods could be updated at any time, even while the corresponding objects were doing something. These aspects are parts of the "masters". Any (object) "instances" are instantly updated with the modifications from their "masters". This promotes experimentation with ideas much better than our current early-bound stop-program, edit, compile, link, test cycles.
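A rough feel for the "masters" and "instances" idea can be had in any language that looks methods up at call time. The sketch below uses Python, with a made-up Greeter class; it only approximates what a live Smalltalk image does, and it assumes nothing beyond Python's ordinary attribute lookup.

```python
class Greeter:
    """The "master": behavior shared by every instance."""

    def greet(self):
        return "hello"


g = Greeter()            # an instance created before the change
before = g.greet()


def greet_v2(self):
    return "hello, world"


# Edit the master while the instance is still alive. No stop, recompile or relink.
Greeter.greet = greet_v2

after = g.greet()        # the existing instance picks up the new behavior
```

Because the method is found through the class each time it is called, the instance created earlier responds with the new behavior immediately.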

Kay asserts that we have not figured out software engineering yet, and so it is imperative to keep things as flexible as possible so that oversights in design can be fixed. This demands a late binding design in the VM.

He created a tight coupling between code and data with objects. He wanted to avoid the situation we've had in most software from time immemorial, which is code acting on data. By "code acting on data" he meant that you have code in one space, and then in some other separate store you have data. You have data structures to optimize access to the data. Your code manipulates the data in the separate store. This configuration works, but it's not powerful. In terms of the software itself it doesn't scale well.

The problem we have with most software is it's not designed to scale, that is to grow in size and complexity. It grows in size and complexity anyway, but it becomes a mess. We as developers tend to scoff at the notions that support software scalability, because creating a scalable software architecture feels like overkill for problems that start small. So many times the messy, quick-and-dirty prototype becomes the end product! Granted, the latter is often due to incompetent software project management as well, but we developers don't think about the possibility that our creation will be adopted and built upon for decades to come. We're focused on solving immediate problems. The Y2K crisis was created the same way.

The philosophy today, which seems intended to get around developer intransigence, is that instead of scaling software we scale systems via N-tier architecture and such. This forces a certain arrangement of software pieces, but the underlying architecture of that software is still not too scalable. Apparently IS administrators understand something about scaling. Maybe developers could learn a thing or two from them.

In Kay's scheme of OOP no data, as we know it, exists. Instead you model a problem domain with representations of things in the domain "universe". These representations can include things that are commonly found in data, like numbers and strings, if they accurately represent what's really being dealt with. In addition, every object has information that can be revealed about itself, commonly called "metadata", and has certain things it knows how to do, either with itself or in conjunction with other objects. All member variables are protected (in the C++ sense of protected access), period. The programmer is not allowed to have public variables. All methods are public. The reason is the OOP idea doesn't really work if other objects have access to an object's implementation. The idea is that other objects can only influence an object. They can send it messages, but the object is in total control of how it responds to them. C++, Java, and .Net enable programmers to break this idea. Java and .Net allow some properties of late binding, but they have to be explicitly requested by the programmer, and are a pain to use.
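The difference between "code acting on data" and an object that can only be influenced is easy to show in miniature. This is a sketch in Python with a made-up Account class. Python does not truly enforce hidden state (the double underscore only mangles the name), so treat the privacy here as a convention standing in for what Smalltalk enforces.

```python
# Data-processing style: any code anywhere can push the record into a bad state.
record = {"balance": 100}
record["balance"] -= 500          # nothing objects; the balance is now -400


class Account:
    """Object style: outsiders can only ask, and the object decides."""

    def __init__(self, balance):
        self.__balance = balance  # name-mangled; hidden by convention only

    def withdraw(self, amount):
        """A request that the object is free to refuse."""
        if amount <= 0 or amount > self.__balance:
            return False
        self.__balance -= amount
        return True

    def balance(self):
        return self.__balance


account = Account(100)
refused = account.withdraw(500)   # the object says no and stays consistent
accepted = account.withdraw(30)   # a reasonable request is honored
```

In the first half nothing stands between the caller and the data. In the second, every change has to pass through a method the object itself owns, so the rule about overdrafts lives in exactly one place.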

What Kay considered most important about OOP was not the objects, but the message passing that goes on between them. "The abstraction is in the messages, not the objects." He's apologized for years for coming up with the term "object-oriented programming", because it got everyone focused on the objects, not the messages. He's said what's most important about it is how objects interact with each other, their relationships, which leads to thinking about software architecture. I think he's also indicated that he considers the representation of those relationships important as well. It's important that the language that's formed through the description of these relationships have some kind of coherence to the reader of the code.

If you haven't gotten it yet, what Kay has really been talking about is "cellular programming". It resembles the internet in a way: computers passing messages to each other, except that the message passing in Smalltalk has been synchronous. Incidentally Erlang also uses message passing (between lightweight processes, which can run on separate "nodes"), but it's asynchronous, enabling concurrency.
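The synchronous/asynchronous distinction can be sketched with Python's standard queue and threading modules. The class names and the "increment"/"stop" messages are invented for the example, and a thread with a mailbox is only a loose stand-in for an Erlang process, not a description of how Erlang or Smalltalk actually work.

```python
import queue
import threading


class Counter:
    """Synchronous style: the sender waits until the receiver has answered."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count


class MailboxCounter:
    """Asynchronous style: messages are queued and handled by the receiver later."""

    def __init__(self):
        self._count = 0
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def send(self, message):
        self._mailbox.put(message)    # returns at once; no reply is awaited

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1

    def stop_and_read(self):
        self.send("stop")             # queued behind any earlier messages
        self._worker.join()
        return self._count


sync_counter = Counter()
sync_result = sync_counter.increment()

async_counter = MailboxCounter()
for _ in range(3):
    async_counter.send("increment")   # the sender never blocks on the receiver
final_count = async_counter.stop_and_read()
```

In the synchronous case the answer comes back as the return value of the send. In the asynchronous case the sender carries on immediately, and the receiver works through its mailbox in order on its own thread.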

I think it's understandable that Paul Murphy would come to the conclusion he did about OOP, given the way it's been implemented in the majority of settings. I think the reason is despite our use of OOP-like languages we still insist on separating data from code, and have a mindset that says "Data must be processed." Our tools embody this mindset as well. Most programmers are also trained in the style of procedural programming, which promotes the idea of creating monoliths of code. Yet there's confusion, because you have this schizophrenic thing going on in OO practice where for brief periods of time data and code are tightly coupled in business objects. So I can see why one would have the reaction, "What's the point of having objects? You're really just fooling yourself," because we kind of are. In common use of what we call "OOP" we use it to manage the complexity of the business computing model, and the data interactions, but we're not using its full power. We're doing data processing with "OOP glasses on" to pretend we're doing something different than we did before.

The way object-orientation has been implemented in industry has been in conservative baby steps. It has formalized the notion of abstract data types, which is not too far from data structures.

What happened historically is that the development world chose the Simula model of objects (Simula was written in the late 1960s, based on Algol) as its primary development model, and then borrowed a few ideas from Smalltalk (inheritance and polymorphism), resulting in C++. Java and .Net, which derive from this heritage, have some properties that are steps in the right direction, but they are so conservative. Java and .Net today could adopt a late-bound architecture in the VM that would make them more dynamic, and still run at a decent speed, but to date Sun and Microsoft haven't done it. They've emulated late binding in a few web technologies like JSP and ASP.Net 2.0 where you write scripts that are compiled on the fly, and source changes are constantly monitored via the "come alive and die" action of web apps. The other Java and .Net technologies are still compiled and linked in an early-bound fashion, which forces the development environment to either recompile everything, even for small changes, or do gymnastics of incremental compilation and linking to optimize the process. "Edit and continue" functionality gets really hairy with early binding, which is the reason it took Microsoft so long to fold it back into .Net.

As real engineers know, it's sometimes necessary to change your perspective, to introduce yourself to unfamiliar knowledge, in order to find the most effective solution to a problem. This is a real problem within the software development field, because programmers strongly prefer the familiar, and almost universally reject the unfamiliar. The failure rate in software development is astounding, yet our methods of software development change extremely slowly. This fits the definition of insanity often attributed to Einstein: doing the same thing over and over again and expecting a different result.

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.