My earlier post about increased productivity through functional programming stirred up a good deal of comment. A number of people replied that the libraries are more important than the language. More recently a similar point has been made by Karsten Wagner, who argues that code reuse is what makes languages productive.
I remember the early days of OO, when it was argued by many people (me amongst them) that OO languages would finally let us write reusable software. There was optimistic talk of “software factories”, and Brad Cox gave lots of talks about how software could now finally move from craft to engineering discipline, built on the back of libraries of reusable code. So the entire industry went for C++, but the easy creation of reusable software remained elusive. That is not to say you can’t ever produce reusable software in the language, but it is not significantly easier than in C.
Karsten accurately pins the reason why C++ failed to deliver: memory management. Back in those old days I was trying to explain to people why Eiffel would be so much more productive than C++, and garbage collection was a big part of it.
The reason why GC is so important lies in a more general principle called composability. Composability means that you can put two bits of code together and important correctness properties will be preserved automatically. This does not mean, of course, that the composition is automatically correct in a wider sense, but it does mean that you don’t introduce new bugs merely by sticking two things together.
Imagine two modules of code in a non-GC language like C or C++. Module X creates an object and hands a reference to that object over to Module Y. At some point in the future X will delete that object. However it is only safe to do so once Y has finished with it. So if X and Y were written independently then it is quite possible that Y will hang on to the reference longer than X expects. In short, manual memory management is not composable, because the composition of X and Y can introduce stale pointer bugs that were not present in either X or Y.
In a language with GC this is a non-problem: the collector will reap the object once it sees that both X and Y have finished with it. X can therefore forget about the object in its own time without having to know anything about Y. But programmers in non-GC lanuages have to resort to a number of workarounds. Either objects have to be copied unecessarily (which leads to stale data bugs instead of stale pointer bugs), or else some kind of reference counting or similar scheme must be employed. Reference counting is of course merely an ad hoc, informally-specified, bug-ridden, slow implementation of GC, and therefore stands as a classic example of Greenspun’s Tenth Rule.
But programming languages contain other examples of non-composable constructs. The current biggest offender is shared memory concurrency using locking. If X takes locks Foo and Bar, and Y takes locks Bar and Foo (in those orders), then sooner or later they are going to deadlock with X holding Foo and Y holding Bar. Ironically Java has the biggest problems here.
Surprisingly, program state generally is a source of non-composability. Mutable state is actually another form of manual memory management: every time you over-write a value you are making a decision that the old value is now garbage, regardless of what other part of the program might have been using it. So suppose that X stores some data in Y, and then Z also stores some other data in Y, overwriting what X did. If X assumes that its old data is still there then it is going to be in trouble. Either X needs to defensively check for new data, or Y needs to tell X about the change (the Observer pattern). Whichever way, the addition of Z to the system can introduce bugs that stem from the composition of modules rather than the modules themselves.
Another example is the Command pattern, which includes a method to “undo” a previous command. Why undo it? Because the global state has been changed. Get rid of the concept of a single unique state for the entire application and you get rid of the problem.
On a more prosaic level, many library modules have some internal state, and so require an initialisation call to “set things up”. Who exactly is responsible for this initialisation? If you compose two modules together then they may both imagine themselves soley responsible for making this initialisation call, and hence it will be done twice.
It is a truism of programming that “integration” is the most difficult and risky (from a scheduling point of view) part of developing a large system. Integration basically means composing together all the units to try to get a working system. And the biggest integration problems are due to the non-composability of the units. A defect within a unit is fairly easy to identify, but composition bugs are down to the interaction of two separate units in some unforseen way, which makes it correspondingly more difficult to pin them down.
So what would a language look like if it got rid of all of these non-composable constructs? The answer, basically, is Haskell. The underlying goal of functional language designers is to make everything completely composable. Haskell is the language that has advanced furthest towards this goal.
Erlang is pretty good as well. It doesn’t eliminate state, but it does keep it confined to individual processes: there are no shared memory locks. Processes have to communicate according to shared protocols, but this is a manageable dependency on a common abstraction. Even within a process the shared state can be reduced by careful use of functional programming rather than imperative.