Wednesday, August 22, 2007

Microsoft versus FOSS Configuration Management

This was originally posted to my old blog on December 3rd 2006. It was also discussed on Reddit at the time, so please do not repost it.


Joel Spolsky writes about the Vista shutdown menu and its excess of confusing options (what exactly is the difference between Hibernate and Sleep?). Moishe Lettvin, who happened to have worked on that menu, chimed in with an explanation of why it came out that way, which included a fascinating insight into how Microsoft handles configuration management.

For the uninitiated, configuration management is a lot more than just version control. It includes managing all the libraries and tools used in a build, and if multiple components are being incorporated into the final product then it also involves keeping track of those. The goal is to be able to go back to some build that happened last year, repeat the build, and come out with the same MD5 checksum at the end. Stop and think for a minute about what that involves. It's highly non-trivial. (Some compilers even make it impossible by using multiple threads, so that two consecutive builds generate the same code but with functions in a different order. Under some high-integrity quality regimes this actually matters.)
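As a toy illustration of the point, here is a minimal Python sketch. The `build` function and the version strings are invented stand-ins, not a real toolchain: the idea is just that a build artifact's checksum depends on every input, so identical inputs must give an identical artifact, and changing anything (even just the compiler version) changes the output.

```python
import hashlib

def build(source: bytes, tool_version: str) -> bytes:
    # Toy stand-in for a compiler: the artifact depends on both the
    # source and the exact toolchain version, just like a real build.
    return hashlib.md5(source + tool_version.encode()).digest()

source = b"int main() { return 0; }"
a = build(source, "gcc-4.1.1")
b = build(source, "gcc-4.1.1")   # same inputs -> same artifact
c = build(source, "gcc-4.1.2")   # bump the toolchain -> different artifact

print(a == b)  # True
print(a == c)  # False
```

Repeating last year's build therefore means pinning not just the sources but every tool and library that touched them.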

The Microsoft problem is that they have thousands of people working on Vista, and a simple repository with everyone checking stuff in and out simply won’t scale. So they have a tree of repositories about four levels deep. Each developer checks stuff in to their local repository, and at intervals groups of repositories are integrated into their local branch. Hence the changes propagate up the tree to the root repository, from where they then propagate back down the other branches.

The trouble is, propagation from one side of the tree to the other can take a month or three. So if developer A and developer B need to collaborate closely but are on distant branches of the build tree, their code takes ages to propagate between them.
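A back-of-the-envelope model shows where the delay comes from. The numbers here (tree depth, an integration every two weeks) are illustrative assumptions, not Microsoft's actual figures:

```python
# Toy model of change propagation in a repository tree: a change climbs
# from one leaf to the root, then descends to a leaf on another branch.

depth = 4                # levels between a developer's repo and the root (assumed)
interval_weeks = 2       # hypothetical time between integrations at each level

hops = 2 * (depth - 1)   # up to the root, then back down the far branch
worst_case = hops * interval_weeks
print(worst_case)        # 12 weeks
```

Six integration hops at a couple of weeks each gives roughly three months, which matches the "month or three" above.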

Now consider the case of open source software. The Linux kernel and Windows are actually organised in very similar ways: Linus owns the master repository, people like Alan Cox own sub-repositories, and handle integration up to Linus. Sub-repository owners are responsible for QA on the stuff they merge in, just like in Windows.

So why is Windows Vista in trouble, but free / open source software is not? After all, GNU/Linux overall has tens of thousands of people developing for it, and thousands of packages of which the kernel is just one. Worse yet, these tens of thousands of people are not properly organised and have very little communication. Microsoft can at least order all its programmers to work in certain ways and conform to certain standards.

Actually this disconnection is a strength, not a weakness. Conway’s Law says the structure of any piece of software will duplicate the structure of the organisation that created it. So in Microsoft there are lots of programmers who all talk to one another, and this leads to software where all the bits are interdependent in arbitrary ways. Open source developers, on the other hand, are spread around and have very narrow interfaces between each other. This leads to software with narrow, well-defined interfaces and dependencies.

Dependencies are the crucial thing here: if I am writing a new application that uses, let’s say, the Foobar library, then I will want to depend on a stable version. If I write to the API in the daily snapshot then my code could suddenly break because someone submits a patch that changes something. So I write to the last stable release. If I really need some feature that is still being developed by the Foobar team then I can use it, but that is an exceptional case and I won’t be releasing my application until the Foobar feature stabilises.

Dependency management is probably the most important contribution of open source to software engineering. The requirement first became obvious under Windows, with “DLL hell”: different applications installed and required different dynamic libraries, and conflicts were inevitable. Then Red Hat users encountered a similar problem in “RPM hell”, in which they had to manually track the dependencies of any package they wanted and download and install them.

As far as I know Debian was the first distribution to really solve the dependency problem with apt-get. These days Fedora has yum and pirut, which do essentially the same job. It’s not a simple job either. A program may have a dependency not just on Foobar, but on Foobar version 1.3.*, or anything from 1.3 onwards but not 2.*. Meanwhile some other package may depend on Foobar 1.6 onwards, including 2.*, and yet a third package requires Foobar 2.1 onwards. It is the job of apt-get and its relatives to manage this horrendous complexity.
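To see why this is hard, here is a toy resolver in Python using the hypothetical Foobar constraints from the paragraph above (the package names and version list are invented; real tools like apt handle far richer constraint languages than this sketch):

```python
def parse(v):
    # "1.3.5" -> (1, 3, 5), so tuples compare like version numbers
    return tuple(int(x) for x in v.split("."))

def satisfies(v, lo=None, hi=None):
    # lo is inclusive, hi is exclusive; either bound may be absent
    pv = parse(v)
    if lo and pv < parse(lo):
        return False
    if hi and pv >= parse(hi):
        return False
    return True

available = ["1.2", "1.3", "1.3.5", "1.6", "2.0", "2.1"]

# Two packages: one wants Foobar >= 1.6 (2.* allowed), one wants >= 2.1.
ok_two = [v for v in available
          if satisfies(v, lo="1.6") and satisfies(v, lo="2.1")]
print(ok_two)    # ['2.1'] -- one version keeps both happy

# Add a third package that wants >= 1.3 but not 2.*:
ok_three = [v for v in available
            if satisfies(v, lo="1.3", hi="2.0") and satisfies(v, lo="2.1")]
print(ok_three)  # [] -- an unsatisfiable conflict the resolver must report
```

Scale that up to thousands of packages, each constraining dozens of others, and the value of automating it becomes obvious.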

(Side note: I remember when OO software was young in the nineties, people thought that the biggest challenge for reusable software was searching repositories. They were wrong: it is dependency management.)

Open source also has a clear boundary to every package and a very precise process for releasing new versions. So when the new version of Foobar comes out anyone interested finds out about it. Any changes to interfaces are particularly carefully controlled, so I can easily tell if the Foobar people have done anything to break my application. If they have I can carry on depending on the previous version until I can resolve the problem. Then before my application finds its way into, say, Fedora, it has to be integrated into an explicit dependency graph and checked for conflicts with anything else.

Microsoft doesn’t do this. Vista is effectively a big blob of code with lots of hidden dependencies and no effective management of what depends on what. No wonder they are in trouble.

1 comment:

James Graves said...

There are a few other factors at work in the FLOSS world too.

There's a lot of interconnectedness in the Linux kernel development model. The SCSI guys can communicate directly with the USB guys, for example. And via git, they can easily swap patches around, even if they are mostly tracking different sub-trees.

You'd think that the situation with Linux kernel development would be worse than Vista's, not better. Sure, you've got some full-time people working on the kernel, but most are just part-time. So there are many more people, and coordination should be more difficult.

I think the age and stability of the relevant APIs may also have an effect on the Windows vs. Linux development. However, I can't make an easy judgement as to what has been "stable enough" and what hasn't. Though I note that the POSIX and X Window client protocols are so old and stable they're practically decrepit. I also note that DirectX has had a whole bunch of major releases in its 13-year life.