Paul's Pontifications: 2008

Sunday, December 14, 2008

Where did all the money go?

I've got a simple question about the Credit Crunch: where did all the money go?

I've always thought of money as something that is "conserved" in the physics sense, like energy. Money in = money out + money stored. So when someone "loses" money it implies that someone else got it. If I run a company that loses money, what it means is that I'm getting less money in than I'm spending on staff and supplies. But the money hasn't disappeared, it's just moved from my pockets to my suppliers (and their suppliers, and so on).

But the financial institutions seem to have had billions of dollars disappear into thin air. They got poorer, sometimes so poor that they went bust, but nobody seems to have become correspondingly richer.

I guess this has something to do with fractional reserve banking, which I know doesn't conserve money. Suppose I start with £1,000 and put it in a bank. The bank shows a balance of £1,000, but it doesn't just hang on to my money: it loans £500 of it to Joe Bloggs, who spends it on a new TV, and the person who sold the TV also puts the money in the bank. So now the bank has £1,500 on deposit even though our imaginary economy just started with £1,000. So what happens if Joe can't pay the money back?
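The arithmetic of that example can be sketched in a few lines of Python. This is a toy model under my own assumptions, not a description of how real banks keep their books: the function name `lend_and_redeposit` is made up, and it assumes every loan comes straight back to the same bank as a new deposit.

```python
def lend_and_redeposit(initial_deposit, loan_fraction, rounds):
    """Return (total_deposits, total_loans) after `rounds` of lending.

    Toy model: each round the bank lends `loan_fraction` of the most
    recent deposit, and the whole loan comes back as a new deposit.
    """
    deposits = initial_deposit
    loans = 0.0
    latest = initial_deposit
    for _ in range(rounds):
        loan = latest * loan_fraction
        loans += loan
        deposits += loan  # the seller banks the money they were paid
        latest = loan
    return deposits, loans


# One round reproduces the example: 1,000 deposited, 500 lent and redeposited.
deposits, loans = lend_and_redeposit(1000.0, 0.5, 1)
print(deposits, loans)  # 1500.0 500.0
```

In this model the deposits always exceed the original cash by exactly the amount out on loan, so the extra £500 exists only for as long as Joe's debt is expected to be repaid.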

Can someone enlighten me?

Wednesday, November 5, 2008

Fundamentalist terrorist assassination plot defeated

Tonight the British people celebrate the defeat of a terrorist assassination plot. A group of disaffected men, followers of a foreign religion and egged on by propaganda from overseas and a radical cleric, planned a spectacular atrocity: they would blow up the entire government in one huge explosion. The plot was foiled by the security services, who, responding to a tip-off, mounted a surveillance operation and arrested one of the plotters red-handed as he prepared to detonate the explosives. The rest of the gang were tracked down and shot or arrested when soldiers stormed their hideout. In accordance with an executive order from the head of state the survivors were subsequently imprisoned, tortured, and executed after a show-trial.

This happened over 400 years ago, on November 5th 1605. The plotters were Catholics rather than Muslims, but apart from that the story sounds disconcertingly modern, although we tend not to go in for show trials these days: they went out of fashion with Stalin.

Today the Gunpowder Plot is celebrated as Bonfire Night: Guy Fawkes (the one caught with the explosives) is still burned in effigy, and in some places the Pope gets the same treatment as well. However, over the past 400 years the anti-Catholic content has almost entirely leached away: now it's just an excuse to let off some fireworks. A Catholic can join in the celebrations just as I (a Briton) joined in the 4th of July celebrations when I was living in the US.

Maybe one day we will look back on 9/11 the same way. I certainly hope so.

Tuesday, October 28, 2008

Petition on Broadband Advertising

I've become rather depressed about broadband advertising. I found it very difficult to discover the web page describing my own ISP's traffic policy. Furthermore, if you compare it to this version from January on the Wayback Machine you can see that the headline speed on the "Large" package has gone up from 4Mbit/sec to 10Mbit/sec, but the evening download limit has merely increased from 800MB to 1,200MB, while extra restrictions have been added starting at 10am as well. Somehow that 10Mbit/sec upgrade doesn't feel so generous now: when I want to install Fedora 10 I'd better make sure I schedule the download to start after 9pm.

Virgin Media, in common with most other ISPs, says that the few people who make heavy use of their broadband link reduce capacity for everyone else, and that providing truly "unlimited" service for these few would mean higher bills for everyone. They do indeed have a point. If I want truly unlimited service then I'm sure I can get it, for a price. And the fact is that I probably don't want to pay the price. The occasions when I do want a multi-gigabyte download are sufficiently rare that I can put up with scheduling around Virgin's traffic limits.

So why am I depressed about it? It's because my problem with finding Virgin Media's traffic management policy is not unusual. A sample taken from Google found:

TalkTalk have a 40GB monthly limit right up there on the front page, although I'm not sure if you can find out how much of your limit you have used this month. What happens when you use it all? Presumably you are cut off until next month.

O2 claims "unlimited" usage, but if you search the site for the word "unlimited" you find that excessive use at peak times will lead to warnings and then account termination. They don't define "excessive" or "peak time". I couldn't find a link to this information from their list of broadband features; it certainly wasn't obvious.

Tiscali have a similar policy, except that after three warnings they limit your peak time speed. They also don't define what "excessive" and "peak time" mean, and they don't say how long this will last or what the speed limit will be. This page was two not-very-prominent links away from the package features list.

Fast actually sells a range of monthly capacity limits. They warn you by email when you hit 90%, and when you go over 100% they throttle you down to 100kbit/sec. Full marks!

So out of 5 providers (including Virgin) we have only one that makes all the limits and policies clear up-front while three try to hide behind weasel words and hard-to-find web pages. This makes it difficult for consumers to figure out what they are buying. Is Tiscali's unstated policy better or worse value for money than one of the Fast packages? Even if it's better at the moment, will it still be so next month? As a consumer I have no way of knowing. Worse yet, if I were a technically naive consumer I might not even realise that the question needed asking. "Unlimited" sounds much better than "40GB monthly limit".

I think that something needs doing about this. However the Advertising Standards Authority have wimped out: they have declared that "unlimited" actually means "95% of users don't hit the limit in any given month". So it's hardly surprising that people are confused.

Therefore I have started an on-line petition at the Number 10 web site. If you are a British citizen or resident then I urge you to sign it. The petition calls on the Government to require ISPs to make all caps and limits on their services a prominent part of their advertising. Only when they do so will consumers have a clear choice between different packages.

Sunday, October 26, 2008

SSDs and the return of the root partition

Back in the days when Unix was new to the world computers came with two sorts of mass storage: fast-small-expensive, and slow-big-cheap. A minicomputer typically had one of each, so that frequently-used files could be kept in the fast unit while everything else was kept on the slow one.

Unix was designed with this in mind, which is why to this day most programs on Unix and Linux are found in one of two directories: /bin holds the frequently used ones, and /usr/bin holds the less frequently used ones. A similar arrangement applies to libraries, which are held in /lib and /usr/lib. The idea was that the "root" partition was put on the fast small device, and the "user" partition went on the big one. This led to some other restrictions as well; /bin and /lib together have to have all the software needed for a minimally functioning system, because until all that stuff is going the system can't mount /usr to get at the rest of it.

All that became irrelevant in the age of the Linux PC because the entire system hung off one physical disk. But maybe that is about to change. Solid State Disks (SSDs) are entering the disk world. They have zero mechanical latency and their read speed is blisteringly fast. Write speed is more variable, but the new Intel product is reported to be very fast. It is also small (80GB) and expensive ($600).

So it looks like the time has once again arrived for keeping the commonly used programs and libraries on a small fast disk and everything else on something big and slow. The original root versus /usr division doesn't work for this of course: people want the system to fire up X windows, KDE or Gnome and a bunch of apps, not just a shell prompt. But all of those things will fit very nicely onto one of these SSDs. Better yet, the user files have now been shifted out from /usr (from which it got its name) and put in /home. So this needs zero surgery to the standard Linux layout.

For the average user $600 is a bit steep though. What is actually needed here is something smaller and cheaper. I run a well-populated version of Fedora, and my root partition with all the Fedora packages is under 10GB. So while I can't imagine blowing $600 (or whatever that is in £s these days) on 80GB of SSD, I could certainly imagine putting a 10GB version in for $100. That would hold the Linux kernel, X Windows and all the apps, with enough room left over for /etc. It would speed up booting, logging in and starting applications, all of which involve reading lots of different files scattered around the disk, without greatly increasing the cost of the system. Even better, most of this stuff is only written rarely, and then mostly in the background by system updates. So the current poor write performance of most SSDs doesn't matter.

(Update: I have of course forgotten that by default Unix and Linux keep track of the access time (atime) of files, which means that every time you read something a little metadata write happens. So write performance does matter here).

Looking around, the nearest product to this specification seems to be the OCZ 32GB unit, which retails for a bit over £100 (say, around $170). That's bigger and more expensive than what I want, but hey, Moore's Law seems to be working so I'll just wait a bit. I also note that all these products seem to be targeted at the laptop market, which makes sense given SSDs' other advantages of low power consumption and physical robustness. Maybe they should start thinking about the market for hybrid desktop systems as well.

Tuesday, September 23, 2008

Why the banks collapsed, and how a paper on Haskell programming can help stop it happening next time

Trading Risk

The financial system exists to trade three kinds of thing: money, commodities and risk. Money and commodities are the easy bit. Either I have £1,000 or I don't. Similarly with commodities: either I have a barrel of oil or I don't. But risk is a very different matter. In theory risk is easy to quantify: just multiply the probability of something by the cost, and you have the expected loss. But in practice it's not so simple because the probability and cost may be difficult to quantify, especially for rare events (like, say, a global credit crunch). Many of the factors that go into a risk model are subjective, so honest people can have genuine disagreements about exactly what the risk is.
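The "probability times cost" rule is simple enough to write down directly. The sketch below uses invented figures (they are not from any real deal) to show how two honest estimates of the same risk can differ by a large multiple just because the probability is uncertain.

```python
def expected_loss(probability, cost):
    """Expected loss of a single event: probability times cost."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * cost


# Hypothetical figures: two analysts agree that the loss would be
# 1,000,000 but disagree about how likely it is (1% versus 5%).
optimist = expected_loss(0.01, 1_000_000)
pessimist = expected_loss(0.05, 1_000_000)
print(f"optimist: {optimist:,.0f}  pessimist: {pessimist:,.0f}")
```

A difference of a few percentage points in a probability that nobody can measure directly changes the price of the risk fivefold.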

The Slippery Slope

Unfortunately risk assessment is not value-neutral. Risk has negative value: you have to pay people to take it off you. The higher the risk, the more you have to pay. And because the amount of risk is always debatable this is a very slippery slope; the people paying others to take the risk away have every incentive to present a lower estimate. Everyone can see that everyone else is doing the same, and so methods of hiding or downplaying risk migrate from dodgy dealing to open secret to standard practice.

Specific examples abound throughout the recent history of the finance industry:

The retail mortgage houses that originally lent to "sub-prime" clients would hire valuers who were known to be on the generous side with their valuations. So any valuer who wasn't so generous found their income drying up. Background checks on clients were cut back, then eliminated. Eventually borrowers were simply told what income to claim on the forms, regardless of what they actually earned.
These loans were then bundled up and sold. The idea was that the buyers would each get a share of the incoming loan repayments. Rights to this stream of money were divided into "tranches", the idea being that, for instance, Tranche 1 would get the first 34% of whatever money was due, Tranche 2 would get the next 33%, and Tranche 3 would get the last 33%. When some borrowers defaulted (as some always do), Tranche 3 would lose out first, then Tranche 2. Tranche 1 would only fail to get all their money if the overall repayment rate fell below 34%, which had never happened. The game here was to persuade a credit rating agency that Tranche 1 was so safe that it was worthy of a "Triple A" rating, because that meant that banks, insurance companies and similar big financial institutions could legally buy this debt without having to put cash aside to cover potential losses. The rating agencies earned fees for evaluating securities, so just like the house valuers they found it paid to be on the generous side.
All these institutions had risk management departments who were supposed to watch out for excessively risky behaviour. But in practice they found it very difficult to blow the whistle. Risk managers tell stories of being given two days to review a deal that took ten people a month to negotiate, and of accusations of "not being a team player" when they questioned over-optimistic claims. This story from the New York Times has some details. Look through the comments after the story as well; many of them are by people with their own tales of risk.
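The tranche arrangement in the second example above can be sketched as a "waterfall" in which incoming repayments fill the most senior tranche first. This is only an illustration: the 34/33/33 split comes from the text, but the function name `distribute`, the use of whole currency units and the sample amounts are my own assumptions, and real securities are considerably more complicated.

```python
def distribute(received, due, percents=(34, 33, 33)):
    """Split `received` between tranches, most senior first.

    `due` is the total amount owed by all borrowers and `percents`
    gives each tranche's slice of it, in order of seniority.
    Amounts are whole currency units so the arithmetic stays exact.
    Returns a list of (paid, owed) pairs, one per tranche.
    """
    remaining = received
    result = []
    for percent in percents:
        owed = due * percent // 100
        paid = min(owed, remaining)
        result.append((paid, owed))
        remaining -= paid
    return result


# Hypothetical pool: 100,000 due, but only 80,000 actually repaid.
for number, (paid, owed) in enumerate(distribute(80_000, 100_000), start=1):
    print(f"Tranche {number}: paid {paid} of {owed}")
```

With 80% of the money coming in, Tranches 1 and 2 are paid in full and Tranche 3 absorbs the whole shortfall, receiving 13,000 of its 33,000. Tranche 1 only suffers if less than 34,000 arrives.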

None of this is new; similar behaviour has contributed to past financial crises. In theory more regulation can prevent this, and everyone is now planning or demanding lots more regulation (even the banks). But in practice regulation has failed repeatedly because the regulators always concentrate on the way it went wrong last time. Regulations have to be written carefully, and the companies being regulated have to be given time to understand changes and put their compliance mechanisms in place. This prevents regulators from moving as fast as the institutions they regulate.

The regulators also don't have visibility of the information they need to assess systemic risk. Systemic risk arises because financial companies are exposed to each other; if one institution fails, others have to write off any money it owed them, possibly pushing them into bankruptcy as well. Regulators try to make companies insulate themselves by avoiding excessive risk and keeping some cash on hand, but without a clear picture of the risks being run by each company they have no way to tell if this is enough.

The basic problem, I believe, is the food chain of risk management within each institution. At the top are the negotiators and fund managers who design and package the securities. Then the lawyers are brought in to specify the precise terms of the deal. Somewhere along the way the "quants" will be asked to develop mathematical models, and at the bottom coders will be given the job of turning the models into executable code that will actually determine the real price and risks. It is this food chain that needs to be rethought, because it's hiding important information.

This 2000 paper by Simon Peyton Jones, Jean-Marc Eber and Julian Seward shows a way forwards. It describes a Domain Specific Language embedded in Haskell for describing the rights and obligations imposed by a contract. Arbitrarily complicated contracts can be built up using a small collection of primitives. Aggregations of these contracts can also be created, as can risks of default and bankruptcy. This created quite a stir in the quantitative analysis world when it was presented, as it was the first time anyone had proposed a formal language for describing contracts. Today the list of commercial Haskell users includes a number of financial institutions using this kind of technique to model their market positions.
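To give a flavour of what "building contracts from a small collection of primitives" means, here is a loose sketch in Python rather than Haskell. It is not the paper's DSL: the class names and the `net_position` function are my own, and the sketch ignores time, choice and uncertainty, which are exactly the hard parts that the real language handles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class One:
    """Receive one unit of a currency."""
    currency: str


@dataclass(frozen=True)
class Give:
    """Swap rights and obligations: what was received is now paid."""
    contract: object


@dataclass(frozen=True)
class Scale:
    """Multiply every payment in a contract by a constant factor."""
    factor: float
    contract: object


@dataclass(frozen=True)
class Both:
    """Hold two contracts at once."""
    left: object
    right: object


def net_position(contract):
    """Return a dict of currency -> net amount received by the holder."""
    if isinstance(contract, One):
        return {contract.currency: 1.0}
    if isinstance(contract, Give):
        return {c: -v for c, v in net_position(contract.contract).items()}
    if isinstance(contract, Scale):
        inner = net_position(contract.contract)
        return {c: contract.factor * v for c, v in inner.items()}
    if isinstance(contract, Both):
        totals = dict(net_position(contract.left))
        for c, v in net_position(contract.right).items():
            totals[c] = totals.get(c, 0.0) + v
        return totals
    raise TypeError(f"unknown contract: {contract!r}")


# Hypothetical deal: receive 1,000 GBP and pay 1,500 USD.
swap = Both(Scale(1000, One("GBP")), Give(Scale(1500, One("USD"))))
print(net_position(swap))  # {'GBP': 1000.0, 'USD': -1500.0}
```

The point of the design is that the contract is a data structure, so the same description can be fed to any number of analyses (valuation, exposure, stress tests) without anyone re-reading the legal text.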

But on its own this is just a faster and more efficient way of getting the same wrong answer. It doesn't solve the underlying problem of concealed systemic risk. The solution has to be for big financial companies to reveal their positions to the regulators as formal models of the contracts they have written. At the moment they don't even have to reveal all their contracts, but merely knowing the legal terms of a contract is only the first step. Those terms have to be converted into a mathematical model. That model probably already exists, but only as an internal artefact of the parties to the contract. What ought to be happening is that the contract is specified in a well-defined mathematical language that can be converted into a model automatically. If the regulators have this information about all the contracts entered into by all the finance companies then they can model the impact of, say, a downturn in the housing market or a jump in the price of oil, and if they see systemic risk looming then they can order the companies involved to take corrective action. Unlike the various Risk Management departments they will be able to see the whole picture, and they don't have to worry about being "team players".

Sunday, September 7, 2008

ISPs and Bandwidth

This Slate article reminded me of some stuff about how the Internet works that isn't widely appreciated. For a few years now ISPs have been complaining about "bandwidth hogs" while at the same time advertising high bandwidths for fixed prices. Comcast has had its knuckles rapped for discriminating against particular traffic. But this hides the real issues, which are to do with the structure of the market for long-haul infrastructure.

Your ISP has to pay for connection to the Internet in exactly the same way that you do: it pays a big telecom company like AT&T or Qwest. The only exception is if your ISP is one of those companies, but even then the retail ISP business will be a separate division that has to "buy" bandwidth from its parent. These "tier 1 ISPs" don't publish price lists, but the general pricing structure looks a lot like the ones described in the Slate article, and for the same fundamental reason: capacity is limited. So your local ISP may buy, say, 1 terabyte per day upstream and 2 terabytes downstream, with any traffic over those thresholds being charged per megabyte.
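A pricing structure of that shape (a fixed allowance in each direction plus a per-megabyte charge above it) might be modelled like this. Since tier 1 price lists are not published, every number in the example is invented, the function name is hypothetical, and I have taken 1 terabyte as 1,000,000 megabytes.

```python
def daily_transit_cost(up_mb, down_mb, base_fee,
                       up_allowance_mb, down_allowance_mb,
                       overage_per_mb):
    """Fixed fee plus a per-megabyte charge for traffic over each allowance."""
    over_up = max(0, up_mb - up_allowance_mb)
    over_down = max(0, down_mb - down_allowance_mb)
    return base_fee + (over_up + over_down) * overage_per_mb


# Hypothetical day: 0.9 TB up (under the 1 TB allowance) and 2.1 TB down
# (0.1 TB over the 2 TB allowance). Prices are in arbitrary units.
cost = daily_transit_cost(
    up_mb=900_000, down_mb=2_100_000,
    base_fee=100_000,
    up_allowance_mb=1_000_000, down_allowance_mb=2_000_000,
    overage_per_mb=1,
)
print(cost)  # 200000
```

In this made-up case a 5% overshoot on downstream traffic doubles the day's bill, which is the kind of cost structure that a flat "all you can eat" retail price does not reflect.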

Aside: Actually the whole thing is much more complicated: a retail ISP will generally buy connections to more than one upstream ISP, which may be either tier 1 or "tier 2" (with long-haul bandwidth but not global presence). Often it will have multiple connections to each of these ISPs, each of which may have different price plans. Any ISP may also have "peering" agreements with other ISPs of the same size. These are free, but are not allowed to carry "transit" traffic destined for anywhere else. Everyone always tries to offload traffic onto someone else as fast as possible, even if the resulting routes are not ideal. Managing this mess to keep the customers happy at minimum cost is a key skill in the ISP business.

The retail ISPs are therefore caught between a rock and a hard place. They are in a commodity business, but the traditional retail price plan of "all you can eat at a given bandwidth" doesn't match their cost structure. It's a general rule that if you are in such a market and have a competitor whose price plan does match their cost structure then you are bound to make a loss, because the customers who find you cheaper are going to be the ones who cost you more than they pay, while the ones who would balance this by paying more than they cost find your competitor cheaper, so they go there instead.
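That selection effect can be shown with a toy calculation. The customer costs, prices and margin below are all invented for illustration, and the model assumes customers simply pick whichever ISP is cheaper for them.

```python
def flat_rate_profit(customer_costs, flat_price, competitor_margin):
    """Profit of a flat-rate ISP competing with a cost-plus rival.

    Each entry in `customer_costs` is what it costs to serve one
    customer. The rival charges cost + competitor_margin; the
    flat-rate ISP charges flat_price. Customers pick whichever is
    cheaper for them (ties go to the rival).
    """
    profit = 0
    for cost in customer_costs:
        rival_price = cost + competitor_margin
        if flat_price < rival_price:
            profit += flat_price - cost
    return profit


# Hypothetical monthly costs to serve: three light users, one heavy one.
print(flat_rate_profit([5, 5, 5, 30], flat_price=15, competitor_margin=2))  # -15
```

The light users would each have been worth 10 to the flat-rate ISP, but they leave for the rival's price of 7. Only the heavy user, who costs 30 and pays 15, stays.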

Thus ISPs will gradually converge on pricing plans that are simplified versions of the cost structure of their industry. This will probably be based on a combination of peak-time limitations and traffic caps that give people an incentive to shift their heavy usage off-peak. The winners will be the ones who can innovate. The challenges are:

At any given point in time the network has a fixed bandwidth. Hence the challenge is not to reduce the total amount of data moved but to even out usage. Similarly heavy users are only a problem when they start pushing out other customers who might collectively pay more.
You can't force users to track their usage in detail. Price plans that suddenly cut a customer off until tomorrow (or next month) are scary and unfriendly. Plans that charge extra for heavy usage are even worse, especially for families with teenagers. Throttling is more user-friendly.
Negotiate with your upstream ISPs to bring your costs into line with your pricing structure. They, too, need to shift usage off-peak and will be prepared to offer pricing plans accordingly. However their idea of "off-peak" may not be the same as a retail ISP with lots of home users.
Transparency will happen whether you want it to or not. At least some end users are smart enough to detect traffic shaping and other tricks, and their results will be picked up by price-comparison sites for everyone else to read.

Application-specific traffic shaping (as tried by Comcast) won't work. Customers and regulators both hate it, but more importantly it gets you into an arms race between stealth P2P protocols and your packet inspection software that you can't win. However there is another option: offer your customers the opportunity to do their own traffic shaping. For instance you could have a metered high-priority service for everyday browsing combined with an unmetered low priority service for bulk downloads. The challenge is to give the customer a simple easy-to-use system that distinguishes between the two and (automatically as far as possible) uses the right one.

In many ways the situation reminds me of 1997, when large ISPs first started to deny peering agreements to smaller competitors and made them pay for transit agreements instead. There were many calls to ban the practice, and grave predictions of the "balkanisation of the Internet". But in practice the economics of Metcalfe's Law guaranteed that a well-connected network would have more value than a disconnected network, and after that it was just a matter of how that extra value was distributed. I believe that much the same thing will happen with bandwidth. The Internet has most value when its pipes are full of traffic, and if there is demand for more bandwidth then there will be money to be made by providing it. After that, it's just a matter of working out who pays how much for what. As long as the market remains competitive it will converge on the optimum solution, probably quite rapidly.

Sunday, August 3, 2008

The Netwise Kids of Today

My son is keen on Habbo (a virtual world ostensibly aimed at teenagers, but probably attracting a lot of pre-teen kids), and spends quite a bit of his pocket money on it. We regard Habbo as a cynical method of separating kids from their money. Lack of credit cards isn't a problem as you can buy Habbo credits using a mobile phone. However we also regard that "I've just blown all my money on this junk" moment as a valuable learning experience, so we've let him take his pocket money in the form of Habbo credits bought with my Visa card.

This being the Internet, a wide range of scams have appeared trying to separate kids from their virtual "furni" bought with Habbo credits. I'm not going to link to any directly because I suspect that they will be too short-lived, but a search for "Habbo cheat" turns up a good selection. Warning: turn your browser up to maximum paranoia before visiting any of these sites.

My son told me enthusiastically about one of these sites that promised lots of free stuff. He just had to fill in his Habbo username and password on a form, and come back in 24 hours. I was horrified. But no, my son reassured me, he had created a new empty Habbo account just for this experiment, so if it was a scam he wouldn't lose anything. (Sure enough, he didn't gain anything either).

I had never told him about throw-away accounts. I had told him that the Internet is not always a friendly place, but I had not expected him to personally be the target of an attempted fraud. Still less did I expect him to identify the fraud and devise a work-around. There are adults who are dumber than this. However I'm not posting this to boast about my clever son, because I've since found out that this is actually fairly typical. The kids round here generally have Net access, either in their own homes or via a friend. They play in a wide variety of virtual worlds. They have learned, sometimes the hard way, about password security, how and where to write down account details so you don't forget them, not letting people "shoulder surf", and how to recognise various forms of fraud. They talk about this stuff, exchanging war stories and security tips.

Internet crime is still enjoying its boom years at the moment. The archetypal victim is too computer-clueless to even understand what malware is, never mind defending against keylogging viruses or spotting phishing websites. There are still many such people, but the next generation is growing up too Internet-savvy to be easily scammed. They can list different categories of malware and describe their uses and how to defend against them. They have grown up surrounded by the Internet, so taking measures to protect themselves from its hostile elements strikes them as entirely normal. As they grow up they will take this attitude, and the associated knowledge, for granted. My generation taught its parents how to program their video recorders. The next generation is going to teach its parents how to secure their computers.

Saturday, July 12, 2008

Rate Cap Revisited

In my last post I complained that I was being rate capped by Virgin Media. I said I didn't mind them having a capping policy, but it ought to be public. I also complained I wasn't getting the full bandwidth I was paying for.

Turns out I was wrong on both counts. Their rate capping policy is here. I did look for it before posting, and it could be easier to find. But it's there, it's public, and it says everything I would expect it to.

As for bandwidth, the "Large" package I am signed up for was originally for 4 Mbits, which is what I am getting. Virgin are currently upgrading their network to give everyone 10Mbits, and my region is due to be upgraded this month. However new customers get the full 10Mbits wherever they are. I just looked at their headline offer and assumed it applied to me as well.

So, I no longer have any issues with Virgin Media.

Saturday, July 5, 2008

Rate Capped

This is a screenshot from KDE System Guard showing my download speed for the Fedora Core 9 distribution DVD, around mid-day on Saturday. After downloading 1.8GB I was suddenly slapped down to 100 kBytes/sec, which translates to around 1 Mbit/sec with all the packet overhead. This happens consistently with various download sites, so I'm confident it's my ISP.

I'm paying Virgin Media for their "Unlimited" 10Mbit service (although I haven't seen anything over 500 kBytes/sec, which would mean around 5Mbit in reality). I was never told that my download rate would be capped, although the fine print in the sign-up page points to their AUP, which says in Section 7 that they reserve the right to restrict Internet services in any way at any time.
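For anyone wanting to check the conversions between kBytes/sec and Mbit/sec, here is the arithmetic as a small sketch. The 25% figure for packet overhead is my own guess, chosen because it matches the rounded numbers quoted here; it is not a measured value, and the function name is made up.

```python
def wire_rate_mbit(kbytes_per_sec, overhead=0.25):
    """Convert a payload rate in kBytes/sec to an approximate line rate in Mbit/sec.

    `overhead` is the assumed extra fraction taken up by packet headers
    and acknowledgements. 0.25 is a guess, not a measurement.
    """
    payload_mbit = kbytes_per_sec * 8 / 1000
    return payload_mbit * (1 + overhead)


print(f"{wire_rate_mbit(100):.1f} Mbit/sec")  # 1.0 Mbit/sec
print(f"{wire_rate_mbit(500):.1f} Mbit/sec")  # 5.0 Mbit/sec
```

Without any allowance for overhead the same figures are 0.8 and 4 Mbit/sec of actual payload.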

However this post is not actually a complaint about rate capping. ISPs are there to make money by providing a service, and as a rule you get what you pay for. In theory if I want a better service all I have to do is switch ISPs, and possibly pay more money. Our email address is routed through a domain I personally own, so we don't even have to tell anyone else that our household has switched ISPs.

But how do I know that I'll get a better deal if I go elsewhere? All retail ISPs offer basically the same terms, which consist of a big headline rate accompanied by a fine-print disclaimer pointing out that you might never get it. If they have a rate-capping policy they certainly don't advertise it.

I don't actually want a faster headline speed (although I'd be very happy to get the one I'm currently promised). I don't even want a promise of "no rate caps": the ISP argument that heavy downloaders hurt responsiveness for all users is valid. All I want is to be able to look at ISP adverts and figure out where the best value for money is. That means the following information:

Ratio of customers to actual incoming bandwidth.
The rate-capping policy: e.g. capped rate and criteria for applying it.

Anybody know a UK ISP that actually provides this?

Friday, May 9, 2008

Is Functional Programming the new Python?

Back in 2004 Paul Graham wrote an essay on the Python Paradox:

if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.

Some tentative support for this theory comes from a study of programming languages done in 2000. The same task was given to over 80 programmers. The chart shows how long they took. Obviously the average for some languages was a lot less than for others, but the interesting thing for the Python Paradox is the variability. Java had huge variability: one developer took over 60 hours to complete the task. Meanwhile the Python developers were the most consistent, with the lowest variance as a percentage of the mean. I suspect (but can't prove) that this was because of the kind of programmers who wrote in Java and Python back in 2000. Java was the language of the Web start-up and the dot-com millionaire, but Python was an obscure open source scripting language. The Pythonistas in this study didn't learn it to get a job, but many of the Java programmers did.
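"Variance as a percentage of the mean" is read here as the coefficient of variation: the standard deviation divided by the mean. The sketch below shows the calculation. The completion times are made up purely to illustrate the measure and are NOT the study's data.

```python
from statistics import mean, pstdev


def coefficient_of_variation(times):
    """Standard deviation as a fraction of the mean."""
    return pstdev(times) / mean(times)


# Made-up completion times in hours (NOT taken from the study).
consistent_group = [4, 5, 6]
variable_group = [2, 5, 20]

print(f"consistent: {coefficient_of_variation(consistent_group):.2f}")
print(f"variable:   {coefficient_of_variation(variable_group):.2f}")
```

Dividing by the mean lets groups with very different average times be compared on consistency alone.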

But if this study was repeated today I bet the spread for Python would be a lot larger. Maybe still not as big as Java, but more like C++ or Perl. Because today you can get a good job writing Python. A quick check of jobs on dice.com found 1450 Python jobs against 7732 C++ jobs and 15640 jobs for Java. Python hasn't taken over the world, but the jobs are there.

So the smart employers and developers need something new to distinguish themselves from the crowd, and it looks like functional programming might be it. Programming Reddit carries lots of cool stuff about Haskell, and job adverts are starting to list a grab-bag of functional languages in the "would also be an advantage" list. For instance:

- Programming experience with more esoteric and powerful languages for data manipulation (Ruby, Python, Haskell, Lisp, Erlang)

So it looks like the with-it job-seekers and recruiters may be starting to use functional programming to identify each other, just as they used Python up to 2004.

Update: Oops. I just remembered this post which started me thinking along these lines.

Sunday, May 4, 2008

An Under-Appreciated Fact: We Don't Know How We Program

I was talking to a colleague from another part of the company a couple of weeks ago, and I mentioned the famous ten-to-one productivity variation between the best and worst programmers. He was surprised, so I sketched some graphs and added a few anecdotes. He then proposed a simple solution: "Obviously the programmers at the bottom end are using the wrong process, so send them on a course to teach them the right process."

My immediate response, I freely admit, was to open and shut my mouth a couple of times while trying to think of a response more diplomatic than "How could anyone be so dumb as to suggest that?". But I have been mulling over that conversation, and I have come to the conclusion that the suggestion was not dumb at all. The problem lies not with my colleague's intelligence but in a simple fact. It is so basic that nobody in the software industry notices it, but nobody outside the industry knows it. The fact is this: there is no process for programming.

Software development abounds with processes of course: we have processes for requirements engineering, requirements management, configuration management, design review, code review, test design, test review, and on and on. Massive process documents are written. Huge diagrams are drawn with dozens of boxes to try to encompass the complexity of the process, and still they are gross oversimplifications of what needs to happen. And yet in every one of these processes and diagrams there is a box which basically says "write the code", and ought to be subtitled "(and here a miracle occurs)". Because the process underneath that box is very simple: read the problem, think hard until a solution occurs to you, and then write down the solution. That is all we really know about it.

To anyone who has written a significant piece of software this fact is so obvious that it seems to go without saying. We were taught to program by having small examples of code explained to us, and then we practiced producing similar examples. Over time the examples got larger and the concepts behind them more esoteric. Loops and arrays were introduced, then pointers, lists, trees, recursion, all the things you have to know to be a competent programmer. Like many developers I took a 3 year degree course in this stuff. But at no point during those three years did any lecturer actually tell me how to program. Like everyone else, I absorbed it through osmosis.

But to anyone outside the software world this seems very strange. Think about other important areas of human endeavor: driving a car, flying a plane, running a company, designing a house, teaching a child, curing a disease, selling insurance, fighting a lawsuit. In every case the core of the activity is well understood: it is written down, taught and learned. The process of learning the activity is repeatable: if you apply yourself sufficiently then you will get it. Aptitude consists mostly of having sufficient memory capacity and mental speed to learn the material and then execute it efficiently and reliably. Of course in all these fields there are differences in ability that transcend the mere application of process. But basic competence is generally within reach of anyone with a good memory and average mental agility. It is also true that motor skills such as swimming or steering a car take practice rather than book learning, but programming does not require any of those.

People outside the software industry assume, quite reasonably, that software is just like all the other professional skills; that we take a body of knowledge and apply it systematically to particular circumstances. It follows that variation in productivity and quality is a solvable problem, and that the solution lies in imposing uniformity. If a project is behind schedule then people need to be encouraged to crank through the process longer and faster. If quality is poor then either the process is defective or people are not following it properly. All of this is part of the job of process improvement, which is itself a professional skill that consists of systematically applying a body of knowledge to particular circumstances.

But if there is no process then you can't improve it. The whole machinery of process improvement loses traction and flails at thin air, like Wile E. Coyote running off a cliff. So the next time someone in your organisation says something seemingly dumb about software process improvement, try explaining that software engineering has processes for everything except actually writing software.

Update: Some of the discussion here, and on Reddit and Hacker News is arguing that many other important activities are creative, such as architecture and graphic design. Note that I didn't actually mention "architecture" as a profession, I said "designing a house" (i.e. the next McMansion on the subdivision, not one of Frank Lloyd Wright's creations). People give architects and graphic designers room to be creative because social convention declares that their work needs it. The problem for software is that non-software-developers don't see anything creative about it.

The point of this post is not that software "ought" to be more creative or that architecture "ought" to be less. The point is that we need to change our rhetoric when explaining the problem. Declaring software to be creative looks to the rest of the world like a sort of "art envy", or else special pleading to be let off the hook for project overruns and unreliable software. Emphasising the lack of a foundational process helps demonstrate that software really does have something in common with the "creative" activities.

Wednesday, March 19, 2008

Why Voting Machines Can't Add Up

Ed Felten is continuing his excellent work exposing the broken state of electronic voting machines. Many people are wondering how such software can have been allowed out by its developers. The discrepancies don't (at the moment) seem to be a result of fraud, just very buggy software.

Voting machines are obviously important, so their development is regulated. I've never worked in the voting machine industry, but I have worked in another kind of federally regulated software: medical devices. So I know how regulated software projects work, and how they don't.

The fundamental problem underlying this is that nobody in the world actually knows how to write software that reliably does what you want. There are quite a lot of people who can write such software, but if you ask them how it's done they basically waffle. Most of them agree on a list of steps to take, starting with writing down exactly what the software is supposed to do. Various attempts have been made to codify this list, and they all look pretty similar. The voting machine standards are just another variation on this theme.

However this is all cargo-cult engineering. We know that the people who can summon up the magic cargo planes do it by putting things over their ears and saying magic words, but it doesn't follow that if we put things on our ears and say the same magic words the cargo will appear. So it is with software engineering. You can write Requirements Documents and Class Diagrams and Test Scenario Documents and Test Execution Reports until you run out of paper, but it won't make any difference if you don't have the Quality Without a Name.

Imagine you are managing a development project to build a voting machine. Your mission is to get the thing on the market. You have been given a bunch of programmers, half a human factors person and a quarter of an industrial designer. The time available isn't long enough, but you know it's no use complaining about that because it's not your boss's fault, or even the CEO's fault. It's the fault of the people at Big Competitor who are planning to release their product just in time to tie up the whole market, so if you don't deliver the product at the same time then it's not going to matter whose fault it was, the whole division is going to get laid off anyway. You could get some more people if you really wanted, but you know that more people aren't actually going to speed things up.

The Quality Department have downloaded the voting machine regulations and someone has been going through them and writing down a list of the things they say you have to do and the order they have to be done in. This is very good. In fact you send the Head of QA a little note saying how helpful his minion has been to your project, because now you have something to aim at. Project Management is mostly a matter of getting your hoops lined up so that you and your minions can jump through them as quickly as possible, and the QA minion has done the regulatory hoops for you. The regulations boil down to a list of documents that have to be shown to the inspectors (who you are going to hire, but that's another story). Each document has a list of things it must contain, and some of those things have to be traceable to other things. All you need to do now is start allocating people to things on the list and getting them ticked off. The list is long, but you have one big advantage: there is nothing to say how good any of these documents have to be. They don't have to be good, they just have to exist.

One of these documents is called "source code". Of course that one does have some quality requirements on it: it's got to pass a bunch of tests. But the tests themselves don't have any quality requirements; like everything else they just have to exist. And passing the tests is the only quality requirement on the code. Once the independent laboratory you hired has run the tests and said "pass" you are over the finishing line and you can start selling these things.

This means that you have a very strong motivation to keep the testing to the minimum you can get away with. The regulations say you have to have a test for each item in the original requirements document, and this test has to be run once. If your software fails a test then you get to fix it, and if the fix was small enough you can get away without repeating all the other tests as well. During this whole time your eyes are fixed on the finishing line: the objective is to get this thing over the line. What happens to it after that is someone else's problem.

When you look at these machines from a project manager's point of view you start to see how they got to be so unreliable. "Quality Assurance" is primarily a matter of making sure you get all the items in the regulations ticked off; it has nothing at all to do with the original meaning of the word "quality". Ironically the regulations may actually do more harm than good because they divert energy from real quality onto generating the required inches of documentation.

Over the years I've spent a lot of time trying to figure out how to fix this problem, and I still don't have an answer. Abolishing private companies is a cure worse than the disease, and it won't cure the disease anyway because it won't abolish projects and the need to manage them. Software Engineering has a bad case of Quality Without A Name, and there is no prospect of it getting better soon.

However in the limited domain of voting machines I believe the best cure is sunlight: we may not be able to define quality in software, but we know it when we see it. The source code for voting machines must be published. The manufacturers will scream and shout about their precious IPR and trade secrets. This is nonsense. Any voting machine must have a well defined version of someone's software running, so any illegal copying will generate a cast-iron audit trail back to the perpetrator. And there are no real trade secrets in voting machines: counting votes is not, when it comes down to it, a particularly complicated problem. The voting machine manufacturers will make just as much money as they do now. In fact they'd probably make more because if the machines were trustworthy then people would learn to trust them. However the first vendor to start publishing their source code will be at a disadvantage because everyone else can pinch bits of it with very little risk of detection (and if they get caught they can just blame a rogue programmer). So the regulations on voting machines should be changed to require the publication of the code (and other design documentation too, while we are about it). That will create a real requirement for quality source code. Until then we are stuck with the current mess.

Thursday, January 10, 2008

Why Haskell is Good for Embedded Domain Specific Languages

Domain Specific Languages (DSLs) are attracting some attention these days. They have always been around, of course: Emacs Lisp is a DSL, as are the various dialects of Visual Basic embedded in MS Office applications. And of course Unix hands know YACC (now Bison) and Lex (now Flex).

However creating a full-blown language is a lot of work: you have to write a parser, code generator / interpreter and possibly a debugger, not to mention all the routine stuff that every language needs like variables, control structures and arithmetic types. An embedded DSL (eDSL) is basically a short cut if you can't afford to do that. Instead you write the domain-specific bits as a library in some more general purpose "host" language. The uncharitable might say that "eDSL" is just another name for "library module", and it's true there is no formal dividing line. But in a well designed eDSL anything you might say in domain terms can be directly translated into code, and a domain expert (i.e. a non-programmer) can read the code and understand what it means in domain terms. With a bit of practice they can even write some code in it.

This paper describes an eDSL for financial contracts built in Eiffel which worked exactly that way. It doesn't talk about "domain specific language" because the term hadn't been invented back then, but the software engineers defined classes for different types of contracts that the financial analysts could plug together to create pricing models. It's interesting to compare it with this paper about doing the same thing in Haskell.
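To make "classes that analysts plug together" concrete, here is a minimal sketch in Python. It is not the design from either paper: the class names (`ZeroCoupon`, `Both`, `Give`) and the flat-rate discounting are assumptions of mine, chosen only to show how a statement in domain terms maps directly onto code.

```python
from dataclasses import dataclass


class Contract:
    """Base class for a hypothetical contract vocabulary (illustrative only)."""

    def value(self, rate: float) -> float:
        raise NotImplementedError


@dataclass(frozen=True)
class ZeroCoupon(Contract):
    """Receive `amount` in `years` years."""
    amount: float
    years: float

    def value(self, rate: float) -> float:
        # Discount the single payment back to today at a flat annual rate.
        return self.amount / (1.0 + rate) ** self.years


@dataclass(frozen=True)
class Both(Contract):
    """Hold two contracts at once."""
    first: Contract
    second: Contract

    def value(self, rate: float) -> float:
        return self.first.value(rate) + self.second.value(rate)


@dataclass(frozen=True)
class Give(Contract):
    """Take the other side of a contract: receipts become payments."""
    inner: Contract

    def value(self, rate: float) -> float:
        return -self.inner.value(rate)


# Domain statement: "receive 100 in one year, and pay 105 in two years".
deal = Both(ZeroCoupon(100.0, 1), Give(ZeroCoupon(105.0, 2)))

print(deal.value(0.0))   # no discounting: 100 received minus 105 paid
print(deal.value(0.05))  # at 5% the two legs roughly cancel out
```

The definition of `deal` is the part a domain expert would read and write; everything above it is the library that the software engineers maintain.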

But eDSLs have problems. The resulting programs are often hard to debug because a bug in the application logic has to be debugged at the level of the host language; the debugger exposes all the private data structures, making it hard for application programmers to connect what they see on the screen with the program logic. The structure of the host language also shows through, requiring application programmers to avoid using the eDSL functions with certain constructs in the host language.

This is where Haskell comes in. Haskell has three closely related advantages over other languages:

- Monads. The biggest way that a host language messes up an eDSL is by imposing a flow of control model. For example, a top-down parser library is effectively an eDSL for parsing. Such a library can be written in just about any language. But if you want to implement backtracking then it's up to the application programmer to make sure that any side effects in the abandoned parse are undone, because most host languages do not have backtracking built in (and even Prolog doesn't undo "assert" or "retract" when it backtracks). But the Parsec library in Haskell limits side effects to a single user-defined state type, and can therefore guarantee to unwind all side effects. More generally, a monad defines a model for flow of control and the propagation of side effects from one step to the next. Because Haskell lets you define your own monad, this frees the eDSL developer from the model that all impure languages have built in. The ultimate expression of this power is the Continuation monad, which allows you to define any control structure you can imagine.
- Laziness. Haskell programmers can define infinite (or merely very large) data structures because at any given point in the execution only the fragment being processed will actually be held in memory. This also frees up the eDSL developer from having to worry about the space required by the evaluation model. (update: this isn't actually true. As several people have pointed out, while laziness can turn O(n) space into O(1), it can also turn O(1) into O(n). So the developers do have to worry about memory, but lazy evaluation does give them more options for dealing with it.)
- The type system allows very sophisticated constraints to be placed on the use of eDSL components and their relationships with other parts of the language. The Parsec library mentioned above is a simple example. All the library functions return something of type "Parser foo", so an action from any other monad (like an IO action that prints something out) is prohibited by the type system. Hence when the parser backtracks it only has to unwind its internal state, and not the rest of the universe.
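The state-unwinding idea can be sketched outside Haskell. The following is a minimal Python illustration, not Parsec's API: the combinator names (`char`, `seq`, `choice`) are hypothetical, and the "single user-defined state" is assumed to be an immutable tuple that each parser passes along. Because a parser can only produce a new state by returning it, backtracking discards the abandoned branch's changes for free.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position, user_state) and returns
# (value, new_position, new_user_state), or None on failure.
# user_state is an immutable tuple, so the only way to "change" it
# is to return a new one.
Result = Optional[Tuple[object, int, tuple]]
Parser = Callable[[str, int, tuple], Result]


def char(c: str) -> Parser:
    """Match one literal character and record it in the user state."""
    def run(text, pos, state):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1, state + (c,)
        return None
    return run


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another, threading position and state through."""
    def run(text, pos, state):
        values = []
        for p in parsers:
            result = p(text, pos, state)
            if result is None:
                return None  # the partial position and state are simply dropped
            value, pos, state = result
            values.append(value)
        return values, pos, state
    return run


def choice(first: Parser, second: Parser) -> Parser:
    """Try `first`; on failure, backtrack and try `second` from the same point."""
    def run(text, pos, state):
        result = first(text, pos, state)
        if result is not None:
            return result
        # Backtrack: pos and state are still the values from before `first` ran.
        return second(text, pos, state)
    return run


# "abc", or else "abd". On input "abd" the first branch consumes "ab",
# records it in the state, and then fails on "c".
parser = choice(seq(char("a"), char("b"), char("c")),
                seq(char("a"), char("b"), char("d")))

print(parser("abd", 0, ()))
# prints (['a', 'b', 'd'], 3, ('a', 'b', 'd'))
```

The final state holds one "a" and one "b", not two of each, so nothing leaked from the abandoned branch. Note that nothing in Python stops a parser from also printing or mutating a global on the way; that is exactly the guarantee the Haskell type system adds.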

There are other programming languages that are good for writing eDSLs, of course. Lisp and Scheme have callCC and macros, which together can cover a lot of the same ground. Paul Graham's famous "Beating the Averages" paper talks about using lots of macros, and together with his patent for continuation-based web serving it is pretty clear that what he and Robert Morris actually created was an eDSL for web applications, hosted in Lisp.

But I still think that Haskell has the edge. I'm aware of the Holy War between static and dynamic type systems, but if you put a Haskell eDSL in front of a domain expert then you only have to explain a compiler type mismatch message that points to the offending line. This is much easier to grasp than some strange behaviour at run time, especially if you have to explain how the evaluation model of your eDSL is mapped down to the host language. Non-programmers are not used to inferring dynamic behaviour from a static description, so anything that helps them out at compile time has to be a Good Thing. And it's pretty useful for experienced coders too.

(Update: I should point out that monads can be done in any language with lambdas and closures, and this is pretty cool. But only in Haskell are they really a native idiom)
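As a rough illustration of that update, here is a failure-propagating ("Maybe"-style) monad written in Python with nothing but tuples, lambdas and closures. The names `unit`, `fail`, `bind` and `safe_div` are my own for this sketch and do not come from any library.

```python
# A computation result is either ("ok", value) or ("fail", reason).

def unit(value):
    """Wrap a plain value as a successful result."""
    return ("ok", value)


def fail(reason):
    """Build a failed result carrying an explanation."""
    return ("fail", reason)


def bind(result, step):
    """Feed a successful value to the next step; skip the step after a failure."""
    tag, payload = result
    return step(payload) if tag == "ok" else result


def safe_div(numerator):
    """Return a closure that divides `numerator` by its argument, failing on zero."""
    return lambda denominator: (
        fail("divide by zero") if denominator == 0 else unit(numerator / denominator)
    )


# The flow of control (stop at the first failure) lives in `bind`,
# not in the steps themselves.
good = bind(bind(unit(4), safe_div(100)), lambda x: unit(x + 1))
bad = bind(bind(unit(0), safe_div(100)), lambda x: unit(x + 1))

print(good)  # prints ('ok', 26.0)
print(bad)   # prints ('fail', 'divide by zero')
```

It works, but the nested `bind` calls show why this is not a native idiom here: there is no equivalent of Haskell's do-notation to flatten the chain, and nothing checks that each step returns a wrapped result.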
