
Saturday, April 30, 2011

Patent 5,893,120 reduced to mathematical formulae

I have previously argued that the Bilski ruling renders all software unpatentable in the US, because any method of manipulating information (i.e. an algorithm) can be translated into a Haskell program, every Haskell function is a formula in the Typed Lambda Calculus, and all the variants of the Lambda Calculus are part of mathematics. Hence any algorithm can be reduced to a mathematical formula, which US patent law states is not patentable subject matter.

The computer science community has tried to explain this to the patent industry, but these explanations have fallen on deaf ears. So I thought I would try a different tack.

Google has just been ordered to pay $5M for infringing patent 5,893,120 (hereafter "Patent 120"). This patent covers a very simple data structure and the algorithms for manipulating it; in fact much of the text of the patent is a pseudo-code implementation in a Pascal-like language. So I thought I would provide a practical demonstration of what has, until now, been a theoretical proposition: the reduction of a software patent to a set of mathematical formulae. The result is below, and is also posted here (although I believe that the paste-bin will eventually expire).

I chose this patent partly because it is comparatively simple (there are 95 lines of Haskell source code in the file), but also because it is potentially important; Google's "infringement" consisted of using Linux, which uses this algorithm. Hence anyone using Linux is a potential target.

Red Hat is reportedly seeking a declaratory judgment that this patent is invalid, and Google is expected to appeal, so this is not over yet. If anyone knows how I can contact their lawyers to draw these formulae to their attention, I'd be grateful.

Edit 1: A brief overview of the relationship between Haskell and the Lambda Calculus can be found here.

Edit 2: Of course a judge isn't going to know the Lambda Calculus from a lump of rock, but that is what expert witnesses are for. Get a professor of mathematics from an internationally recognised university to testify that these are formulae in the Lambda Calculus, and that the Lambda Calculus is part of mathematics, and you have a sound legal proof. The only thing the patent holders could do is find another professor to testify differently.

Edit 3: Sorry about the broken formatting. If you want to review the code you can either look at this colourised version or copy and paste it into your favourite editor.

Edit 4: This post has been slashdotted. Rather than try to respond to all the various comments individually I have posted a general response to the common themes.

In the meantime, here are the formulae:


{- |

The claims of United States Patent 5,893,120 may be reduced to a set
of mathematical formulae expressed in the Typed Lambda Calculus, also
known as System F. This file contains these formulae. The language
used is Haskell, which is a version of System F with extra "syntactic
sugar" to make it useful for practical purposes.

It happens that formulae expressed in Haskell can also be translated by
a Haskell compiler into executable computer programs. However that
does not make a Haskell function any less of a mathematical formula.
Otherwise the quadratic formula

> x = (-b +/- sqrt (b^2 - 4ac)) / (2a)

would be rendered patentable subject matter by the same argument.

-}


module Patent120 where

import Data.Char (ord)

-- The "ord" function defines a bijection between text characters and
-- numbers.

import Data.Map (Map)
import qualified Data.Map as M

-- The Data.Map module is a standard module defined entirely as a set
-- of Haskell formulae, in the same way as this module. The "import"
-- means that these formulae are incorporated with the formulae given
-- here.

{-
Claim 1. An information storage and retrieval system, the system
comprising:

a linked list to store and provide access to records stored in a
memory of the system, at least some of the records automatically
expiring,

a record search means utilizing a search key to access the linked
list,

the record search means including a means for identifying and removing
at least some of the expired ones of the records from the linked list
when the linked list is accessed, and

means, utilizing the record search means, for accessing the linked
list and, at the same time, removing at least some of the expired ones
of the records in the linked list.
-}

-- | A record with Birth value below some threshold is considered "expired".
type Birth = Integer


-- | Records in the list have their birth recorded, and also contain
-- a key and a value.
data Record k a = Record Birth k a

-- | True iff the record is older than the threshold argument.
expired t (Record b _ _) = b < t

-- | True iff the record matches the key.
matches k (Record _ k1 _) = k == k1

-- | The value stored in a record.
value (Record _ _ v) = v

-- | Records are stored in a linked list in the system memory.
type Storage k a = [Record k a]

-- | The "search1" function includes a threshold age parameter.
-- It returns both the items that match the key and the linked list
-- with the expired items removed.
--
-- The recitation of "means" cannot be used to disguise the formula
-- given here. Otherwise any mathematical formula could be patented
-- by reciting e.g. "addition means" and "multiplication means" that are
-- mechanically derived from the formula.
search1 :: (Eq k) => Birth -> k -> Storage k a -> (Storage k a, [a])
search1 t k records = foldr nextItem ([], []) records
  where
    nextItem r@(Record _ k2 v) (retained, found) =
      (if expired t r then retained else r : retained,
       if k == k2 then v : found else found)


{-
Claim 2. The information storage and retrieval system according to
claim 1 further including means for dynamically determining maximum
number for the record search means to remove in the accessed linked
list of records.
-}

-- | Similar to "search1", but with an added parameter for the maximum
-- number of records to remove.
search2 :: (Eq k) => Int -> Birth -> k -> Storage k a -> (Storage k a, [a])
search2 n t k = (\(_, x, y) -> (x, y)) . foldr nextItem (n, [], [])
  where
    -- The removal budget is decremented only when an expired record is
    -- actually removed; once it is exhausted, expired records are retained.
    nextItem r@(Record _ k2 v) (n2, retained, found) =
      (if remove then n2 - 1 else n2,
       if remove then retained else r : retained,
       if k == k2 then v : found else found)
      where remove = expired t r && n2 > 0


{-
Claim 3. A method for storing and retrieving information records using
a linked list to store and provide access to the records, at least
some of the records automatically expiring, the method comprising the
steps of:

accessing the linked list of records,

identifying at least some of the automatically expired ones of the
records, and

removing at least some of the automatically expired records from the
linked list when the linked list is accessed.

Claim 4. The method according to claim 3 further including the step of
dynamically determining maximum number of expired ones of the records
to remove when the linked list is accessed.
-}


-- | Claim 3 can be reduced to the same formula as Claim 1. The use of
-- a sequence of steps cannot disguise the fact that this represents
-- a mathematical formula. Otherwise it would be possible to patent
-- any mathematical formula simply by reciting the steps in evaluating
-- it. e.g. "step 3: the addition of the results of step 1 and step 2"
search3 :: (Eq k) => Birth -> k -> Storage k a -> (Storage k a, [a])
search3 = search1

-- | Likewise Claim 4 can be reduced to the same formula as Claim 2.
search4 :: (Eq k) => Int -> Birth -> k -> Storage k a -> (Storage k a, [a])
search4 = search2

{-
Claim 5. An information storage and retrieval system, the system
comprising:

a hashing means to provide access to records stored in a memory of the
system and using an external chaining technique to store the records
with same hash address, at least some of the records automatically
expiring,

a record search means utilizing a search key to access a linked list
of records having the same hash address,

the record search means including means for identifying and removing
at least some expired ones of the records from the linked list of
records when the linked list is accessed, and

means, utilizing the record search means, for inserting, retrieving,
and deleting records from the system and, at the same time, removing
at least some expired ones of the records in the accessed linked list
of records.
-}


-- | Every key has a hash code. Strings of characters may be used as
-- keys.
class (Eq k) => Hashable k where
  hash :: k -> Int

instance Hashable Char where hash = ord

instance (Hashable a) => Hashable [a] where
hash = foldr (\x h -> ((hash x + h) * 53 + 1) `mod` 1733) 0

type HashedStore k a = Map Int [Record k a]


-- | Access a hashed store with a function that returns the modified list of records.
hashAccess5 :: (Hashable k) =>
    (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess5 f t k store = (M.insert h retained store, result)
  where
    h = hash k
    (retained, result) = case M.lookup h store of
      -- Removing expired records means keeping the ones that have not expired.
      Just records -> f $ filter (not . expired t) records
      Nothing -> ([], [])


-- | Search using hashAccess.
search5 :: (Hashable k) => Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search5 t k = hashAccess5 srch t k
  where srch store = (store, map value $ filter (matches k) store)

-- | Insert using hashAccess.
insert5 :: (Hashable k) =>
     Birth   -- ^ Expiry threshold for old entries.
  -> Birth   -- ^ Birth date for new entry.
  -> k -> a -> HashedStore k a -> HashedStore k a
insert5 t b k v = fst . hashAccess5 (\store -> (Record b k v : store, [])) t k


-- | Delete using hashAccess
delete5 :: (Hashable k) => Birth -> k -> HashedStore k a -> HashedStore k a
delete5 t k = fst . hashAccess5 (\store -> (filter (not . deleted) store, [])) t k
  where deleted (Record _ k1 _) = (k == k1)


{-
Claim 6. The information storage and retrieval system according to
claim 5 further including means for dynamically determining maximum
number for the record search means to remove in the accessed linked
list of records.
-}


-- | Access a hashed store with a function that returns the modified list of records. The "Int"
-- argument is the maximum number of expired records to remove.
hashAccess6 :: (Hashable k) =>
    Int -> (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess6 n f t k store = (M.insert h retained store, result)
  where
    h = hash k
    (retained, result) = case M.lookup h store of
      Just records -> f $ filterN n (expired t) records
      Nothing -> ([], [])
    -- Like "filter (not . p)", except that at most n1 records satisfying
    -- p are removed; any further ones are retained.
    filterN _ _ [] = []
    filterN n1 p (x:xs)
      | n1 > 0 && p x = filterN (n1 - 1) p xs
      | otherwise     = x : filterN n1 p xs

-- | Search using hashAccess.
search6 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search6 n t k = hashAccess6 n srch t k
  where srch store = (store, map value $ filter (matches k) store)

-- | Insert using hashAccess.
insert6 :: (Hashable k) =>
     Int     -- ^ Maximum number of expired records to remove.
  -> Birth   -- ^ Expiry threshold for old entries.
  -> Birth   -- ^ Birth date for new entry.
  -> k -> a -> HashedStore k a -> HashedStore k a
insert6 n t b k v = fst . hashAccess6 n (\store -> (Record b k v : store, [])) t k


-- | Delete using hashAccess
delete6 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> HashedStore k a
delete6 n t k = fst . hashAccess6 n (\store -> (filter (not . deleted) store, [])) t k
  where deleted (Record _ k1 _) = (k == k1)


{-
Claim 7. A method for storing and retrieving information records using
a hashing technique to provide access to the records and using an
external chaining technique to store the records with same hash
address, at least some of the records automatically expiring, the
method comprising the steps of:

accessing a linked list of records having same hash address,

identifying at least some of the automatically expired ones of the records,

removing at least some of the automatically expired records from the
linked list when the linked list is accessed, and

inserting, retrieving or deleting one of the records from the system
following the step of removing.

Claim 8. The method according to claim 7 further including the step of
dynamically determining maximum number of expired ones of the records
to remove when the linked list is accessed.
-}

-- | As with Claim 3 vs Claim 1, the formulae for Claim 7 are the same as for Claim 5
hashAccess7 :: (Hashable k) =>
    (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess7 = hashAccess5

search7 :: (Hashable k) => Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search7 = search5

insert7 :: (Hashable k) =>
     Birth   -- ^ Expiry threshold for old entries.
  -> Birth   -- ^ Birth date for new entry.
  -> k -> a -> HashedStore k a -> HashedStore k a
insert7 = insert5

delete7 :: (Hashable k) => Birth -> k -> HashedStore k a -> HashedStore k a
delete7 = delete5


-- | And the formulae for Claim 8 are the same as for Claim 6
hashAccess8 :: (Hashable k) =>
    Int -> (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess8 = hashAccess6


search8 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search8 = search6

insert8 :: (Hashable k) =>
     Int     -- ^ Maximum number of expired records to remove.
  -> Birth   -- ^ Expiry threshold for old entries.
  -> Birth   -- ^ Birth date for new entry.
  -> k -> a -> HashedStore k a -> HashedStore k a
insert8 = insert6

delete8 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> HashedStore k a
delete8 = delete6
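To underline that these formulae can also simply be evaluated, here is a self-contained sketch: the Claim 1 formula copied from the module above (minus the type-class generality), applied to some hypothetical sample data. The store contents and the threshold value 3 are my own invented examples, not anything from the patent.

```haskell
-- Definitions copied (slightly simplified) from the module above.
data Record k a = Record Integer k a

search1 :: Eq k => Integer -> k -> [Record k a] -> ([Record k a], [a])
search1 t k = foldr nextItem ([], [])
  where
    nextItem r@(Record b k2 v) (retained, found) =
      (if b < t then retained else r : retained,   -- drop expired records
       if k == k2 then v : found else found)       -- collect matching values

-- Hypothetical sample data: two records under key "x", one under "y".
demoStore :: [Record String String]
demoStore = [Record 1 "x" "old", Record 5 "x" "new", Record 6 "y" "other"]

main :: IO ()
main = do
  let (retained, found) = search1 3 "x" demoStore
  print (length retained)   -- 2: the record born at time 1 has expired
  print found               -- ["old","new"]
```

Searching with threshold 3 returns both values stored under "x" while simultaneously pruning the expired record, which is exactly the behaviour Claim 1 recites.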

Tuesday, September 23, 2008

Why the banks collapsed, and how a paper on Haskell programming can help stop it happening next time

Trading Risk

The financial system exists to trade three kinds of thing: money, commodities and risk. Money and commodities are the easy bit. Either I have £1,000 or I don't. Similarly with commodities: either I have a barrel of oil or I don't. But risk is a very different matter. In theory risk is easy to quantify: just multiply the probability of something by the cost, and you have the expected loss. But in practice it's not so simple, because the probability and cost may be difficult to quantify, especially for rare events (like, say, a global credit crunch). Many of the factors that go into a risk model are subjective, so honest people can have genuine disagreements about exactly what the risk is.

The Slippery Slope

Unfortunately risk assessment is not value-neutral. Risk has negative value: you have to pay people to take it off you. The higher the risk, the more you have to pay. And because the amount of risk is always debatable this is a very slippery slope; the people paying others to take the risk away have every incentive to present a lower estimate. Everyone can see that everyone else is doing the same, and so methods of hiding or downplaying risk migrate from dodgy dealing to open secret to standard practice.

Specific examples abound throughout the recent history of the finance industry:
  • The retail mortgage houses that originally lent to "sub-prime" clients would hire valuers who were known to be on the generous side with their valuations. So any valuer who wasn't so generous found their income drying up. Background checks on clients were cut back, then eliminated. Eventually borrowers were simply told what income to claim on the forms, regardless of what they actually earned.
  • These loans were then bundled up and sold. The idea was that the buyers would each get a share of the incoming loan repayments. Rights to this stream of money were divided into "tranches", the idea being that, for instance, Tranche 1 would get the first 34% of whatever money was due, Tranche 2 would get the next 33%, and Tranche 3 would get the last 33%. When some borrowers defaulted (as some always do), Tranche 3 would lose out first, then Tranche 2. Tranche 1 would only fail to get all their money if the overall repayment rate fell below 34%, which had never happened. The game here was to persuade a credit rating agency that Tranche 1 was so safe that it was worthy of a "Triple A" rating, because that meant that banks, insurance companies and similar big financial institutions could legally buy this debt without having to put cash aside to cover potential losses. The rating agencies earned fees for evaluating securities, so just like the house valuers they found it paid to be on the generous side.
  • All these institutions had risk management departments who were supposed to watch out for excessively risky behaviour. But in practice they found it very difficult to blow the whistle. Risk managers tell stories of being given two days to review a deal that took ten people a month to negotiate, and of accusations of "not being a team player" when they questioned over-optimistic claims. This story from the New York Times has some details. Look through the comments after the story as well; many of them are by people with their own tales of risk.
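The tranche arithmetic described above can be made concrete with a toy sketch (my own illustration, not taken from any actual security): incoming repayments flow to tranche 1 first, then tranche 2, then tranche 3.

```haskell
-- Pay incoming cash down a "waterfall" of tranche sizes: each tranche
-- is paid in full before the next one receives anything.
waterfall :: Double -> [Double] -> [Double]
waterfall _ [] = []
waterfall cash (size : rest) =
  let paid = min cash size
  in paid : waterfall (cash - paid) rest

-- With 34/33/33 tranches and only 60% of the money coming in,
-- tranche 1 is still paid in full while tranche 3 gets nothing:
-- waterfall 60 [34, 33, 33] == [34.0, 26.0, 0.0]
```

This is why a Triple A rating for tranche 1 could be made to look plausible: repayments would have to fall below 34% before it lost a penny.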
None of this is new; similar behaviour has contributed to past financial crises. In theory more regulation can prevent this, and everyone is now planning or demanding lots more regulation (even the banks). But in practice regulation has failed repeatedly because the regulators always concentrate on the way it went wrong last time. Regulations have to be written carefully, and the companies being regulated have to be given time to understand changes and put their compliance mechanisms in place. This prevents regulators from moving as fast as the institutions they regulate.

The regulators also don't have visibility of the information they need to assess systemic risk. Systemic risk arises because financial companies are exposed to each other; if one institution fails, others have to write off any money it owed them, possibly pushing them into bankruptcy as well. Regulators try to make companies insulate themselves by avoiding excessive risk and keeping some cash on hand, but without a clear picture of the risks being run by each company they have no way to tell if this is enough.

The basic problem, I believe, is the food chain of risk management within each institution. At the top are the negotiators and fund managers who design and package the securities. Then the lawyers are brought in to specify the precise terms of the deal. Somewhere along the way the "quants" will be asked to develop mathematical models, and at the bottom coders will be given the job of turning the models into executable code that will actually determine the real price and risks. It is this food chain that needs to be rethought, because it is hiding important information.

This 2000 paper by Simon Peyton Jones, Jean-Marc Eber and Julian Seward shows a way forwards. It describes a Domain Specific Language embedded in Haskell for describing the rights and obligations imposed by a contract. Arbitrarily complicated contracts can be built up using a small collection of primitives. Aggregations of these contracts can also be created, as can risks of default and bankruptcy. This created quite a stir in the quantitative analysis world when it was presented, as it was the first time anyone had proposed a formal language for describing contracts. Today the list of commercial Haskell users includes a number of financial institutions using this kind of technique to model their market positions.
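To give a flavour of the approach, here is a much-simplified sketch of the idea; it is not the paper's actual combinators or valuation semantics. Contracts are values of an ordinary data type built from a few primitives, and a valuation is just a function over that type.

```haskell
data Date = Date Int deriving (Eq, Ord, Show)

-- A tiny contract algebra, loosely inspired by the paper's primitives.
data Contract
  = Zero                     -- no rights or obligations
  | Give Contract            -- swap the parties: rights become obligations
  | And Contract Contract    -- the rights and obligations of both contracts
  | ZCB Date Double String   -- zero-coupon bond: receive an amount at a date
  deriving Show

-- A toy valuation semantics: discount future payments at a flat rate r.
value :: Double -> Date -> Contract -> Double
value _ _ Zero        = 0
value r d (Give c)    = negate (value r d c)
value r d (And c1 c2) = value r d c1 + value r d c2
value r (Date n) (ZCB (Date m) amt _)
  | m < n     = 0                        -- already matured
  | otherwise = amt / (1 + r) ^ (m - n)
```

A regulator holding positions in a form like this could revalue every contract under a stressed scenario simply by running `value` with different parameters, which is precisely the kind of systemic modelling argued for below.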

But on its own this is just a faster and more efficient way of getting the same wrong answer. It doesn't solve the underlying problem of concealed systemic risk. The solution has to be for big financial companies to reveal their positions to the regulators as formal models of the contracts they have written. At the moment they don't even have to reveal all their contracts, but merely knowing the legal terms of a contract is only the first step. Those terms have to be converted into a mathematical model. That model probably already exists, but only as an internal artefact of the parties to the contract. What ought to be happening is that the contract is specified in a well-defined mathematical language that can be converted into a model automatically. If the regulators have this information about all the contracts entered into by all the finance companies then they can model the impact of, say, a downturn in the housing market or a jump in the price of oil, and if they see systemic risk looming then they can order the companies involved to take corrective action. Unlike the various Risk Management departments they will be able to see the whole picture, and they don't have to worry about being "team players".

Friday, May 9, 2008

Is Functional Programming the new Python?


Back in 2004 Paul Graham wrote an essay on the Python Paradox:

if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.


Some tentative support for this theory comes from a study of programming languages done in 2000. The same task was given to over 80 programmers. The chart shows how long they took. Obviously the average for some languages was a lot less than for others, but the interesting thing for the Python Paradox is the variability. Java had huge variability: one developer took over 60 hours to complete the task. Meanwhile the Python developers were the most consistent, with the lowest variance as a percentage of the mean. I suspect (but can't prove) that this was because of the kind of programmers who wrote in Java and Python back in 2000. Java was the language of the Web start-up and the dot-com millionaire, but Python was an obscure open source scripting language. The Pythonistas in this study didn't learn it to get a job, but many of the Java programmers did.

But if this study was repeated today I bet the spread for Python would be a lot larger. Maybe still not as big as Java, but more like C++ or Perl. Because today you can get a good job writing Python. A quick check of jobs on dice.com found 1450 Python jobs against 7732 C++ jobs and 15640 jobs for Java. Python hasn't taken over the world, but the jobs are there.

So the smart employers and developers need something new to distinguish themselves from the crowd, and it looks like functional programming might be it. Programming Reddit carries lots of cool stuff about Haskell, and job adverts are starting to list a grab-bag of functional languages in the "would also be an advantage" list. For instance:
- Programming experience with more esoteric and powerful languages for data manipulation (Ruby, Python, Haskell, Lisp, Erlang)
So it looks like the with-it job-seekers and recruiters may be starting to use functional programming to identify each other, just as they used Python up to 2004.

Update: Oops. I just remembered this post which started me thinking along these lines.

Thursday, January 10, 2008

Why Haskell is Good for Embedded Domain Specific Languages

Domain Specific Languages (DSLs) are attracting some attention these days. They have always been around, of course: Emacs Lisp is a DSL, as are the various dialects of Visual Basic embedded in MS Office applications. And of course Unix hands know YACC (now Bison) and Lex (now Flex).

However creating a full-blown language is a lot of work: you have to write a parser, code generator / interpreter and possibly a debugger, not to mention all the routine stuff that every language needs like variables, control structures and arithmetic types. An embedded DSL (eDSL) is basically a short cut if you can't afford to do that. Instead you write the domain-specific bits as a library in some more general purpose "host" language. The uncharitable might say that "eDSL" is just another name for "library module", and it's true there is no formal dividing line. But in a well designed eDSL anything you might say in domain terms can be directly translated into code, and a domain expert (i.e. a non-programmer) can read the code and understand what it means in domain terms. With a bit of practice they can even write some code in it.

This paper describes an eDSL for financial contracts built in Eiffel which worked exactly that way. It doesn't talk about a "domain specific language" because the term hadn't been invented back then, but the software engineers defined classes for different types of contracts that the financial analysts could plug together to create pricing models. It's interesting to compare it with this paper about doing the same thing in Haskell.

But eDSLs have problems. The resulting programs are often hard to debug because a bug in the application logic has to be debugged at the level of the host language; the debugger exposes all the private data structures, making it hard for application programmers to connect what they see on the screen with the program logic. The structure of the host language also shows through, requiring application programmers to avoid using the eDSL functions with certain constructs in the host language.

This is where Haskell comes in. Haskell has three closely related advantages over other languages:
  1. Monads. The biggest way that a host language messes up an eDSL is by imposing a flow of control model. For example, a top-down parser library is effectively an eDSL for parsing. Such a library can be written in just about any language. But if you want to implement backtracking then it's up to the application programmer to make sure that any side effects in the abandoned parse are undone, because most host languages do not have backtracking built in (and even Prolog doesn't undo "assert" or "retract" when it backtracks). But the Parsec library in Haskell limits side effects to a single user-defined state type, and can therefore guarantee to unwind all side effects. More generally, a monad defines a model for flow of control and the propagation of side effects from one step to the next. Because Haskell lets you define your own monad, this frees the eDSL developer from the model that all impure languages have built in. The ultimate expression of this power is the Continuation monad, which allows you to define any control structure you can imagine.
  2. Laziness. Haskell programmers can define infinite (or merely very large) data structures because at any given point in the execution only the fragment being processed will actually be held in memory. This also frees up the eDSL developer from having to worry about the space required by the evaluation model. (update: this isn't actually true. As several people have pointed out, while laziness can turn O(n) space into O(1), it can also turn O(1) into O(n). So the developers do have to worry about memory, but lazy evaluation does give them more options for dealing with it.)
  3. The type system allows very sophisticated constraints to be placed on the use of eDSL components and their relationships with other parts of the language. The Parsec library mentioned above is a simple example. All the library functions return something of type "Parser foo", so an action from any other monad (like an IO action that prints something out) is prohibited by the type system. Hence when the parser backtracks it only has to unwind its internal state, and not the rest of the universe.
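The backtracking point above can be illustrated with a minimal sketch. This is not Parsec itself, just a toy parser type showing why unwinding comes for free in a pure language: a parser is a function from input to an optional result plus remaining input, so when one alternative fails the next simply runs on the original, untouched input.

```haskell
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s1) <- pf s
    (a, s2) <- pa s1
    Just (f a, s2)

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> do
    (a, s1) <- p s
    runParser (f a) s1

-- Accept exactly the character c.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x : xs) | x == c -> Just (c, xs)
  _ -> Nothing

-- Try p; if it fails, run q on the ORIGINAL input: backtracking for
-- free, because no side effects ever happened.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \s -> maybe (q s) Just (p s)
```

For example, `runParser ((char 'a' >> char 'b') `orElse` char 'a') "ac"` succeeds with `Just ('a', "c")`, even though the first alternative consumed an 'a' before failing.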
There are other programming languages that are good for writing eDSLs, of course. Lisp and Scheme have callCC and macros, which together can cover a lot of the same ground. Paul Graham's famous "Beating the Averages" paper talks about using lots of macros, and together with his patent for continuation-based web serving it is pretty clear that what he and Robert Morris actually created was an eDSL for web applications, hosted in Lisp.

But I still think that Haskell has the edge. I'm aware of the Holy War between static and dynamic type systems, but if you put a Haskell eDSL in front of a domain expert then you only have to explain a compiler type mismatch message that points to the offending line. This is much easier to grasp than some strange behaviour at run time, especially if you have to explain how the evaluation model of your eDSL is mapped down to the host language. Non-programmers are not used to inferring dynamic behaviour from a static description, so anything that helps them out at compile time has to be a Good Thing. And it's pretty useful for experienced coders too.

(Update: I should point out that monads can be done in any language with lambdas and closures, and this is pretty cool. But only in Haskell are they really a native idiom)

Saturday, September 1, 2007

Composability and Productivity

This was posted to my original blog on January 7th 2007. It was also discussed on Reddit, so please do not repost it.

----------------------------------------------------

My earlier post about increased productivity through functional programming stirred up a good deal of comment. A number of people replied that the libraries are more important than the language. More recently a similar point has been made by Karsten Wagner, who argues that code reuse is what makes languages productive.

I remember the early days of OO, when it was argued by many people (me amongst them) that OO languages would finally let us write reusable software. There was optimistic talk of “software factories”, and Brad Cox gave lots of talks about how software could now finally move from craft to engineering discipline, built on the back of libraries of reusable code. So the entire industry went for C++, but the easy creation of reusable software remained elusive. That is not to say you can’t ever produce reusable software in the language, but it is not significantly easier than in C.

Karsten accurately pins the reason why C++ failed to deliver: memory management. Back in those old days I was trying to explain to people why Eiffel would be so much more productive than C++, and garbage collection was a big part of it.

The reason why GC is so important lies in a more general principle called composability. Composability means that you can put two bits of code together and important correctness properties will be preserved automatically. This does not mean, of course, that the composition is automatically correct in a wider sense, but it does mean that you don’t introduce new bugs merely by sticking two things together.

Imagine two modules of code in a non-GC language like C or C++. Module X creates an object and hands a reference to that object over to Module Y. At some point in the future X will delete that object. However it is only safe to do so once Y has finished with it. So if X and Y were written independently then it is quite possible that Y will hang on to the reference longer than X expects. In short, manual memory management is not composable, because the composition of X and Y can introduce stale pointer bugs that were not present in either X or Y.

In a language with GC this is a non-problem: the collector will reap the object once it sees that both X and Y have finished with it. X can therefore forget about the object in its own time without having to know anything about Y. But programmers in non-GC languages have to resort to a number of workarounds. Either objects have to be copied unnecessarily (which leads to stale data bugs instead of stale pointer bugs), or else some kind of reference counting or similar scheme must be employed. Reference counting is of course merely an ad hoc, informally-specified, bug-ridden, slow implementation of GC, and therefore stands as a classic example of Greenspun's Tenth Rule.

But programming languages contain other examples of non-composable constructs. The current biggest offender is shared memory concurrency using locking. If X takes locks Foo and Bar, and Y takes locks Bar and Foo (in those orders), then sooner or later they are going to deadlock with X holding Foo and Y holding Bar. Ironically Java has the biggest problems here.

Surprisingly, program state in general is a source of non-composability. Mutable state is actually another form of manual memory management: every time you overwrite a value you are making a decision that the old value is now garbage, regardless of what other part of the program might have been using it. So suppose that X stores some data in Y, and then Z also stores some other data in Y, overwriting what X did. If X assumes that its old data is still there then it is going to be in trouble. Either X needs to defensively check for new data, or Y needs to tell X about the change (the Observer pattern). Either way, the addition of Z to the system can introduce bugs that stem from the composition of modules rather than the modules themselves.

Another example is the Command pattern, which includes a method to “undo” a previous command. Why undo it? Because the global state has been changed. Get rid of the concept of a single unique state for the entire application and you get rid of the problem.

On a more prosaic level, many library modules have some internal state, and so require an initialisation call to “set things up”. Who exactly is responsible for this initialisation? If you compose two modules together then they may both imagine themselves solely responsible for making this initialisation call, and hence it will be done twice.

It is a truism of programming that “integration” is the most difficult and risky (from a scheduling point of view) part of developing a large system. Integration basically means composing together all the units to try to get a working system. And the biggest integration problems are due to the non-composability of the units. A defect within a unit is fairly easy to identify, but composition bugs are down to the interaction of two separate units in some unforeseen way, which makes them correspondingly more difficult to pin down.

So what would a language look like if it got rid of all of these non-composable constructs? The answer, basically, is Haskell. The underlying goal of functional language designers is to make everything completely composable. Haskell is the language that has advanced furthest towards this goal.
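A concrete illustration (a sketch of my own, not part of the original argument) is Haskell’s STM library: transactions compose where lock-based code does not. Two transfers written independently can be glued together with `>>` into a single atomic transaction, and there is no lock ordering to get wrong:

```haskell
import Control.Concurrent.STM

-- Move n units between two shared accounts, atomically.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
    modifyTVar' from (subtract n)
    modifyTVar' to (+ n)

main :: IO ()
main = do
    a <- newTVarIO 100
    b <- newTVarIO 100
    c <- newTVarIO 100
    -- Composing two independently written transfers yields one atomic
    -- transaction; no deadlock is possible regardless of ordering.
    atomically (transfer a b 30 >> transfer b c 20)
    mapM_ (\v -> readTVarIO v >>= print) [a, b, c]
-- prints 70, 110, 120
```

Compare this with the Foo/Bar deadlock above: with STM it is the runtime, not the programmer, that is responsible for making the composition safe.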

Erlang is pretty good as well. It doesn’t eliminate state, but it does keep it confined to individual processes: there are no shared memory locks. Processes have to communicate according to shared protocols, but this is a manageable dependency on a common abstraction. Even within a process the shared state can be reduced by careful use of functional programming rather than imperative.

Sunday, August 19, 2007

Anatomy of a new monad

The monad described here was originally written in response to a posting by Chad Scherrer on Haskell-Cafe:

I need to build a function
buildSample :: [A] -> State StdGen [(A,B,C)]

given lookup functions
f :: A -> [B]
g :: A -> [C]

The idea is to first draw randomly from the [A], then apply each
lookup function and draw randomly from the result of each.


I suggested Chad try using ListT for the non-determinism, but this didn't work, so I decided to try solving it myself. I discovered that actually ListT doesn't work here: you need a completely custom monad. So here it is. Hopefully this will help other people who find they need a custom monad. This isn't meant to be yet another ab-initio monad tutorial: try reading the Wikibook or All About Monads if you need one.

The basic idea of the MonteCarlo monad is that each action in the monad returns a list of possible results, just like the list monad. However the list monad then takes all of these possible results forwards into the next step, potentially leading to combinatorial explosion if you can't prune the tree somehow. It was this explosion that was giving Chad problems when he scaled up his original solution. So instead the MonteCarlo monad picks one of the results at random and goes forwards with that.
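The explosion is easy to demonstrate with the ordinary list monad (a quick sketch, assuming a 10-way choice at each step): composing n draws multiplies the alternatives, so the result count grows as 10^n.

```haskell
import Control.Monad (replicateM)

-- replicateM composes n draws in the list monad; every combination
-- of choices is kept, so the result count is 10^n.
main :: IO ()
main = mapM_ (print . length . flip replicateM [1 .. 10 :: Int]) [1 .. 4]
-- prints 10, 100, 1000, 10000
```

The MonteCarlo monad keeps only one of these paths at each step, so the work stays linear in the number of steps.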

Picking a random element means we need to thread a random number generator StdGen as a form of monadic state. So the monad type looks like this:
newtype MonteCarlo a = MonteCarlo {runMC :: StdGen -> (StdGen, [a])}
This is based on the state monad, except that the type parameter "s" is replaced by StdGen. I could have used the state monad here, but it would have meant working its "bind" and "return" into MonteCarlo, which would have been more trouble than it was worth.

Picking a random item from a list is going to be necessary, and as seen below it is actually needed more than once. So:

-- Internal function to pick a random element from a list
pickOne :: [a] -> StdGen -> (StdGen, a)
pickOne xs g1 = let (n, g2) = randomR (0, length xs - 1) g1 in (g2, xs !! n)
Monad instance
Now for the Monad instance declaration. In this we have to declare the bind function (>>=) and the return function.
instance Monad MonteCarlo where
    MonteCarlo m >>= f = MonteCarlo $ \g1 ->
        let
            (g2, xs) = m g1
            (g3, x)  = pickOne xs g2   -- lazy: only forced in the default case
            (g4, f') = case xs of
                []   -> (g2, mzero)
                [x1] -> (g2, f x1)
                _    -> (g3, f x)
        in runMC f' g4

    return x = MonteCarlo $ \g -> (g, [x])
The return function shows the minimal structure for a Monte-Carlo action: wrapped up in the MonteCarlo type is a function that takes a StdGen state and returns the state (in this case unmodified) along with the potential results. If the action had used the generator then the result pair would have had the new generator state instead of the old one.

The bind (>>=) is a bit more complicated. The type for monadic bind is:
(>>=) :: m a -> (a -> m b) -> m b
In the MonteCarlo instance of Monad this becomes:
(>>=) :: MonteCarlo a -> (a -> MonteCarlo b) -> MonteCarlo b

The job of bind is to take its two arguments, both of which are functions under the covers, and compose them into a new function. Then it wraps that function up in the MonteCarlo type.

We unwrap the first argument using pattern matching to get the function inside "m". The second parameter "f" is a function that takes the result of the first and turns it into a new action.

The result of the bind operation has to be wrapped up in MonteCarlo. This is done by the line
MonteCarlo $ \g1 -> 
The "g1" lambda parameter is the input generator state for this action when it is run, and the rest of the definition is the way we compose up the functions. We need to call the first function "m" to get a result and a new generator state "g2", and then feed the result into the second argument "f".

The "let" expression threads the random generator manually through the first two random things we need to do:
  1. Get the possible results of the first argument as a list.
  2. Pick one of these at random.
However there is a twist because the result set could be empty. This is interpreted as failure, and the "MonadPlus" instance below will explain more about failing. In addition the result set could also be a single item, in which case there is no need to generate a random number to pick it. So the f-prime value is the result of applying "f" to whatever item was picked, or the failing action "mzero" if the "m" action returned an empty list. We also avoid getting a new generator state if the random pick was not needed.

Finally we use "runMC" to strip the MonteCarlo type wrapper off the second result, because the whole thing is being wrapped by the MonteCarlo type constructor at the top level of bind.

And there you have it. In summary, when defining an instance of "bind" you have to construct a new function out of the two arguments by extracting the result of the first argument and passing it to the second. This then has to be wrapped up as whatever kind of function is sitting inside your monadic action type. The arguments to this hidden function are the monadic "state" and therefore have to be threaded through bind in whatever way suits the semantics of your monad.

MonadPlus instance
MonadPlus is used for monads that can fail or be added together in some way. If your underlying type is a Monoid then it's a good bet that you can come up with a MonadPlus instance from the "mempty" and "mappend" functions of the Monoid. In this case the underlying type is a list, so the "mempty" is an empty list and the "mappend" is concatenation.
instance MonadPlus MonteCarlo where
    mzero = MonteCarlo $ \g -> (g, [])

    mplus (MonteCarlo m1) (MonteCarlo m2) = MonteCarlo $ \g ->
        let
            (g1, xs1) = m1 g
            (g2, xs2) = m2 g1
        in (g2, xs1 ++ xs2)

Here the "mzero" returns an empty list. The bind operation interprets an empty list as failure, and so indicates this to its caller by returning an empty list. Thus a single failure aborts the entire computation.

"mplus" meanwhile threads the random number state "g" through the two arguments, and then returns the results concatenated. The bind operation will then pick one of these results, so "mplus" has become a kind of "or" operation. However note that the alternation is done over the results, not the original arguments. If you say "a `mplus` b" and "a" returns ten times as many results as "b" then the odds are 10:1 that you will be getting one of the results of "a".

One important combinator introduces a list of items to pick one from.
-- | Convert a list of items into a Monte-Carlo action.
returnList :: [a] -> MonteCarlo a
returnList xs = MonteCarlo $ \g -> (g, xs)
Finally we need to be able to run a MonteCarlo computation. You could do this using runMC, but that gives you the full list of results for whatever the last action was, which is not what you want. Instead you want to pick one of them at random. So there is a runMonteCarlo function that looks an awful lot like the bind operation, and for similar reasons. This returns a Maybe type with "Just" for success and "Nothing" for failure.
-- | Run a Monte-Carlo simulation to generate zero or one results.
runMonteCarlo :: MonteCarlo a -> StdGen -> Maybe a
runMonteCarlo (MonteCarlo m) g1 =
    let
        (g2, xs) = m g1
        (_, x)   = pickOne xs g2   -- lazy: only forced in the default case
    in case xs of
        []   -> Nothing
        [x1] -> Just x1
        _    -> Just x
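Pulling the pieces together, the monad can be applied to Chad's original problem. This is a sketch rather than code from the post: modern GHC also requires Functor and Applicative instances (with "return" defaulting to "pure"), mzero is inlined to avoid the MonadPlus boilerplate, and the lookup functions f and g are hypothetical stand-ins for Chad's.

```haskell
import System.Random (StdGen, mkStdGen, randomR)

newtype MonteCarlo a = MonteCarlo { runMC :: StdGen -> (StdGen, [a]) }

-- Internal function to pick a random element from a list.
pickOne :: [a] -> StdGen -> (StdGen, a)
pickOne xs g1 = let (n, g2) = randomR (0, length xs - 1) g1 in (g2, xs !! n)

instance Functor MonteCarlo where
    fmap fn (MonteCarlo m) = MonteCarlo $ \g ->
        let (g', xs) = m g in (g', map fn xs)

instance Applicative MonteCarlo where
    pure x = MonteCarlo $ \g -> (g, [x])
    mf <*> mx = mf >>= \fn -> fmap fn mx

instance Monad MonteCarlo where
    MonteCarlo m >>= fn = MonteCarlo $ \g1 ->
        let (g2, xs) = m g1
            (g3, x)  = pickOne xs g2   -- lazy: only forced in the default case
            (g4, f') = case xs of
                []   -> (g2, MonteCarlo $ \g -> (g, []))  -- inlined mzero
                [x1] -> (g2, fn x1)
                _    -> (g3, fn x)
        in runMC f' g4

-- | Convert a list of items into a Monte-Carlo action.
returnList :: [a] -> MonteCarlo a
returnList xs = MonteCarlo $ \g -> (g, xs)

-- | Run a Monte-Carlo simulation to generate zero or one results.
runMonteCarlo :: MonteCarlo a -> StdGen -> Maybe a
runMonteCarlo (MonteCarlo m) g1 =
    let (g2, xs) = m g1
        (_, x)   = pickOne xs g2
    in case xs of
        []   -> Nothing
        [x1] -> Just x1
        _    -> Just x

-- Hypothetical lookup functions standing in for Chad's f and g.
f :: Int -> [String]
f a = [show a ++ suffix | suffix <- ["x", "y"]]

g :: Int -> [Bool]
g a = [even a, odd a]

-- Draw one random A, then one random B and one random C from its lookups.
sampleOne :: MonteCarlo (Int, String, Bool)
sampleOne = do
    a <- returnList [1, 2, 3]
    b <- returnList (f a)
    c <- returnList (g a)
    return (a, b, c)

main :: IO ()
main = print (runMonteCarlo sampleOne (mkStdGen 42))
```

Seeding with getStdGen instead of mkStdGen would give a fresh sample on each run; the Maybe result distinguishes a failed run (some lookup returned an empty list) from a successful draw.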
Exercises for the reader:
  1. Add a new combinator allowing access to the random number generator.
  2. Each computation returns only a single result, but Monte Carlo methods depend on statistical analysis of many results. How should this be done?
  3. Can we introduce a "try" combinator to limit the scope of failure?

Sunday, August 12, 2007

Responses to 'Silver Bullets Incoming!'

(((
These are the responses originally posted to "Silver Bullets Incoming". Many of these responses are worth reading in their own right, and I'd like to thank the respondents for taking the time for such thoughtful posts.

Please do not post this to reddit, as it has already been discussed there under the original URL.
)))
  1. Stephen Says:

    Paul:

    I enjoyed your first article quite a bit - it got me thinking about technical language issues again (always fun).

    I’d like to comment on your update to the original article. Specifically, I have some comments regarding C++

    C++ is not an “old” language, incorporating many language features of more “modern” languages, including exceptions, automatic memory management (via garbage collection libraries and RAII techniques), and templates, a language feature that is only available in C++, and that provides support for generic programming and template metaprogramming, two relatively new programming paradigms. Yes, C++ has been around a while, but until I see programmers constantly exhausting the design and implementation possibilities of C++, I won’t call the language “old.”

    C++ was not designed to support just OO programming: From “Why C++ Isn’t Just An Object-Oriented Programming Language” (http://www.research.att.com/~bs/oopsla.pdf):

    “If I must apply a descriptive label, I use the phrase ‘multiparadigm language’ to describe C++.”

    Stroustrup identifies functional, object-oriented, and generic programming as the three paradigms supported by C++, and I would also include metaprogramming (via C++ templates or Lisp macros) as another paradigm, though it is not often used by most developers.

    Of course, we should also keep in mind Stroustrup’s statement regarding language comparisons (”The Design and Evolution of C++”, Bjarne Stroustrup, 1994, p.5): “Language comparisons are rarely meaningful and even less often fair.”

    Take care, and have a good weekend!

    Stephen

  2. pongba Says:

    I found it so weird that, on the one hand you argue that haskell is fast (to the extent that it might be even faster than some compiled language such as C++), while on the other hand you said “where correctness matters more than execution speed its fine today”.
    Does that sound paradoxical?

  3. Another Paul Says:

    Paul:

    “I think that Dijkstra had it right: a line of code is a cost, not an asset. It costs money to write, and then it costs money to maintain. The more lines you have, the more overhead you have when you come to maintain or extend the application”

    By that measure, there’s no such thing as an asset. Think about that a moment - someone buys a general ledger or CAD/CAM system and modifies it as companies do. Either system reduces staff, provides more accurate information much more quickly, and renders the company more competitive. Take it away and what happens?

    It’s been my experience that while these systems require maintenance (and sometimes a lot) they usually result in a net reduction in staff and the cost of doing business. And some types of systems provide a clear competitive edge as well. I think that makes many systems just as much an asset as a house, factory building, or a lathe.

    Interesting article. Thanks.

    Another Paul

  4. BobP Says:

    >> An order of magnitude is a factor of 10, no less

    > Well, the Wikipedia entry does say about 10. All this stuff is so approximate that anything consistently in excess of 5 is close enough.

    0.5 orders of magnitude = power(10.0,0.5) = sqrt(10.0) = 3.1623 (approx)
    1.5 orders of magnitude = power(10.0,1.5) = sqrt(1000.0) = 31.623 (approx)

    If we are rounding off, a factor of 4 is about one order of magnitude; also, a factor of 30 is about one order of magnitude.

  5. Jeremy Bowers Says:

    You missed my point with Python, or at least failed to address it.

    My point wasn’t that Python is also good. My point was that you leapt from “10x improvement” to “it must be the chewy functional goodness!” But your logic falls down in the face of the fact that Python, Perl, Ruby, and a number of other non-functional languages also have a 10x improvement over C++, therefore it clearly is not a sound argument to just leap to the conclusion that “it must be the chewy functional goodness!” when there are clearly other factors in play.

    I’m not criticizing Haskell or functional programming, I’m criticizing your logic, and you’ve still got nothing to back it up.

    (This is par for the course for a claim of a silver bullet, though.)

  6. Sam Carter Says:

    “Libraries and languages are complicit: they affect each other in important ways. In the long run the language that makes libraries easier to write will accumulate more of them, and hence become more powerful.”

    This argument has a large flaw in it, namely the current state of libraries doesn’t reflect this claim. The largest and most powerful collection of libraries seem to be .NET, CPAN, and the Java libs, certainly not Lisp libraries.

    But the advocates of Lisp would argue that it’s the most powerful language, and it’s clearly been around for a long time, yet the Lisp community has not accumulated the most powerful collection of libraries. So unless the next 40 years are going to be different from the previous 40 years, you can’t really assert that language power is going to automatically lead to a rich set of libraries.

    I stand by my original comment to the previous article that programming is more about APIs and libraries than about writing their own code, and that if you are focused on measuring code-writing performance, you are just measuring the wrong thing.

    I also disagree with the claim that this is unmeasurable because doing a real-world test is too expensive. As long as the project is solvable in a few programmer-weeks, you can test it out with different languages. I took a computer science class (Comp314 at Rice) where we were expected to write a web browser in 2 weeks. It wouldn’t be that hard to have a programming test which incorporated a database, a web or GUI front end, and some kind of client/server architecture, e.g. implementing a small version of Nagios, or an IM client, or some other toy application.

    I’m sorry but writing a command line application that parses a CSV file and does some fancy algorithm to simulate monkeys writing Shakespeare is about as relevant to modern software engineering as voodoo is to modern medicine.

  7. Paul Johnson Says:

    pongba:

    I’m arguing that Haskell programs are faster to *write*. Execution speed is a much more complicated issue. FP tends to lose in simple benchmarks, but big systems seem to do better in higher level languages because the higher abstraction allows more optimisation.

  8. Paul Johnson Says:

    Another Paul:

    The functionality that a package provides is an asset, but the production and maintenance of each line in that package is a cost. If you can provide the same asset with fewer lines of code then you have reduced your liabilities.

    Paul.

  9. Paul Johnson Says:

    Jeremy Bowers:

    Teasing apart what it is about Haskell and Erlang that gives them such a low line count is tricky, because it is more than the sum of its parts. One part of it is the high level data manipulation and garbage collection that Python shares with functional languages. Another part of it is the chewy functional goodness. Another part, for Haskell at least, is the purity. OTOH for Erlang it is the clean and simple semantics for concurrency.

    What I see in the results from the Prechelt paper is that Python was, on average, about 3 times better than C++ while the average Haskell program (from a sample of 2) was about 4 times better. Actually the longer Haskell program was mine, and I was really embarrassed when someone else showed me how much simpler it could have been.

    In terms of pure line count I have to concede that Python and Haskell don’t have a lot to choose between them. A 25% improvement isn’t that much. It’s a pity we can’t do a controlled test on a larger problem: I think that Haskell’s type system and monads are major contributors to code that is both reliable and short. Unfortunately I can’t prove it, any more than I could prove that garbage collection was a win back in the days when I was advocating Eiffel over C++.

    Paul.

  10. Paul Prescod Says:

    If you cannot “tease apart” what it is about Haskell and Erlang that makes them so productive then you cannot say that any one improvement is a silver bullet. It just feels truthy to you. Furthermore, if you are presented with counter-examples in the form of Python and Ruby then surely you must discard your thesis entirely. The best you can say is that there exist non-functional languages that are 10 times less productive than some functional languages for some projects.

    Duh.

  11. Paul Johnson Says:

    Sam Carter:

    On languages with expressive power gathering libraries; point mostly conceded, although Perl certainly is a very expressive language, so I don’t think it supports your point, and .NET has Microsoft paying its Mongolian hordes, so it’s not a fair comparison.

    There are two sorts of libraries: general purpose ones (e.g. data structures, string manipulation, file management) that get used in many applications, and vertical libraries (HTTP protocol, HTML parsing, SMTP protocol) that are only useful in specific applications. There is no hard dividing line of course, but the usefulness of a language for general purpose programming depends on the language and its general purpose libraries. The vertical libraries have a big impact for applications that use them, but not elsewhere. So I would generally judge a language along with the general purpose libraries that are shipped with it. The special purpose libraries are useful as well, but it’s a secondary consideration.

    Paul.

  12. Paul Johnson Says:

    Sam Carter (again):

    Sorry, just looked back at your post and realised I’d forgotten the second point.

    A worthwhile test is going to take about 10 versions to average out the impact of different developers. So that’s 2 weeks times 10 coders, i.e. 20 developer-weeks, or almost half a man-year. Say a coder is on $30K per year and total cost of employment is three times that (which is typical). Round numbers, $40-50K per language. Ten languages will cost the best part of half a million dollars to evaluate. Not small beer.

    Of course you could use students, but on average they will know Java or Javascript better than Python or Haskell, so how do you correct for that?

    Paul.

  13. pongba Says:

    I’m arguing that Haskell programs are faster to *write*. Execution speed is a much more complicated issue. FP tends to lose in simple benchmarks, but big systems seem to do better in higher level languages because the higher abstraction allows more optimisation.

    I always hear people saying that, but I really don’t get it.
    I know that *theoretically* abstraction( or non-side-effect, etc) gives more opportunity for optimization, but I have never seen people show some real data that can *really* prove it.
    One question constantly annoys me - If higher-level of abstraction allows more optimization, then why .NET put the burden of discriminating value-types and reference-types on us programmers. Shouldn’t the referential-transparency-ness be better at this?

  14. Jonathon Duerig Says:

    I have two specific (and one general) criticisms to make about your line of argumentation:

    First, I think you do not adequately address the criticisms about lines of code as a metric. The cost of a line of code is the sum of five factors: (a) Difficulty of formulating the operation involved (original coder*1), (b) Difficulty of translating that operation into the target programming language (original coder*1), (c) Difficulty of parsing the code involved to understand what the line does (maintainer*n), (d) Difficulty of later understanding the purpose of that operation (maintainer*n), and (e) Difficulty of modifying that line while keeping it consistent with the rest of the program (maintainer*n).

    (a) and (b) are done only once, but (c), (d), and (e) are done many times whenever the program needs to be fixed or modified. Brooks’ argument was specifically that in the general case the time for (a) is more than 1/9 the time for (b), and the time for (d) is more than 1/9 the time for (c) and (e). This is important because (a) and (d) are both language and tool independent.

    When comparing the lines of code from different languages, it is important to realize that the formulation of the operations and the understanding of purpose are spread across those lines. And the verbosity of the language usually doesn’t impede either of these problems (unless it is extreme).

    For instance, take the creation of an iterator or enumeration in C++ or Java respectively and compare that to creating a fold function in Scheme. These are roughly equivalent tasks. In C++, an iterator is defined first by defining a class with various access operators like * and -> and ++ and -- and then implementing them. This adds a lot of baggage because there are half a dozen or so functions that must be defined and there is a separate class specification. In contrast, a scheme fold function is much simpler from the language perspective. A single function is defined rather than half a dozen. It will almost certainly have fewer lines, possibly by 4 or 5 times.

    But let us look at what the creation of the iterator or fold function means from the perspective of items (a) and (d). Both of these are common idioms in their respective languages, so all of the code specifically dealing with iteration/folding is trivial to conceptualize and trivial to understand the purpose of. The difficulty in writing either a custom iterator or a custom fold function lies within the subtleties of the iteration. If it is a tree, what information needs to be maintained and copied to successive iterations (whether that be in the form of state, or in the form of argument passing)? Are there multiple kinds of iterations? How would they be supported? (For example, sometimes a user wants to traverse a tree in pre-order, sometimes in post-order, sometimes in-order, and sometimes level by level in a breadth-first order.) These are the questions which the original coder and the later maintainers will have to contend with. And these are really orthogonal to lines of code counts.

    But there is another factor at work here which makes lines of code a faulty cross-language measurement. Every language has a grain to it. If you program with the grain, then any difficulty will be easily solved by the tools in the language. If you program against the grain, then you will run into difficulty after difficulty. This applies to fundamental language properties. You can bypass the type system in C++ and avoid all the type checks, but it is cumbersome and unpredictable if you do it wrong. Ruby allows you to be much more flexible with types and provides a safety net. If you try to enforce a more strict typing in Ruby, then you will have to fight the language every step.

    But the grain of the language also includes the question of scale. Some languages provide a lot of flexibility. They allow compact and loose representations of programs which can be customized to the problem domain easily. These languages include Scheme and Ruby and Haskell. These flexible languages are very useful for small projects with one or a few developers because they can be metaphorically molded to fit the hand of the person who wields them. But there is a trade-off because they tend to be more difficult to use in large groups because it is harder for others to understand what is going on. This is a fundamental trade-off that programming languages must make. And it means that a language which is great at one end of the spectrum will likely be lousy at the other end. And this is reflected in the lines of code required for a particular scale of problem.

    My second criticism is in regard to your discussion of state. You point out that Brooks considered managing of state to be a major essential difficulty of programming and you then claim that functional languages obviate this difficulty and hypothesize this as the reason that they can be a silver bullet.

    I believe that you have misunderstood the kind of state that Brooks was referring to. He was not talking about run-time state but compile-time state. He was not talking about what variables are changed at run-time. He was talking about the interactions between components of the program. These interactions are still there and just as complex in functional languages as in imperative languages.

    Second, even when considering just the run-time state, the referential transparency of functional languages simplifies only the theoretical analysis of a program. As far as a normal programmer who is informally reasoning about what a program does, the programmer must consider how state is transformed in the same way whether or not a modified copy is made or a destructive write is made. This is the same kind of reasoning.

    Finally, I have seen many people talk about getting an order of magnitude improvement by finding some incredible programming tool. Functional programming is not unique in that respect. But in my experience this is more likely to be about finding a methodology that suits the person’s mindset than about finding the one true language or system. Somebody who thinks about programming in terms of a conceptual universe that changes over time will be an order of magnitude less effective in a functional environment. And somebody who thinks about programming in terms of a conceptual description of the result which is broken up into first class functions will be an order of magnitude less effective in an imperative environment.

    I have programmed in both imperative and functional languages. I know and understand the functional idioms and have used them. My mindset tends to the imperative. I am a less effective programmer in such languages. But I have seen programmers who can pull a metaphorical rabbit out of a hat while tapdancing in them. This says to me that evangelism about functional languages or imperative languages is fundamentally misguided regardless of the side.

  15. Paul Johnson Says:

    Jonathon Duerig:

    I had decided not to respond to any further comments and instead get on with my next article. But yours is long and carefully argued, so it merits a response regardless. It’s also nice to be arguing the point with someone who knows what a fold is.

    You make the point that during maintenance the difficulty of later understanding the purpose of an operation is language independent. I’m not so sure. A maintainer may suspect that a C++ iterator is truly orthogonal, but it can’t be taken for granted. There may be a bug hiding in those methods, or perhaps someone fixed a bug or worked around a problem by tweaking the semantics in an irregular way. Also a lot of the understanding of a piece of code comes from context, and it helps a lot to be able to see all the context at once (why else would 3 big monitors be a selling point for coding jobs?). So terse code makes it a lot easier to deduce context because you can see more at once.

    (Aside: I remember in my final year project at Uni going into the lab at night because then I could get two VT100s to myself).

    You say that Scheme, Ruby and Haskell can be moulded to fit the hand of the user, making them more productive for single person tasks, but less productive for groups because of mutual comprehension difficulties.

    This is hard to test because of the lack of statistics, but Haskell is strongly typed and the community has already developed conventions and tools for documentation and testing (Haddock and QuickCheck). I can see that Scheme macros can be used to construct an idiosyncratic personal language, but I really don’t see how this could happen in Haskell. Things that get done with macros in Scheme are usually done with monads in Haskell, but whereas Scheme macros are procedural, monads are declarative and must conform to mathematical laws, making them tractable. My experience with Haskell monads is that you generally build a monadic sub-language in a single module and provide libraries for it in some other modules (e.g. Parsec), and that the end result is intuitive and simple to use. But maybe I’ve only been exposed to well-designed monads.

    On the subject of state and informal reasoning: personally I use whatever reasoning forms that will work. In debugging a particularly complex monad I once resorted to writing out the algebraic substitutions long-hand in order to understand how the bind and return operators were interacting. It worked, and I got the monad to do what I wanted. I routinely use informal algebraic reasoning of this sort in simpler cases in order to understand what my program is doing. Any informal reasoning must be a hasty short-hand version of what a full formal proof would do, and it follows that language features that make full formal proof easier will make the informal short-hand mental reasoning easier too.

    Pure functions are particularly valuable when trying to understand a large program because you don’t have to worry about the context and history of the system for each call; you just look at what the function does to its arguments. In a real sense this is as big a step forwards as garbage collection, and for the same reason: any time you overwrite a value you are effectively declaring the old value to be garbage. Functional programs (at least notionally) never require you to make this decision, leaving it up to the GC and compiler to figure it out for you based on the global system context. Thus complex design patterns like Memento and Command are rendered trivial or completely obsolete.

    Finally you talk about the many over-hyped technologies in this industry. Yes, hype is a common problem. Those of you who think you have a silver bullet are very annoying for those of us who actually do. :-)

    Paul.

  16. Paul Johnson Says:

    Jonathon Duerig:

    I had decided not to respond to any further comments and instead get on with my next article. But yours is long and carefully argued, so it merits a response regardless. It’s also nice to be arguing the point with someone who knows what a fold is.

    You make the point that during maintenance the difficulty of later understanding the purpose of an operation is language independent. I’m not so sure. A maintainer may suspect that a C++ iterator is truly orthogonal, but it can’t be taken for granted. There may be a bug hiding in those methods, or perhaps someone fixed a bug or worked around a problem by tweaking the semantics in an irregular way. Also a lot of the understanding of a piece of code comes from context, and it helps a lot to be able to see all the context at once (why else would 3 big monitors be a selling point for coding jobs?). So terse code makes it a lot easier to deduce context because you can see more at once.

    (Aside: I remember in my final year project at Uni going into the lab at night because then I could get two VT100s to myself).

    You say that Scheme, Ruby and Haskell can be moulded to fit the hand of the user, making them more productive for single person tasks, but less productive for groups because of mutual comprehension difficulties.

    This is hard to test because of the lack of statistics, but Haskell is strongly typed and the community has already developed conventions and tools for documentation and testing (Haddock and QuickCheck). I can see that Scheme macros can be used to construct an idiosyncratic personal language, but I really don’t see how this could happen in Haskell. Things that get done with macros in Scheme are usually done with monads in Haskell, but whereas Scheme macros are procedural, monads are declarative and must conform to mathematical laws, making them tractable. My experience with Haskell monads is that you generally build a monadic sub-language in a single module and provide libraries for it in some other modules (e.g. Parsec), and that the end result is intuitive and simple to use. But maybe I’ve only been exposed to well-designed monads.

    On the subject of state and informal reasoning: personally I use whatever reasoning forms that will work. In debugging a particularly complex monad I once resorted to writing out the algebraic substitutions long-hand in order to understand how the bind and return operators were interacting. It worked, and I got the monad to do what I wanted. I routinely use informal algebraic reasoning of this sort in simpler cases in order to understand what my program is doing. Any informal reasoning must be a hasty short-hand version of what a full formal proof would do, and it follows that language features that make full formal proof easier will make the informal short-hand mental reasoning easier too.
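
    The long-hand substitution described above can be sketched with the simplest case, the Maybe monad. This is only an illustration, using nothing beyond the Prelude; the comments carry out the kind of informal algebraic reasoning I mean:

```haskell
-- Defining equations of the Maybe monad:
--   return x       =  Just x
--   Just x  >>= f  =  f x
--   Nothing >>= f  =  Nothing
--
-- The long-hand substitution, written out as one would on paper:
--   Just 3 >>= \x -> return (x + 1)
-- = (\x -> return (x + 1)) 3        -- Just equation for >>=
-- = return (3 + 1)                  -- beta reduction
-- = Just 4                          -- equation for return

example :: Maybe Int
example = Just 3 >>= \x -> return (x + 1)  -- evaluates to Just 4
```

    The same mechanical substitution scales (tediously) to more complex monads, which is exactly why it works as a debugging tool of last resort.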

    Pure functions are particularly valuable when trying to understand a large program because you don’t have to worry about the context and history of the system for each call; you just look at what the function does to its arguments. In a real sense this is as big a step forwards as garbage collection, and for the same reason: any time you overwrite a value you are effectively declaring the old value to be garbage. Functional programs (at least notionally) never require you to make this decision, leaving it up to the GC and compiler to figure it out for you based on the global system context. Thus complex design patterns like Memento and Command are rendered trivial or completely obsolete.
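
    A small illustration of the point that the old value only becomes garbage when nothing still refers to it. This sketch uses Data.Map from the standard containers package; a pure “update” yields a new version while the previous one stays valid, which is why Memento-style undo falls out for free:

```haskell
import qualified Data.Map as Map

-- Map.insert builds a new map; the previous version is untouched, and
-- unchanged branches are shared between the two.  Keeping a reference
-- to 'before' is all an undo feature needs; if nobody keeps it, the GC
-- reclaims it.
before, after :: Map.Map String Int
before = Map.fromList [("x", 1), ("y", 2)]
after  = Map.insert "x" 99 before
```

    Looking up "x" gives Just 1 in before and Just 99 in after; no Memento machinery was required.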

    Finally you talk about the many over-hyped technologies in this industry. Yes, hype is a common problem. Those of you who think you have a silver bullet are very annoying for those of us who actually do. :-)

    Paul.

  17. Toby Says:

    Since I happened to stumble upon an actual Dijkstra cite just now, I thought I’d add it here (having read and appreciated your original post a few days ago).

    In EWD513, “Trip Report E.W. Dijkstra, Newcastle, 8-12 September 1975,” he writes,

    “The willingness to accept what is known to be wrong as if it were right was displayed very explicitly by NN4, who, as said, seems to have made up his mind many years ago. Like so many others, he expressed programmer productivity in terms of ‘number of lines of code produced’. During the discussion I pointed out that a programmer should produce solutions, and that, therefore, we should not talk about the number of lines of code produced, but the number of lines used, and that this number ought to be booked on the other side of the ledger. His answer was ‘Well, I know that it is inadequate, but it is the only thing we can measure.’. As if this undeniable fact also determines the side of the ledger….”

    That is the edited version as printed in “Selected Writings on Computing: A Personal Perspective”. The original text can be found in the EWD Archive, at http://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/EWD513.html

Original responses to '"No Silver Bullet" and Functional Programming'

(((
These are the comments originally appended to the article. Many of them are thoughtful and worth reading in their own right.

As with the article, please do not submit this to reddit.
)))

  1. Chris Morris Says:

    “experts in several different languages were given a simplified spec taken from a real problem and asked to implement it.”

    So, was the use of Haskell an integral part of simplifying the original spec given to the experts in the first place? To me, the essential difficulties of programming are in figuring out what the hell to really build in the first place. None of the programming languages in that study helped them do that. They skipped the bulk of the essential difficulties and went right into studying the effects on the accidental portion.

  2. Craig Says:

    I really enjoyed that, looking forward to your next blog.

  3. Functional Lover Says:

    You know why? Because all the colleges have been brainwashed by Java. Damned be they.

  4. Paul Johnson Says:

    There has been some discussion of this at http://discuss.joelonsoftware.com/default.asp?joel.3.424533.12

    Paul.

  5. Jonathan Allen Says:

    From the study referenced:

    > 1. The NSWC experiment was conducted in a very short time-frame with very little direct funding. Thus many corners had to be cut, including significant simplification of the problem itself; the largest of the prototypes was only 1200 lines of code.

    > 2. The geo-server specification was by nature ambiguous and imprecise, thus leading to some variation in the functionalities of the developed prototypes. (On the other hand, the specification is probably typical of that found in practice, especially during requirements acquisition.)

    > 3. The participants were trusted entirely to report their own development metrics. In addition, not all of the participants attended the first meeting at NSWC; those who did attend were advantaged.

    > 4. No guidelines were given as to what exactly should be reported. For example, development times usually included documentation time, but there were major variations in the code-size/documentation-size ratios. In addition, different methods were used to count lines of code for each language, and it is not clear how one line of code for one language relates to that of another.

    > 5. The review panel had very little time to conduct a really thorough review. For example, none of the code was actually run by the panel; they relied on the written reports and oral presentations.

    The problem was grossly simplified, the code was rushed, the line count and time spent numbers questionable, and the final application never run.

    This is your proof?

    We don’t even know whether those 85 lines of Haskell actually work with good inputs, let alone whether they correctly responded to bad data.

  6. Mike Griffiths Says:

    I think you’re missing the point with regard to the whole process of software development. Even orders-of-magnitude improvements in the process of writing code have no impact upon the problem of writing the right code - defining right as “fit for the required purpose” - once the problem domain exceeds a remarkably small limit.

  7. v Says:

    While the arguments you present in the bulk of your post are convincing, the hard data presented hardly seems so. How does low LOC = high productivity except in the weird world where CMM makes sense?

    What gets put down as code is but the distillate of all the knowledge stored in a programmer’s head, and functional languages do demand a higher overhead of that storage space than imperative ones (more for historical reasons, I admit, but there it is).

    Moreover, code building tools reduce the effort to actually input those high LOCs with automation that is improving by the day.

    Can you present a better quantitative metric than LOC counts for functional languages being an order of magnitude better?

  8. thomas lackner Says:

    You should link to information about the STM implementation in Haskell.. sounds very interesting.

    I refuse to comment on the constant debate about programmer efficiency, as I’m sure the same program would be 72 characters in K/Q!

  9. hxa7241 Says:

    I have recently translated a minimal global illumination renderer from C++ to OCaml. It is about half the size — similar to the Ruby translation. The difference is in having class interfaces defined separately, and other small things.

    The implication that functional is so much more compact and simple than imperative is wrong. Look at the granularity of abstractions: they are practically the same. If you still code with the same-sized building blocks it will demand similar effort.

    Also, well-made imperative code has many restrictions on state interaction. It is not so far away from functional code.

    The only foreseeable way to greatly improved productivity is with reuse: more, better libraries/frameworks and ways of using them. (As Brooks says.)

  10. Achilleas Margaritis Says:

    As always, benchmarks are biased.

    Getopt is a parser.

    You can write a command line parser in a few lines of code in C++ using a library like boost::Spirit. For example (code not tested, just an illustration):

    rule letter = range('a', 'z') | range('A', 'Z');
    rule digit = range('0', '9');
    rule alphanumeric = digit | letter;
    rule id = *letter << *alphanumeric;
    rule num = +digit;
    rule cmdLine = *('-' << (id | num));

    There are many tasks in which functional programs are shorter and more concise, but these advantages are not due to the functional nature of a program, but due to design choices (better syntax, no semicolons, lambda functions etc). These advantages can happily exist in imperative languages too. The only difference is referential transparency, but, for me, it gets in the way.
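
    (For comparison, roughly the same grammar in Haskell’s Parsec combinator library - a sketch only, assuming the parsec package is available, and not checked for exact equivalence with the Spirit rules above:)

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Roughly the grammar above, as Parsec combinators.
identifier, number :: Parser String
identifier = (:) <$> letter <*> many alphaNum
number     = many1 digit

cmdLine :: Parser [String]
cmdLine = many (char '-' *> (identifier <|> number))
```

    For example, parse cmdLine "" "-foo-42" yields Right ["foo","42"]. The conciseness is indeed comparable; the differences lie elsewhere.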

    Can we please see a MVC application with Haskell where the model is cleanly separated from the view?

  11. Achilleas Margaritis Says:

    As always, results are biased.

    The command line parser could be written in C++ with a Spirit-like framework like this:

    rule id = letter << *alphanumeric;
    rule num = +digit;
    rule cmdline = *('-' << (id | num));

    The advantages of FP come from concepts that can also be used in imperative languages: lambda functions and closures (Smalltalk/Ruby), simplified syntax (Ruby), etc. On the other hand, in pure FPs many things are unnecessarily complex.

  12. anonymous Says:

    Your argument about the lack of state in functional programs is fundamentally flawed. FPs appear to have no state simply because they describe what would happen (sort of). However, in order for an FP to, well, run, someone, somewhere, must pull a lever so that the wheels start turning according to the spec of the FP, and what ticks under the FP ultimately does have state. I am not suggesting that there are no gains from functional programming, but the notion you offer is bogus as it stands.

  13. Edward Ocampo-Gooding Says:

    A popular excuse for why programming folks all don’t immediately jump into functional programming is because it’s not as intuitive a paradigm as the procedural style.

    I’d like to see a psychology study that measures programming proficiency vs. time trained with “fresh” students unaware of either paradigm and have two separate groups be trained.

  14. You wish Says:

    And yet Lisp, which is considered a functional language, has been around since 1962 (it was spec’ed in 1958), and Brooks doesn’t point to it in his paper. Functional programming isn’t new, and it is certainly not a silver bullet.

  15. Sam Carter Says:

    The reason why the whole world isn’t using functional languages is very simple: shitty library support. The real silver bullet for most programmers is access to large, high quality libraries, with the primary examples being the .NET runtime, or Java’s libraries. Programming tests that require writing 100% of the code from scratch are just flat out inaccurate, because you are measuring the wrong thing. The bulk of modern development time isn’t writing code from scratch. It’s writing code that interfaces with libraries for networking, or XML parsing, or database handling, or whatever (see http://www.joelonsoftware.com/articles/LordPalmerston.html for a fuller treatment of this topic). The functional programming language community is more interested in marginalizing their access to external systems rather than making it easier (research into monads being a great example of that).

  16. Larry O'Brien Says:

    I intended to post the URL of some thoughts on your post, but your comment engine diagnoses dashes in URLs as indicative of spam. (Not so, I think.) Anyhow, http://www.knowing.net/PermaLink,guid,659c9535-6674-48f9-a4f9-8bc34fe724b5.aspx

  17. Paul Says:

    Comparing Haskell to C compares a variety of design parameters at once: strong versus weak-ish typing, garbage collection versus hand collection, high level data types versus low-level ones, different libraries etc.

    Based on my experience with functional languages and other high-level languages like Python, Ruby, Smalltalk, Javascript, Perl, C#, REXX etc,, I would guess that those other factors are MUCH MORE likely to be relevant than just the functional programming paradigm.

    In addition, comparing lines of code for PROTOTYPING is pretty uninteresting. I’d like to compare lines of code for a functional application used by customers. What if the design decisions in a particular language are focused on robustness and maintainability?

  18. Larry O'Brien Says:

    I’ve posted a reply at my blog. Unfortunately, I can’t paste the exact permalink, which your comment engine wrongly insists are spam.

  19. Indeed You Wish Says:

    Certain problems are more easily solved with certain tools — let’s not fool ourselves, and believe FP solves all problems better.

  20. Jeff Says:

    You can make any language look good by comparing it to C++. If being functional is the magic bullet, then why do stateful languages like Python, Perl, Ruby, Lua, Lisp, etc. always do just as well as Haskell in this sort of comparison?

  21. Stuart Dootson Says:

    Unlike the rest of your commenters (as far as I can tell), I *have* used Haskell for real-world problems, and can confirm that (for the problems I’ve used it for), it is significantly more productive than imperative languages like Java, C++ or Ada.

    I don’t think it’s quite ready for the mainstream, but it’s definitely got promise.

  22. Not Silver But Still Very Good Says:

    I’m very sceptical about LOC as a measure. However, here’s a very recent informal data point, which I’m mentioning partly because it’s notable for the relatively controlled nature of the comparison: Lucian Wischik at Microsoft recently had to rewrite a 6,000 line F# program in C# (C# was required for unrelated reasons). It became 30,000 lines (references to comments by Lucian below). Now, this comparison is with essentially the same .NET libraries (F# adds a few minor libraries), the same (talented) programmer, the same .NET runtime, the same performance profile, the same problem specification, and the C# code has the advantage of being a reimplementation.

    See Lucian’s comments on the Google group at microsoft.public.dotnet.languages.csharp at http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/tree/browse_frm/thread/38a144ceaf101030/fb4664a7e4c27cf6?rnum=11&q=F%23&_done=%2Fgroup%2Fmicrosoft.public.dotnet.languages.csharp%2Fbrowse_frm%2Fthread%2F38a144ceaf101030%2F38d9aa4dbda50892%3Flnk%3Dgst%26q%3DF%23%26rnum%3D1%26#doc_814fd26871c2c6b7

    More extensive analysis of the differences by Lucian at http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_frm/thread/38a144ceaf101030/38d9aa4dbda50892?lnk=gst&q=f%23&rnum=1#38d9aa4dbda50892

  23. Josh S Says:

    “However, in order for a FP to, well, run, someone, somewhere, must pull a lever so that the wheels start turning according to the spec of the FP, and what ticks under the FP ultimately does have state.”

    That’s not the point. The point is that an FP utilizes a framework that *safely* translates from an abstract language without a concept of state to a concrete language that is basically nothing but state.

    Likewise, object oriented languages do not actually have objects — whether they are detected at compile time or run time, they ultimately map to swaths of memory and functions.

    It’s all smoke and mirrors. But it’s the magic of the smoke and mirrors that enhance our productivity. By building the funnel from a “safe”, “imaginary” language to the “dangerous”, “unproductive” one, we eliminate the problems inherent at the lower level.

  24. Joel Says:

    Functional programming is not usually adopted because so many real-world systems are almost entirely side-effects. Look at all the things a shell script does; it’s changing system state. What’s that JDBC app doing? Changing database state.

    In the world of web applications, you’re always doing I/O. It’s almost all I/O. And that is state (of your I/O streams).

    I’ve been looking at Erlang a lot recently. I have a serious jones for its concurrency constructs. They are as awesome as advertised. But man, you can’t do a decent thing with strings in that language. Seriously, it’s not much better than C. Python, Perl, even Lisp can do a lot of manipulation of the strings - things you have to do in parsing protocols and HTML and so on. But Erlang here is almost a non-starter. Ugh!

    Then there is the subject of libraries. Some major languages provide many things you don’t have to reimplement. But do the functional languages? Lisp has been around the longest, but try getting a GUI library that works on all of (SBCL, CMUCL, CLisp). Or a decent GUI on Erlang at all. Even Paul Graham, champion of Lisp as Solution To Everything, admits that you need massive libraries to be useful - it’s a goal he has for Arc.

    At least Haskell has GTK+ bindings. I haven’t explored Haskell enough to comment on it. I just wonder why folks haven’t used it to do Yahoo! Store. If it’s that good, it will make someone money.

  25. fez Says:

    I would love to see a shootout between the various web frameworks (including any based on functional languages).

    Something like the following:
    - well-defined spec
    - basic CRUD functionality for the most part
    - some integration(s) with a third-party API (thus existing off the shelf libraries will be of use)

    Let the best coders from each language duke it out. Record time spent for each segment of app development & log all commits to a Subversion repository.

    Have a panel of neutral judges look at not just the time spent but also feature completeness and any other niceties the teams added in the time allotted, and give a score to each team.

  26. Ulf Wiger Says:

    More references were asked for. Here’s one:

    “Comparing C++ and Erlang for Motorola Telecoms Software”
    carried out by Heriot-Watt University together with Motorola.

    http://www.erlang.se/euc/06/proceedings/1600Nystrom.ppt

    Two applications were re-written from C++ to Erlang, and one of the applications was benchmarked. The pure Erlang version was 2-3x faster, much more robust, and had 1/3rd to 1/18th the number of lines of code, depending on how you compare the libraries. Detailed analysis suggested:

    “- Code for successful case – saves 27%
    - Automatic memory management – saves 11%
    - High-level communications – saves 23%” (slide 31)

    Like all other studies, this one can certainly be debated. Does anyone have a reference to a study concluding that functional programming does NOT lead to a productivity increase?

  27. tndal Says:

    You omitted that ISI’s Relational Lisp scored the most rapid development time by far: less than half the time required by Haskell.

    Although the line count was greater, this version of Lisp cleaned the floor with Haskell as a productivity tool. And Relational Lisp is a mirror of Prolog.

  28. Jason Says:

    Some points:

    - LOC does not imply that less effort was involved in crafting the Haskell solution. Just less effort in typing it in.

    - Some problems have greater susceptibility to different approaches. One small “real world” problem is completely inadequate.

    - Double-blind experiments in this regard are, in fact, impossible, so this will remain a bench-race forever.


    Jason

  29. Joel Reymont Says:

    Paul, you may be interested in the latest article in my blog. See “Re-birth of a trading platform”. http://wagerlabs.com/2006/12/8/re-birth-of-a-trading-platform

    Thanks, Joel

  30. Larry O'Brien Says:

    Jason: “LOC does not imply that less effort was involved in crafting the Haskell solution. Just less effort in typing it in.”

    Not so. In industrial systems, lines of code produced per month is essentially constant, regardless of language. Also, defect rates per KLOC are essentially constant, regardless of language (ref. Capers Jones’s work on “language levels”). (In small programs, you definitely see greater variation.)

    I agree with your other points.

  31. Larry O'Brien Says:

    “Not Silver But Very Good” ref’s Lucian’s claims. The threads have little substance. I’ve been following Lucian’s Website and I feel that a grain of salt is called for. For instance, he wrote an F# program that displays a teapot in Direct3D; fair enough, but he makes it sound like the lighting and manipulation in 3D comes from a few lines of F# when, in fact, the “teapot in a viewport” is canned functionality that can be done in a few lines of _any_ language.

  32. Ben Moseley Says:

    The thrust of your article is very close to the thrust of our “Out of the Tar Pit” paper which investigates FP (along with the relational model) as being very relevant to the Silver Bullet question - http://web.mac.com/ben_moseley/frp/paper-v1_01.pdf .

  33. Not Silver But Still Very Good Says:

    Re the teapot - you’ve got the wrong guy: Lucian works at Microsoft, and has never touched a DirectX teapot. I think you’re thinking of Flying Frog consultancy.

  34. Ulf Wiger Says:

    I totally disagree with the idea that LOC wouldn’t matter. It’s not just a matter of the effort of typing in the code. Much of that extra code often represents “unnecessary” detail that distracts the reader, hides the core logic, and which also needs to be maintained. I’ve seen projects that have problems with keeping “boilerplate” code consistent when the system gets complex enough. Some projects resort to using modeling languages that generate the boilerplate. When judged as programming languages, these tools are usually quite crappy, but their supporters defend them based on the opinion that it’s still a lot better than having to code the stuff by hand. But there are good programming languages that work at the same abstraction level as those modeling tools. We’ve also found that the learning curve for e.g. Erlang is much shorter than for e.g. UML or C++, contrary to the statement that FPLs would be more difficult to learn, or less intuitive.

    My own conclusions are based on 10 years of working in and around very large software projects, with code volumes in the order of hundreds of thousands of lines, or even millions. I’ve had the opportunity to review several projects using C++, UML, Java and Erlang. I get the feeling that many of the comments above come from very limited comparisons. That’s ok - you have to start somewhere. For me, it took 2-3 months to properly un-learn C++ and OO when I first started with Erlang.

  35. anonymous Says:

    Josh S:

    “That’s not the point. The point is that an FP utilizes a framework that *safely* translates from an abstract language without a concept of state to a concrete language that is basically nothing but state.”

    Brooks was talking about the complexity arising out of the sheer multitude of possible states (or combinations thereof). In FP, you retain (at least in part) this very complexity in order to be able to produce any useful behavior, and whether this complexity is state-based or not, is totally irrelevant.

    What’s left to argue about is whether complexity is significantly reduced by imposing the stateless view, and I for one am still a bit sceptical about any radical claims on that account, especially if well-designed programs/systems from both domains are compared.

    Another point that comes to mind is that, paradoxically, FP actually inhibits stateful programming when you need it (and you almost always do when designing and implementing systems,) by making the complexity of combining states explicit. Inherently stateful models, on the other hand, will let you hide some of this complexity, be it at the still-present risk of coming up with something inconsistent.

    It pays to note that, compared to the current state of the art in PP/OOP, the FP way of combining state is clunky even with the help of monads and monad transformers, in case you wanted to bring those into the argument.