Saturday, April 30, 2011

Patent 5,893,120 reduced to mathematical formulae

I have previously argued that the Bilski ruling renders all software unpatentable in the US because any method of manipulating information (i.e an algorithm) can be translated into a Haskell program, every Haskell function is a formula in the Typed Lambda Calculus, and all the variants of the Lambda Calculus are part of mathematics. Hence any algorithm can be reduced to a mathematical formula, which US patent law states is not patentable subject matter.

The computer science community has tried to explain this to the patent industry, but these explanations have fallen on deaf ears. So I thought I would try a different tack.

Google has just been ordered to pay $5M for infringing patent 5,893,120 (hereafter "Patent 120"). This patent covers a very simple data structure and the algorithms for manipulating it. In fact much of the text of the patent is a pseudo-code implementation in a Pascal-like language. So I thought I would provide a practical demonstration of what has, until now, been a theoretical proposition; the reduction of a software patent to set of mathematical formulae. The result is below, and is also posted here (although I believe that the paste-bin will eventually expire).

I chose this patent partly because it is comparatively simple (there are 95 lines of Haskell source code in the file), but also because it is potentially important; Google's "infringement" consisted of using Linux, which uses this algorithm. Hence anyone using Linux is a potential target.

Red Hat is reportedly seeking a declarative judgement that this patent is invalid, and Google is expected to appeal, so this is not over yet. If anyone knows how I can contact their lawyers to draw these formulae to their attention, I'd be grateful.

Edit 1:A brief overview of the relationship between Haskell and the Lambda Calculus can be found here.

Edit 2:Of course a judge isn't going to know the Lambda Calculus from a lump of rock, but that is what expert witnesses are for. Get a professor of mathematics from an internationally recognised university to testify that these are formulae in the Lambda Calculus, and that the Lambda Calculus is part of mathematics, and you have a sound legal proof. The only thing the patent holders could do is find another professor to testify differently.

Edit 3:Sorry about the broken formatting. If you want to review the code you can either look at this colourised version or copy and paste it into your favourite editor.

Edit 4: This post has been slashdotted. Rather than try to respond to all the various comments individually I have posted a general response to the common themes.

In the meantime, here are the formulae:


{- |

The claims of United States Patent 5,893,120 may be reduced to a set
of mathematical formulae expressed in the Typed Lambda Calculus, also
known as System F. This file contains these formulae. The language
used is Haskell, which is a version of System F with extra "syntactic
sugar" to make it useful for practical purposes.

It happens that formulae expressed in Haskell can also be transated by
a Haskell compiler into executable computer programs. However that
does not make a Haskell function any less of a mathematical formula.
Otherwise the quadratic formula

> x = (-b +/- sqrt (b^2 - 4ac))/2a

would be rendered patentable subject material by the same argument.

-}


module Patent120 where

import Data.Char (ord)

-- The "ord" function defines a bijection between text characters and
-- numbers.

import Data.Map (Map)
import qualified Data.Map as M

-- The Data.Map module is a standard module defined entirely as a set
-- of Haskell formulae, in the same way as this module. The "import"
-- means that these formulae are incorporated with the formulae given
-- here.

{-
Claim 1. An information storage and retrieval system, the system
comprising:

a linked list to store and provide access to records stored in a
memory of the system, at least some of the records automatically
expiring,

a record search means utilizing a search key to access the linked
list,

the record search means including a means for identifying and removing
at least some of the expired ones of the records from the linked list
when the linked list is accessed, and

means, utilizing the record search means, for accessing the linked
list and, at the same time, removing at least some of the expired ones
of the records in the linked list.
-}

-- | A record with Birth value below some threshold is considered "expired".
type Birth = Integer


-- | Records in the list have their birth recorded, and also contain
-- a key and a value.
data Record k a = Record Birth k a

-- | True iff the record is older than the threshold argument.
expired t (Record b _ _) = b < t -- | True iff the record matches the key. matches k (Record _ k1 _) = k == k1 -- | The value stored in a record. value (Record _ _ v) = v -- | Records are stored in a linked list in the system memory. type Storage k a = [Record k a] -- | The "search1" function includes a threshold age parameter. -- It returns both the items that match the key and the linked list -- with the expired items removed. -- -- The recitation of "means" cannot be used to disguise the formula -- given here. Otherwise any mathematical formula could be patented -- by reciting e.g. "addition means" and "multiplication means" that are -- mechanically derived from the formula. search1 :: (Eq k) => Birth -> k -> Storage k a -> (Storage k a, [a])
search1 t k records = foldr nextItem ([], []) records
where
nextItem r@(Record b k2 v) (retained,found) =
(if b > t then r:retained else retained,
if k == k2 then v:found else found)


{-
Claim 2. The information storage and retrieval system according to
claim 1 further including means for dynamically determining maximum
number for the record search means to remove in the accessed linked
list of records.
-}

-- | Similar to "search1", but with an added parameter for the maximum
-- number of records to remove.
search2 :: (Eq k) => Int -> Birth -> k -> Storage k a -> (Storage k a, [a])
search2 n t k = (\(_,x,y) -> (x,y)) . foldr nextItem (n,[],[])
where
nextItem r@(Record b k2 v) (n2,retained,found) =
(n2-1,
if b > t || n2 == 0 then r:retained else retained,
if k == k2 then v:found else found)


{-
Claim 3. A method for storing and retrieving information records using
a linked list to store and provide access to the records, at least
some of the records automatically expiring, the method comprising the
steps of:

accessing the linked list of records,

identifying at least some of the automatically expired ones of the
records, and

removing at least some of the automatically expired records from the
linked list when the linked list is accessed.

Claim 4. The method according to claim 3 further including the step of
dynamically determining maximum number of expired ones of the records
to remove when the linked list is accessed.
-}


-- | Claim 3 can be reduced to the same formula as Claim 1. The use of
-- a sequence of steps cannot disguise the fact that this represents
-- a mathematical formula. Otherwise it would be possible to patent
-- any mathematical formula simply by reciting the steps in evaluating
-- it. e.g. "step 3: the addition of the results of step 1 and step 2"
search3 :: (Eq k) => Birth -> k -> Storage k a -> (Storage k a, [a])
search3 = search1

-- | Likewise Claim 4 can be reduced to the same formula as Claim 2.
search4 :: (Eq k) => Int -> Birth -> k -> Storage k a -> (Storage k a, [a])
search4 = search2

{-
Claim 5. An information storage and retrieval system, the system
comprising:

a hashing means to provide access to records stored in a memory of the
system and using an external chaining technique to store the records
with same hash address, at least some of the records automatically
expiring,

a record search means utilizing a search key to access a linked list
of records having the same hash address,

the record search means including means for identifying and removing
at least some expired ones of the records from the linked list of
records when the linked list is accessed, and

meals, utilizing the record search means, for inserting, retrieving,
and deleting records from the system and, at the same time, removing
at least some expired ones of the records in the accessed linked list
of records.
-}


-- | Every key has a hash code. Strings of characters may be used as
-- keys.
class (Eq k) => Hashable k where
hash :: k -> Int -- Every key has a hash code.

instance Hashable Char where hash = ord

instance (Hashable a) => Hashable [a] where
hash = foldr (\x h -> ((hash x + h) * 53 + 1) `mod` 1733) 0

type HashedStore k a = Map Int [Record k a]


-- | Access a hashed store with a function that returns the modified list of records.
hashAccess5 :: (Hashable k) =>
(Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess5 f t k store = (M.insert h retained store, result)
where
h = hash k
(retained, result) = case M.lookup h store of
Just records -> f $ filter (expired t) records
Nothing -> ([], [])


-- | Search using hashAccess.
search5 :: (Hashable k) => Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search5 t k = hashAccess5 srch t k
where srch store = (store, map value $ filter (matches k) store)

-- | Insert using hashAccess.
insert5 :: (Hashable k) => Birth
-- ^ Expiry threshold for old entries.
-> Birth
-- ^ Birth date for new entry.
-> k -> a -> HashedStore k a -> HashedStore k a
insert5 t b k v = fst . hashAccess5 (\store -> (Record b k v : store, [])) t k


-- | Delete using hashAccess
delete5 :: (Hashable k) => Birth -> k -> HashedStore k a -> HashedStore k a
delete5 t k = fst . hashAccess5 (\store -> (filter (not . deleted) store, [])) t k
where deleted (Record _ k1 _) = (k == k1)


{-
Claim 6. The information storage and retrieval system according to
claim 5 further including means for dynamically determining maximum
number for the record search means to remove in the accessed linked
list of records.
-}


-- | Access a hashed store with a function that returns the modified list of records. The "Int"
-- argument is the maxiumum number of expired records to remove.
hashAccess6 :: (Hashable k) =>
Int -> (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess6 n f t k store = (M.insert h retained store, result)
where
h = hash k
(retained, result) = case M.lookup h store of
Just records -> f $ filterN n (expired t) records
Nothing -> ([], [])
filterN _ _ [] = []
filterN n1 p (x:xs) = if n1 <= 0 || p x then x : filterN n1 p xs else filterN (n1-1) p xs -- | Search using hashAccess. search6 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search6 n t k = hashAccess6 n srch t k
where srch store = (store, map value $ filter (matches k) store)

-- | Insert using hashAccess.
insert6 :: (Hashable k) => Int
-> Birth
-- ^ Expiry threshold for old entries.
-> Birth
-- ^ Birth date for new entry.
-> k -> a -> HashedStore k a -> HashedStore k a
insert6 n t b k v = fst . hashAccess6 n (\store -> (Record b k v : store, [])) t k


-- | Delete using hashAccess
delete6 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> HashedStore k a
delete6 n t k = fst . hashAccess6 n (\store -> (filter (not . deleted) store, [])) t k
where deleted (Record _ k1 _) = (k == k1)


{-
Claim 7. A method for storing and retrieving information records using
a hashing technique to provide access to the records and using an
external chaining technique to store the records with same hash
address, at least some of the records automatically expiring, the
method comprising the steps of:

accessing a linked list of records having same hash address,

identifying at least some of the automatically expired ones of the records,

removing at least some of the automatically expired records from the
linked list when the linked list is accessed, and

inserting, retrieving or deleting one of the records from the system
following the step of removing.

Claim 8. The method according to claim 7 further including the step of
dynamically determining maximum number of expired ones of the records
to remove when the linked list is accessed.
-}

-- | As with Claim 3 vs Claim 1, the formulae for Claim 7 are the same as for Claim 5
hashAccess7 :: (Hashable k) =>
(Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess7 = hashAccess5

search7 :: (Hashable k) => Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search7 = search5

insert7 :: (Hashable k) => Birth
-- ^ Expiry threshold for old entries.
-> Birth
-- ^ Birth date for new entry.
-> k -> a -> HashedStore k a -> HashedStore k a
insert7 = insert5

delete7 :: (Hashable k) => Birth -> k -> HashedStore k a -> HashedStore k a
delete7 = delete5


-- | And the formulae for Claim 8 are the same as for Claim 6
hashAccess8 :: (Hashable k) =>
Int -> (Storage k a -> (Storage k a, [a])) -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
hashAccess8 = hashAccess6


search8 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> (HashedStore k a, [a])
search8 = search6

insert8 :: (Hashable k) => Int
-> Birth
-- ^ Expiry threshold for old entries.
-> Birth
-- ^ Birth date for new entry.
-> k -> a -> HashedStore k a -> HashedStore k a
insert8 = insert6

delete8 :: (Hashable k) => Int -> Birth -> k -> HashedStore k a -> HashedStore k a
delete8 = delete6

Wednesday, April 20, 2011

Why I am voting Yes to Alternative Vote

On May 5th the UK votes whether to change its electoral system from "first past the post" (FPTP) to "alternative vote" (AV). In FPTP you put an X next to the name of one candidate, and the candidate with the biggest number of Xs wins. In AV the first preferences are counted as for FPTP, but then the candidate with the lowest number of votes is eliminated and the second preferences of their voters are then counted towards the other candidates. This process carries on until someone has more than 50% of the vote.

When I was at University the Students Union used AV, and one year a man stood for the post of Womens' Officer under the name of "Captain Kirk". Against him were two more conventional candidates standing on the Conservative and Labour platforms.

When the votes were counted Captain Kirk got 49% of the first preference votes, with the Conservative getting 26% and Labour 25%. Under FPTP Kirk would have won, but now the Labour candidate was eliminated and it turned out that all of the people who voted for her first had voted for the other woman as second preference. So now the Conservative candidate had 26%+25%=51% of the vote, and Kirk was defeated.

I'm well aware of the Arrow Impossibility Theorem, which shows that given an election with three or more candidates and three or more voters it is impossible to have a voting system that always delivers the right result, but it seems to me that this is rather like the fact that any programming language is logically equivalent to a Turing Machine; its true, but it doesn't mean that all (voting systems | programming languages) are equally good. FPTP seems particularly prone to perverse outcomes, such as the election of a male Womens' Officer when the majority of the electorate wants a female. Pretty much every election in the UK is a 2-horse-1-pony race, with the Liberal Democrats as the pony, and every election pundits discuss the impact of "tactical voting" as people who would like to elect the Lib Dems instead vote Labour for fear of otherwise "letting the Conservatives in". Under AV they can simply vote Lib Dem first and Labour second.

The anti-AV campaign's arguments seem to come down to:
  • It will create more hung parliaments. Possibly. I don't see this as a bad thing. The current coalition seems to be doing OK, and a coalition means that more of the electorate's views get represented in government. Having a government elected by 40% of the people get 100% of the power seems to me rather undemocratic.
  • It will empower the National Front. For those who don't know, the National Front (NF) is an extreme right wing racist party, which occasionally does well in poor inner-city areas where the "immigrants are taking your jobs and houses" line gets a sympathetic hearing. But equally, if most people object to the NF on principle then AV makes it much easier to vote against them; just put the NF at the bottom of your list. That ensures that no matter how other people vote, your vote will count against the NF.
  • The most popular person doesn't always win. Well yes, if you define "most popular" as "winner under FPTP" then this is true; if AV didn't give a different result sometimes then there wouldn't be any difference. The point about AV is that the candidates with the broadest support tend to win much more often, whereas FPTP is prone to producing winners who most of the electorate actively dislike.
So for these reasons I'm going to be voting YES to AV.

Why I am not buying an ebook reader

Being the kind of person with a lot of bookshelves and still not enough room to keep all my books, I wondered about moving to ebooks. There is a lot to like in the concept; the idea of being able to take a significant chunk of my library with me when I am away from home is very attractive, as is being able to back up the whole lot somewhere off site. With a waterproof cover for my ebook reader I could take it into the bath or just read when outdoors on a rainy day without worrying about the wet. And of course you don't have to wait for days for a book to be delivered by Amazon.

So I started looking for answers to the following questions:

1: What books are available in ebook format? And for how much?

I picked a selection from the books I have around me now. Terry Pratchett's books are on the Kindle for around £4.50 each, although "I Shall Wear Midnight" is £9.24. Hang on a minute, the paperback of ISWM is only £3.49. So for almost three times the price I get to save the book industry the cost of printing and shipping me a physical product? I don't think so. Even the older Pratchett books still cost more for the Kindle than in paperback.

Looking elsewhere, the first four Google hits are for pirate torrents and Usenet indexers. Then there is "Fictionwise", which seems to be Barnes & Noble in the US selling stuff on it's Nook. Despite getting a hit in the Google search it seems that B&N don't actually sell any Pratchett. Finally the Diesel e-book store has them for around $8 each, with ISWM for $10. At the moment a dollar is about £0.61 although doing it through a credit card company is going to cost more.

OK, what about some other books? Robert Heinlein: there are four Kindle e-books out of the 32 novels and 59 short stories written by Heinlein. Diesel does better, with 8 Heinlein books. Baen also have some more Heinlein books.

"Contact" by Carl Sagan is not available for Kindle. Googling for "contact sagan ebook" turns up pages of pirate copies but nothing legitimate.

Lois McMaster Bujold's Vorkosigan saga I already have in ebook format because Baen included the entire opus on a CD with the latest volume. Not only does it not have DRM, but it is licensed for free redistribution. Kudos to Baen, and I hope the experiment works for them.

The Matthew Shardlake stories by C.J. Sansom are all available for Kindle, for about the same price as the paperbacks. They are also available in other ebook formats from Diesel for $13 each, which seems a bit steep even by ebook standards.

"Next Stop Execution" by Oleg Gordievsky is not available on the Kindle or on Diesel.

Some of Matt Ridley's books, including "The Red Queen", "Genome", and his latest "The Rational Optimist" are available on the Kindle, but "The Origin of Virtue" is not, which seems odd. Diesel only has "The Rational Optimist".

So my conclusion is that I can buy most of my existing library in ebook format, but not all. Even then its going to be an expensive business to do it legally; we probably have around 1,000 books in our house (never actually counted), and the cost of getting just half of them in ebook format is going to be in the thousands of pounds. Ouch!

2: What about Digital Restrictions Management

The attitude of most publishers here is best summed up by Barnes & Noble. Its "LendMe(TM) technology" lets you lend a book to someone else for a maximum of two weeks exactly once. Amazingly this is sold as a feature! What planet are these people living on? My wife and I both enjoyed C.J. Sansom's series of historical novels. If we used ebook readers then we would both have to buy our own copies. And our son is now starting to read my Terry Pratchett books, so make that three copies in some instances.

And then there is the limit on transfers to new devices. It seems from reports on the Web that ebooks can only be transferred to new devices a finite number of times. After that, your books are stuck on your old devices. So as I upgrade and replace my ebook readers over the years, as I am sure I will, eventually I will start having to keep my old readers around or else purchase their content yet again. Also these transfers have to be mediated by the website you bought the ebook from; if they go out of business then your content is stuck.

Of course I could resort to the pirate copies around the Net. It seems like pretty much anything that is available in a DRM version is available unlocked for free. As a test I downloaded a copy of "The Moon is a Harsh Mistress" from one such site, and sure enough I got an unencrypted PDF (which I could read on my Linux box as an added bonus). I don't feel morally obligated to buy a whole new book just to shift formats: I've already paid the author for the words when I bought the dead-tree version. (Or have I? Some of my books were purchased second-hand. Interesting conundrum there.) But such "solutions" are inherently unreliable. I also worry that I will be sucked into a life of copyright infringement; a little demon on my shoulder is asking me, if I'm going to the trouble of locating and downloading a pirate copy to escape the DRM, why bother buying a legitimate copy that I'm never actually going to look at?

The list of things I can't do with a DRM'd ebook grows the more I think about it. As our son grows out of books we have donated the old ones to his school library. I've lent technical books to colleagues. I've bought out-of-print books second hand. I've borrowed books from libraries. My parents read some of our books when they visit. None of these things are possible with ebooks.

So despite all the advantages, I think I'll pass for now. Maybe in a few years the publishing industry will learn the same lessons that the music industry is learning about the futility of DRM and how to price electronic material, and more old books will be digitised. Then I'll get an ebook reader. But not now.