Paul's Pontifications

Saturday, February 16, 2013

On Having an E-book Reader

Back in 2011 I wrote about the reasons why I wasn't getting an e-book reader. I had found that books generally cost more in e-book format than in dead-tree format, and I was nervous about the digital restrictions management (DRM) that e-books came with. These concerns were only increased when I read about Linn Nygaard who had her Amazon account closed (and all her e-books effectively confiscated) for some unexplained violation of Amazon policy that was probably committed by the previous owner of her second-hand Kindle. The fact that her account was restored after 24 hours of world-wide outrage didn't reassure me; fame is fickle, and relying on it as an ally against a giant corporation would be unwise.

However as a result of that fiasco a number of articles were posted about the removal of DRM from e-books using Calibre, which is an open-source program for managing your electronic library on your computer and converting between formats. You have to download and manually install some plugins in addition to the standard distribution, but once they are installed you just add a DRMed book to your Calibre library, and it automatically gets the DRM stripped out.

In parallel with this, a long-running legal case between the US Department of Justice and a number of e-book publishers resulted in a settlement under which prices have dropped considerably, and are now significantly cheaper than the paperback price.

So that was my two main objections to an ebook reader dealt with: I could now share books with family in a similar way to a paper copy, and I wasn't paying the publisher to leave out the paper. So I asked for a Kindle for my birthday last year.

I'm very happy with it. I've downloaded some classics from the Gutenberg project, and also picked up some more obscure books like the updated edition of "Code and Other Laws of Cyberspace" that I have been wanting to read for a long time. Specialist books like this seem to be a lot cheaper in ebook format, presumably because so much of the cost in the dead-tree version is taken up in the overheads of short-run printing and storage. But even Anathem was only £2.99 (although it seems to be rather more when I look at it now).

My wife tried out my Kindle as well, and asked for one for Christmas. When Amazon proved unable to deliver in time I bought one from the local branch of Waterstones bookshop. This turned out to be a mistake: the Kindles bought from Waterstones have the nice varied "screensaver" pictures replaced with a fixed bit of Waterstones branding that can't be replaced (I've come across some instructions for replacing that image by logging into the Kindle using TCP over USB, but that particular back door seems to have been closed now).

Wednesday, January 30, 2013

Bitcoin as a Disruptive Technology

One of the most important ways of thinking about the way that technological and commercial forces create change is Clayton Christensen's notion of a Disruptive Technology. According to Christensen, new technologies generally start by serving a niche market that is currently under-served by the incumbent technology (and the companies that sell it). The new technology is ignored by the incumbent companies because they can't see how to make money from it; it doesn't fit their big profitable customers and the niche market is too small to be interesting. So the incumbents ignore it. Meanwhile the new technology matures and improves to the point where it can be used by the big customers, and then suddenly the incumbent technology (and the companies that sell it) are obsolete.

Like any model of a complex situation this doesn't cover every angle, but its interesting to look at Bitcoin from this perspective. Christensen's approach in his book is to look at the problem from the point of view of a manager in an incumbent company (who sees a disruptive technology as a threat) or a start-up company (who sees it as an opportunity). I'm simply going to look at the major market forces, and in particular the niche markets that Bitcoin might serve better than the incumbents, and the possible paths out from those markets.

The Underworld

An obvious group of people who are served poorly by the incumbent financial system are criminals. Unlike most markets this is a matter of deliberate design. Over the past century governments have systematically regulated all forms of financial transactions in ways that make it difficult to hide ill-gotten gains, or for that matter legitimately-gotten gains that might incur taxes. Banks are legally required to know who their customers are and report any transactions that seem suspicious, or merely large. For people who move millions of dollars around there are methods to avoid such scrutiny, but these are themselves expensive; the necessary accountants and lawyers don't come cheap. Hence there is a significant group of people who have a pressing need to avoid the scrutiny that comes when you move tens of thousands of dollars around the world, but who can't afford the infrastructure used by those who move millions.

Traditionally these people have used cash, but that is hard to do across national borders because you have to physically move the stuff, making it vulnerable to interception by the authorities or thieves. So Bitcoin, with its ability to slide across national borders without trace or cost, is very attractive.

The Grey Market

A related group are those who deal in stuff that is legal in some jurisdictions but not in others. Porn and gambling are two major businesses here. Certain drugs also fit into this category, but you can't move the product over wires, so it is vulnerable to conventional methods of interdiction (although that doesn't stop everyone).

Governments trying to control porn and gambling have generally followed the money. This tends not to work well with porn because there is too much available for free. But gambling needs to move money to work, and so authorities in several countries have attacked it at this point. Hence Bitcoin is very attractive in this market as well; punters can easily convert their local currency into Bitcoins, and if they manage to win something then they can convert their winnings back almost as easily.

The Unbanked

This is a catch-all term for people who have no access to conventional bank accounts, and hence have to deal in cash or barter.

Financial solutions for these people have traditionally been expensive and piecemeal. Moving money long distance is done by wire transfer, with hefty charges and the need to physically pick it up. Cash is difficult to hide from brigands and corrupt officials. Credit can be available, but only at punitive interest.

Things have improved; across Africa variations on themes of microfinance and mobile phone banking are changing the lives of millions, but they are still vulnerable. Local laws can limit access, and accounts in local currency are vulnerable to inflation. A stable currency that can be easily hidden and transferred quickly over long distance could meet a real demand, although it still can't provide credit. Mobile phone credit is already serving this role in some places, so something that is designed for the job should be readily adopted.

Actually holding Bitcoins requires rather more computer power than the many third-world mobile phones can provide. But that is unlikely to be a problem for long. If MPESA can have an app, then so can Bitcoin.

Conclusion

Bitcoin looks like a classic disruptive technology: it has multiple niches in markets that are under-served by conventional money, and the grey market and the unbanked provide a ready path up to higher-value markets in places that are now reasonably well served by cash and credit cards. The black market will also provide market pull for Bitcoin uptake, but if that were the only early market niche then the mere use of Bitcoins would raise strong suspicion of illegal activity. The presence of legitimate, or at least lawful, uses of Bitcoin provides a rationale for those offering to extend conventional services to Bitcoin users and plausible deniability for those whose Bitcoins have in fact come from illegal operations.

Accordingly we should expect to see Bitcoin become a strong force in finance over the next decade.

Software has CivEng Envy

There is a school of thought which says that developing software should be like constructing a building. To make a building you have an architect draw blueprints, and these blueprints are then handed over to a builder who constructs what the architect has specified. According to this school of thought the problem with the software industry is that it doesn't create blueprints before it starts building the software. They look with envy at the world of civil engineering, where suspension bridges and tunnels and tall buildings routinely get finished on budget and schedule.

This is a superficially attractive idea; software is indeed difficult, and it would indeed be a foolish project manager on a building site who directed the builders to start laying the foundations before the plans had been finalised. But on a closer examination it starts to fall apart.

Suppose that a big software project does indeed need something analogous to a blueprint before starting on the coding. What, exactly, is a blueprint? What purpose does it serve? And where would that fit into the software lifecycle?

A blueprint for a building is a precise and complete specification of everything that will go into the building. The builder has the problem of assembling what the blueprint shows, but there is no ambiguity and no variation can be permitted. This is because buildings are safety critical infrastructure. The Hyatt Regency walkway collapse was a horrible example of what can happen when someone makes a seemingly innocuous change to the plans for a building. So before a building is constructed the plans have to be approved by a structural engineer who certifies that the building is indeed going to stay up, and by an electrical engineer who certifies that it isn't going to electrocute anyone or catch fire due to an electrical fault, and by a bunch of other engineers with less critical specialties, like air conditioning. The details matter, so the blueprints have to specify, literally, every nut and bolt, their dimensions, the metal they are made from and the torque to which they should be tightened (most of these things are standardised rather than being written down individually for every nut and bolt, but they are still part of the specification). Without this the structural engineer cannot tell whether the building is structurally sound. Similarly the electrical engineer must know about every piece of wire and electrical device. So by the time the blueprints are handed to the builder inventiveness and creativity are neither required nor allowed.

The only artefact in software development that specifies how the software will operate to this degree of precision is the source code. Once the source code has been written, it is to be executed exactly as written: inventiveness and creativity in its execution are neither required nor allowed. But those who promote the idea of "software blueprints" seem to think that something else, something more abstract, can be a blueprint, and that once these blueprints have been drawn the "construction" of the software (that is, turning these blueprints into source code) can proceed in an orderly and planned fashion, putting one line of code after the next in the manner of a bricklayer putting one brick on top of another.

But when you look at the artefacts that these people proffer, it is clear that they are nothing like precise enough to act as a blueprint; they are more like the artists impressions, sketched floor plans and cardboard models that architects produce during the early phases of design. These artifacts can help people understand how the building will be used and fit into its environment, but they are not blueprints.

(By the way, the old chestnut about unstable software requirements being like "deciding that a half-constructed house ought to have a basement" fails for the same reason. The problem with unstable requirements is real, but the analogy is wrong).

But if the blueprint for software is the source code, then the builder for software is the compiler. This should not be a surprise: when computer scientists encounter a task that does not require inventiveness and creativity then their first instinct is to automate it. If so then it is really civil engineering that needs to be envious of software engineering.

Software is not like buildings in other ways too:

Buildings have very few moving parts and little dynamic behaviour, whereas software is all about dynamic behaviour (and buildings with dynamic behaviour often have problems.
Novelty in buildings is rare. I work in a three storey steel frame office block, which is also on an estate of very similar three storey steel frame office blocks. Software, on the other hand, is almost always novel. If software to do a job is already available then it will be reused; I run Fedora Linux; I don't write my own operating system from scratch.

So please can we drop this half-baked analogy between writing software and civil engineering.

Friday, January 18, 2013

When Haskell is faster than C

Conventional wisdom says that no programming language is faster than C, and all higher level languages (such as Haskell) are doomed to be much slower because of their distance from the real machine.

TL;DR: Conventional wisdom is wrong. Nothing can beat highly micro-optimised C, but real everyday C code is not like that, and is often several times slower than the micro-optimised version would be. Meanwhile the high level of Haskell means that the compiler has lots of scope for doing micro-optimisation of its own. As a result it is quite common for everyday Haskell code to run faster than everyday C. Not always, of course, but enough to make the speed difference moot unless you actually plan on doing lots of micro-optimisation.

This view seems to be borne out by the Computer Language Benchmarks Game (aka the Shootout), which collects implementations of various programs written in different languages and measures their execution speed and memory usage. The winningest programs are always written in highly optimised C. The Haskell versions run anything from slightly to several times slower than the C versions. Furthermore, if you read the Haskell it is also highly optimised, to the point where it no longer looks like Haskell. Instead it reads like C translated into Haskell. Based on this, you would be justified in concluding that everyday Haskell (that is, Haskell that plays to the strengths of the language in brevity, correctness and composability) must be irredeemably slow.

But then I read this presentation by Tony Albrecht, in which he talks about how some seemingly innocent everyday C++ code is actually very inefficient. Two things in particular caught my eye:

A fetch from the main memory when there is a cache miss costs 400 cycles. Pointer indirection in particular tends to fill up cache lines, and hence increase the number of cache misses overall.
A wrong branch prediction costs over 20 cycles. A branch that skips a simple calculation can actually run slower than the calculation.

To put it another way, C is no longer close to the real machine. The real machine has 3 levels of CPU cache (some of them shared between multiple cores), long instruction pipelines, branch prediction, multiple ALUs and FPUs, and hardware data flow analysis done while the program is being executed in order to schedule all this in a way that makes it look like a simple processor executing one instruction at a time. C doesn't expose all that to the programmer, so it seems to me that the only way to write highly optimized C is to have a sophisticated mental model of the processor and its memory architecture, decide what you want this machine to do, and then reverse-engineer the C which is going to make that happen. The result is difficult to understand and hence hard to maintain. Look at the C implementations of the Shootout problems for examples of what I mean.

But most code isn't written like that. Tony Albrecht is a games programmer, an expert at squeezing cycles out of the rendering loop. Most developers do not live in that world. For them the objective is to produce code that meets the requirements, which includes being fast enough. This is not laziness or waste, but practicality. First design the optimal algorithm, then implement it in idiomatic code. Only if that does not run fast enough should you start detailed profiling and micro-optimisation. Not only is the micro-optimisation process itself expensive, but it makes the code hard to understand for future maintenance.

So I wondered: the high level of Haskell gives the compiler many more opportunities for micro-optimisation than C. Rather than comparing micro-optimised programs therefore, it seemed sensible to compare everyday programs of the sort that might be produced when readability is more important than raw speed. I wanted to compare programs that solved a problem large enough to have a useful mix of work, but small enough that I could write Haskell and C versions fairly quickly. After poking around the Shootout website I settled on the reverse-complement problem.

A potential issue was that one of my programs might inadvertently use something highly non-optimal, so I decided I would profile the code and remove anything that turned out to be pointlessly slow, but not change the structure of the program or add complexity merely for the sake of speed. With that in mind I wrote Haskell and C versions. I also downloaded the Shootout winner to get some feel for how my programs compared. You can see my code at the bottom of this post.

The first version of the Haskell took 30 seconds (compared with the Shootout time of about 0.6 seconds). As I had feared, profiling did indeed reveal something pointlessly slow in it. In order to filter out carriage returns from the input I used "isLetter", but unlike the C char type the Haskell Char covers the whole of Unicode, and determining if one of those is a letter is not trivial. I put the filter after the complement operation and compared the result with zero, which in addition to being faster is also the Right Thing if the input contains invalid characters. Once I had this fixed it dropped down to a much more respectable 4.3 seconds. Interestingly, profiling suggests that about half the time is being spent writing out the 60 character lines; merely printing out the result with no line breaks cut execution down to around 2 seconds.

The C version, meanwhile, took 8.2 seconds. Profiling did not directly reveal the cause, but it seemed to imply that the processor was spending most of its time somewhere other than my code. I strongly suspect that this time is being spent in putc(3) and getc(3). The obvious optimisation would use fread(3) and fwrite(3) to read and write characters in blocks instead of one at a time, but that would require significant changes to the code; extra bookkeeping to deal with the start of the next sequence (signalled by a ">" character) when it is found half way through a block, and to insert newlines similarly. Unlike the replacement of isLetter in Haskell this would require new variables and control structures driven by the cost of the solution rather than a simple switch to a less expensive expression

It might be argued that I have tilted the playing field against C by not making these changes, and that any halfway competent C programmer would do so when faced with code that runs too slowly. If isLetter is pointlessly slow, isn't the same thing true of putc(3) and getc(3)? But I think there is a clear difference. Both programs are written in a character-oriented way because the problem is described in terms of characters. I wrote the inner loop of the C to operate on a linked list of blocks because it looked like a faster and simpler choice than copying the whole string into a new buffer twice the size every time it overflowed (on average this algorithm copies each character once or twice; see Knuth for details). I might have considered reading or writing the characters in reverse order rather than doing the in-memory reverse in a separate function, but profiling didn't show that as a significant time sink. Overall, getting decent performance out of the C is going to take about the same amount of work as writing the code in the first place.

On the other hand the Haskell has decent performance out of the box because the compiler automates a lot of the micro-optimisation that C forces you to do manually. It may not do as good a job as a human with plenty of time might do, but it does it automatically and reasonably well.

This isn't principally about code size, but for the record the Haskell has 42 SLOC, of which 21 are executable. The C has 115 SLOC, of which 63 are executable. The Shootout C winner has 70 SLOC, of which 46 are executable (counting comma separated statements as one line per statement).

So here is the bottom line. If you really need your code to run as fast as possible, and you are planning to spend half your development time doing profiling and optimisation, then go ahead and write it in C because no other language (except possibly Assembler) will give you the level of control you need. But if you aren't planning to do a significant amount of micro-optimisation then you don't need C. Haskell will give you the same performance, or better, and cut your development and maintenance costs by 75 to 90 percent as well.

Update: here are the compilation and run commands:

ghc -main-is HaskellText -O2 -fllvm -funbox-strict-fields HaskellText.hs

gcc -O2 naive-c.c -o naive

time ./HaskellText < big-fasta.txt > /dev/null

time ./naive < big-fasta.txt > /dev/null

The Rise of the Me-Burger?

Back in February news broke that a university researcher had successfully grown bovine muscle tissue in a lab, although a prototype burger made this way would cost an estimated £200,000.

I was immediately reminded of an Arthur C. Clarke short story (spoiler alert) in which synthetic meat is routinely eaten, but its nature is disguised for marketing purposes. The latest product is the wildly popular "Ambrosia Plus", which turns out to be synthesized human meat.

Could this be about to come true? Given that synthetic meat is going to cost more than animal meat for the foreseeable future, its only niche is going to be as a status symbol for rich people. As Clarke pointed out, up until now human meat has been almost unobtainable, and in most societies the steps required to obtain it have made it taboo. But soon it seems likely that anyone with the necessary money will be able to have a sample without any ethical concerns (although the squick remains).

But then, why not go one step further? The original sample from which the synthetic human-burger is grown has to come from someone, and the identity of that someone could become a marketing point. One can envisage a particularly egocentric billionaire offering his guests burgers cloned from himself. Or perhaps famous people will find themselves being offered large sums of money for a biopsy. Would you like to nibble on Naomi Campbell? Or perhaps a bite of Usain Bolt?

Saturday, August 4, 2012

Managing Risks in Company IT

Another month, another story of corporate IT going spectacularly wrong. First we had the RBS debacle, now it seems that Knight Capital Group have lost hundreds of millions to a rogue algorithm. Similar events have happened before. And its not just banks that suffer from such events.

There are lessons to be learned here. Some of them I draw directly from the events linked above. Others are derived from watching large software-intensive organizations from the inside.

Lesson 1: It happens without warning

Your business is ticking over nicely. You have just approved the annual IT budget, and your CIO has assured you that everything is green. Then at 6am you get a phone call telling you that your annual profit for this year, and maybe your entire company, has just vanished into thin air courtesy of a computer that you own but quite possibly have never heard of. How can this happen?

The answer is that computers are non-linear; small changes can have huge consequences. Change a plus to a minus somewhere in a program with 100,000 lines of code (which is fairly typical) and if you are lucky you will get no output, and if you are unlucky you will get the wrong output.

Mistakes like that happen all the time. As a rough rule of thumb, when programmers type code they make a mistake every 10 lines or so. Everything after that is about finding and removing those mistakes. On top of that you have the mistakes that were baked in at the specification stage (assuming that your software even has a written specification; if you are relying on a programmer having an informal chat with the person who wants the program then you are in even worse shape).

The only way to prevent this stuff happening is to treat the technology as important.

Lesson 2: Do a Risk Assessment

Do an inventory of every single program used by the company in regular business. If anyone has a spreadsheet file that they regularly use, treat that as a separate program. You may have to do some digging to find these things, but that spreadsheet that some bright intern in Operations invented last year to schedule the truck drivers could be the one that paralyses your entire operation next February 29th.

When you have done this you will have a depressingly long list that fits into roughly three categories, listed here in ascending order of risk:

Commercial Off The Shelf (COTS) software. For the most part you can treat this as low priority; it tends to be reasonably well tested before it leaves the supplier. Not always, but you can be sure that you have bigger problems elsewhere. HOWEVER take a long look at any configuration or other input files supplied to this software; they tend to be less well controlled and hence more prone to error. Consider putting anything like this into category 3.
Ancient dragons. Twenty or thirty years ago your company commissioned a big piece of software which has since become a key part of your operation. Its poorly documented and maintained by a few aging programmers, but at least they do understand it (until you offer them early retirement in a round of cost-cutting).
Ad-hoc bits of stuff. These hang off the side of categories 1 and 2 like remora fish around sharks. Typically they convert between obsolete protocols, massage data formats, generate specialist reports, generate input tables and configuration files, and other odd jobs. All those spreadsheets will fit in here too, as will any particularly complicated configuration files from item 1. This stuff is risky because it was generally written on the cheap; poorly specified, documented and tested. As a result it tends to be fragile, and nobody quite knows how it all works or how to fix it when it goes wrong.

I'm going to call all this stuff "software" for simplicity, even though some of it is not normally considered to be a "computer program". From the risk point of view its all the same.

Start with the stuff in item 3, and then work upwards. For each piece of software, figure out where the output goes and how important it is. Make a short list of the biggest risks based on the initial analysis, and then produce Value At Risk figure for each bit of software if it either fails to work or produces the wrong answer (the Knight Capital Group algorithm, for instance, would have had zero impact if it had simply done nothing, but the RBS account update was a problem precisely because it did nothing). Don't forget to include reputation, regulatory and compensation costs in the analysis.

Lesson 3: Risk cannot be outsourced

You've probably got some outsource contracts already. If any of your high-risk software is outsourced, either as software maintenance or operation, then compare the penalty clauses in your contract with your VAR. You will probably find at least one order of magnitude difference, if not two or three.

In general you cannot pay a supplier enough to take on your risk. If you are not careful you will find that your supplier has all the power to control the service while you have all the responsibility for their failures.

Consider running your own acceptance tests on external software development. Yes, the supplier has already run a bunch of tests, but you still need to check that it works in your context before it goes live. If that means you need a whole duplicate IT set-up for testing then so be it. I haven't seen an analysis of the Knight Capital Group failure, but I'll bet dollars to doughnuts that lack of test infrastructure was an important element.

Take a look at Section 2.2.4 in Nancy Leveson's book Engineering A Safer World. On one level, the Bhopal disaster was about engineering and procedural failure; an essential safety step was omitted during a routine procedure. But behind that was a long history of cost reduction and outsourcing; skilled staff were replaced by external contractors and training programs were cut back over several years until a serious accident became inevitable.

(Aside: the system safety world has faced similar issues, and tends to have better documented case histories because loss of life generally triggers public scrutiny: you could learn a lot from reading about system safety).

IT has many of the same properties as a chemical plant; it is complicated, it requires ongoing maintenance and operation by skilled personnel, and a small mistake can cause a disaster. Companies manage IT as a cost centre, and cost centres exist to be minimised. Hence there is always pressure to replace expensive staff with cheaper, lower skilled replacements, or to outsource the whole thing to someone who promises to do it cheaper, maybe in another country. Over time an accident becomes more and more likely, and eventually inevitable.

Lesson 4: Pay attention to process

Process, meaning the steps your people go through to carry out part of the operation, needs to be considered as a component of the overall system, and treated almost as if it were a piece of software. The difference is that any complicated process will, sooner or later, include a mistake. This was probably a significant component in the RBS debacle.

Complicated manual processes can be automated. This often means creating another bit of remora software, but that is actually preferable to a manual process.

Systematise your processes; make sure they are written down and followed. Keep them up to date.

Pay attention to change control and configuration management. Lots of mistakes stem from the wrong version of some file being used. You should know the version of every piece of software you are using, but equally any kind of configuration file should also be under the same kind of control.

Lesson 5: Listen to your engineers

Neither managers nor engineers can see the whole story; if you listen only to the managers then you will sleepwalk into disaster. If you listen only to engineers then you will wind up commissioning another Concorde or Advanced Gas Cooled Reactor (both of which were created by senior engineers who wouldn't listen to the accountants). The trick is to listen to both.

The managers will tell you about how to trim costs. The engineers will tell you why that is a bad idea. They will also tell you about the current problems and hazards.

Do not trust your management reporting chain to tell you this stuff. No manager wants to take a problem to his boss, so the instinctive response to an engineer or lower level manager with a problem is to solve it quickly, or failing that, put a lid on it. An engineer who says "this is fragile" will usually be told "We haven't got time for that now, but we'll fix it when the current urgent project is finished". Of course there is always another urgent project, and so the fix is postponed indefinitely, and no information about the problem percolates upwards.

A related issue is "technical debt". This is incurred whenever a project makes an expedient short-term decision (usually to meet a project deadline) that has long term costs. Examples include engineering kludges (such as adding one of those remora boxes I talked about earlier) or skimping on documentation. The analogy is with financial loan; you get a short term productivity boost (the loan), but a long term productivity cost (the interest) until you go back and fix the original short-cut (paying off the loan). Imagine the financial chaos if every software project in your company was allowed to borrow money off the books. Then imagine the technical chaos caused by uncontrolled accumulation of technical debt across all your different systems.

Conclusion

You can stop your corporate IT blowing up in your face, but it takes attention to the details. If you treat IT as a dumb cost centre (like the staff canteen or building maintenance) then you won't just have a slightly shabby IT service, you will have an unstable foundation for your company that could collapse at any moment.

Sunday, June 24, 2012

A secure bitcoin device

Bitcoins seem to be here to stay. They are being used an increasing amount, in ways both legitimate and illegitimate. But security is a problem for any system where irrevocable and (almost) untraceable transactions can move significant value. The conventional banking system has evolved a system of traceability that lets it wind back fraudulent or simply erroneous transactions, but bitcoins lack such safeguards. Added to this is the fact that any bitcoin wallet system has to be connected to other untrusted computers in order to be useful (and in practice that usually means the Internet). Malware exists that automatically hunts for bitcoin wallets and empties them.

In short, keeping BTC on your home PC is about as secure as keeping physical cash in a pot on the mantelpiece while having your house redecorated.

So what would it take to make a bitcoin wallet secure? The answer is, quite a lot.

Threat Analysis

Step one of security is a threat analysis: what are you protecting, who are the threat, and how well funded are they?

What? In this case lets assume that we want to protect a bitcoin wallet for common transactions, but the user has conventional bank accounts, pension fund and so forth holding the majority of their non-physical wealth. So the wallet typically only has the equivalent of $100-$200 in it, enough for a week's groceries. Very occasionally it may have enough for a bigger purchase, say $20,000 to buy a car. Lets also assume that bitcoins are in widespread use (suppose Amazon accepted them) and hence pretty much anyone with sense will have done the usual things to protect their wallets. (If that means buying our solution, then our solution is going to be protecting a lot of money, more on this later). This is also not going to protect people who want to keep large amounts of cash outside of a bank: they will need to take stronger measures; the bitcoin equivalent of a safe bolted to a wall rather than a cashbox in a drawer.

This leaves out a lot of use-cases: Amazon, in particular, are going to need to keep a float worth many thousands of dollars, if not millions. And behind them are going to be financial institutions with substantial holdings. But at that point custom security becomes feasible. This post is about protecting Joe Sixpack's wallet.

Who? Lets assume that Joe and Jane Sixpack know enough to keep their wallet physically protected, and can trust the people they let into the house, at least to the point of not picking their pockets. That's not always the case of course, but its a good starting assumption. Similarly we are not going to try to prevent them from transferring money to confidence tricksters. So that limits the threat to the digital equivalent of burglary or pick-pocketing; an untrusted outsider gains access to the wallet and steals the coins from it. In this case that would be various forms of digital intruder, either using real-time hacking or malware.

How well funded? Not all crime is rational, but it can still be a useful starting point to assume that the threat is a hypothetical rational criminal willing to invest resources in the expectation of a return on their investment. In other words we can assume that the resources available to roughly match the rewards on offer.

The two strategies available to an attacker are to take whatever cash happens to be in the wallet at the time, or to wait until a substantial sum is transferred in and take that. Given the likely time to wait for Joe and Jane to buy a new car (and even assuming that they pay for it using BTC instead of a debit card), its probably better to take the available cash immediately.

So the most lucrative form of theft would be a "class break" against all wallets of a particular type, followed by a swift emptying of those wallets before countermeasures could be taken. That would be very lucrative indeed. If you could compromise a million wallets with $100 worth of BTC each, you could take $100,000,000. The actual yield would be smaller due to the need to hide, launder and extract value from the cash. But clearly Joe and Jane Sixpack are going to have to be protected against some extremely well-funded adversaries.

A Dedicated Device

Rootkits that compromise virtual machines are already available and doing the rounds. So trying to wall off the wallet from the rest of a PC is not going to work. A secure bitcoin wallet has to be based on a dedicated platform. For the same reason this platform is going to need its own physical user interface: having it take orders to transfer money from an untrusted PC is as bad as having the wallet on the PC. So we need a device with enough computing power to send and receive BTC, plus a screen and a numeric keyboard for entering PINs and confirming transactions. When you want to transfer BTC to someone your computer sends the amount and ID of the destination wallet to your device, and the device then asks for independent confirmation of the transaction on its screen. As long as the device security is not compromised it is impossible to extract BTC from the device without a human being agreeing to it.

This implies a small device with a modest processor, a couple of gigabytes of flash, a keypad, a low-resolution LCD screen and a USB port. This is about the same specification as a cheap mobile phone, suggesting that such a device could be mass produced and sold for a few tens of dollars.

Clearly such a device is going to need a very high degree of internal security, but, given a well defined protocol for transaction requests from outside, this should not be a problem. There will also need to be a secure path for updated software and corresponding upstream security: the digital signature for software updates in particular would be a very tempting target for an attacker.

Backups

Clearly the wallet device may be damaged or suffer corruption. One solution would simply be to accept the risk, in the same way we accept that money is lost if a physical wallet gets destroyed in a fire. But computers fail rather more often than that, so a backup is probably necessary.

The problem is that a backup is also an attack avenue: because of the way bitcoin works, if you can get hold of a copy of someone's wallet then you can empty it using any PC. So any backup has to be just as secure, and yet kept reasonably up to date at the same time.

One option would be to keep the wallet on two independent SD flash cards configured for RAID 1: if one card fails it can be securely destroyed and replaced, and if the device fails then the cards can be moved to a new one. That just leaves data corruption and physical damage as risks. Corruption risk can be minimised by careful design of the software, such as keeping a known-good copy of the wallet as backup during a transaction and running validity checks before committing to the new one. Physical damage is a sufficiently remote possibility to be tolerable in this application.

Other Attacks

The kind of device described so far can be used to verify that money is being sent to a particular wallet, identified by a string of digits. As long as the user knows the destination wallet ID they can be sure that they have sent the right amount to the right person. But this creates a possible attack: malware on the user's PC could systematically replace receiving wallet IDs with those of the attacker. Thus when Joe Sixpack buys a laptop on Ebay he would see payment details specifying a wallet ID and, not realising that this was not the wallet of the vendor, unwittingly send payment to the thief.

In theory this can be avoided by a digital certificate tying the wallet to a particular person, to be verified and displayed by the wallet device as part of the transaction authorisation. But that requires a wider public key infrastructure that has so far proven expensive and fragile; it might work for large vendors, but not for small ones.

In the Meantime

For now most of these measures are not necessary. Bitcoin wallets are sufficiently rare that simple measures, such as keeping your wallet on a dedicated virtual machine, are probably sufficient. But if bitcoins become a widespread and popular form of payment then standardised security solutions will become necessary, and any standardised security will be a target for class breaks that will enable many users to be attacked at once.

If I were to use bitcoins today I would probably put them on a dedicated Raspberry Pi set to make regular encrypted backups to a share on my regular PC, and write down the (very long and random) key somewhere safe. But not everybody is going to be able to set up such a system themselves.