Paul's Pontifications: 2012

Monday, August 6, 2012

The Rise of the Me-Burger?

Back in February news broke that a university researcher had successfully grown bovine muscle tissue in a lab, although a prototype burger made this way would cost an estimated £200,000.

I was immediately reminded of an Arthur C. Clarke short story (spoiler alert) in which synthetic meat is routinely eaten, but its nature is disguised for marketing purposes. The latest product is the wildly popular "Ambrosia Plus", which turns out to be synthesized human meat.

Could this be about to come true? Given that synthetic meat is going to cost more than animal meat for the foreseeable future, its only niche is going to be as a status symbol for rich people. As Clarke pointed out, up until now human meat has been almost unobtainable, and in most societies the steps required to obtain it have made it taboo. But soon it seems likely that anyone with the necessary money will be able to have a sample without any ethical concerns (although the squick remains).

But then, why not go one step further? The original sample from which the synthetic human-burger is grown has to come from someone, and the identity of that someone could become a marketing point. One can envisage a particularly egocentric billionaire offering his guests burgers cloned from himself. Or perhaps famous people will find themselves being offered large sums of money for a biopsy. Would you like to nibble on Naomi Campbell? Or perhaps a bite of Usain Bolt?

Saturday, August 4, 2012

Managing Risks in Company IT

Another month, another story of corporate IT going spectacularly wrong. First we had the RBS debacle, now it seems that Knight Capital Group have lost hundreds of millions to a rogue algorithm. Similar events have happened before. And it's not just banks that suffer from such events.

There are lessons to be learned here. Some of them I draw directly from the events linked above. Others are derived from watching large software-intensive organizations from the inside.

Lesson 1: It happens without warning

Your business is ticking over nicely. You have just approved the annual IT budget, and your CIO has assured you that everything is green. Then at 6am you get a phone call telling you that your annual profit for this year, and maybe your entire company, has just vanished into thin air courtesy of a computer that you own but quite possibly have never heard of. How can this happen?

The answer is that computers are non-linear; small changes can have huge consequences. Change a plus to a minus somewhere in a program with 100,000 lines of code (which is fairly typical) and if you are lucky you will get no output, and if you are unlucky you will get the wrong output.

Mistakes like that happen all the time. As a rough rule of thumb, when programmers type code they make a mistake every 10 lines or so. Everything after that is about finding and removing those mistakes. On top of that you have the mistakes that were baked in at the specification stage (assuming that your software even has a written specification; if you are relying on a programmer having an informal chat with the person who wants the program then you are in even worse shape).

The only way to prevent this stuff happening is to treat the technology as important.

Lesson 2: Do a Risk Assessment

Do an inventory of every single program used by the company in regular business. If anyone has a spreadsheet file that they regularly use, treat that as a separate program. You may have to do some digging to find these things, but that spreadsheet that some bright intern in Operations invented last year to schedule the truck drivers could be the one that paralyses your entire operation next February 29th.
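
To make the February 29th remark concrete, here is a minimal Python sketch of the kind of date arithmetic that works 365 days out of 366. The function names are invented for this illustration and do not come from any real scheduling system; the fallback to 28 February is just one possible policy.

```python
from datetime import date


def same_day_next_year_naive(d: date) -> date:
    # Fine for every date except 29 February, where replace() raises ValueError
    return d.replace(year=d.year + 1)


def same_day_next_year_safe(d: date) -> date:
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # 29 February has no counterpart next year; fall back to 28 February
        return d.replace(year=d.year + 1, day=28)
```

The naive version passes any test that does not happen to include a leap day, which is exactly why this sort of fault can sit unnoticed for years.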

When you have done this you will have a depressingly long list that fits into roughly three categories, listed here in ascending order of risk:

1. Commercial Off The Shelf (COTS) software. For the most part you can treat this as low priority; it tends to be reasonably well tested before it leaves the supplier. Not always, but you can be sure that you have bigger problems elsewhere. HOWEVER take a long look at any configuration or other input files supplied to this software; they tend to be less well controlled and hence more prone to error. Consider putting anything like this into category 3.
2. Ancient dragons. Twenty or thirty years ago your company commissioned a big piece of software which has since become a key part of your operation. It's poorly documented and maintained by a few aging programmers, but at least they do understand it (until you offer them early retirement in a round of cost-cutting).
3. Ad-hoc bits of stuff. These hang off the side of categories 1 and 2 like remora fish around sharks. Typically they convert between obsolete protocols, massage data formats, generate specialist reports, generate input tables and configuration files, and do other odd jobs. All those spreadsheets will fit in here too, as will any particularly complicated configuration files from item 1. This stuff is risky because it was generally written on the cheap: poorly specified, documented and tested. As a result it tends to be fragile, and nobody quite knows how it all works or how to fix it when it goes wrong.

I'm going to call all this stuff "software" for simplicity, even though some of it is not normally considered to be a "computer program". From the risk point of view it's all the same.

Start with the stuff in item 3, and then work upwards. For each piece of software, figure out where the output goes and how important it is. Make a short list of the biggest risks based on the initial analysis, and then produce a Value At Risk figure for each bit of software if it either fails to work or produces the wrong answer (the Knight Capital Group algorithm, for instance, would have had zero impact if it had simply done nothing, but the RBS account update was a problem precisely because it did nothing). Don't forget to include reputation, regulatory and compensation costs in the analysis.
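
As a rough sketch of what the resulting register might look like, here is a small Python example. Everything in it is an assumption made for illustration: the item names, the money figures, and the choice to use the worse of the two failure modes as the headline figure. Each cost estimate is meant to already include reputation, regulatory and compensation costs.

```python
from dataclasses import dataclass


@dataclass
class SoftwareItem:
    name: str
    category: int          # 1 = COTS, 2 = ancient dragon, 3 = ad-hoc
    cost_if_silent: float  # estimated loss if it fails to run at all
    cost_if_wrong: float   # estimated loss if it produces the wrong answer

    @property
    def value_at_risk(self) -> float:
        # Simplification: the headline figure is the worse of the two failure modes
        return max(self.cost_if_silent, self.cost_if_wrong)


def rank_by_risk(items):
    # Highest value at risk first
    return sorted(items, key=lambda item: item.value_at_risk, reverse=True)


# Invented figures, purely for illustration
inventory = [
    SoftwareItem("payroll package", 1, 50_000, 200_000),
    SoftwareItem("account update batch", 2, 5_000_000, 2_000_000),
    SoftwareItem("truck schedule spreadsheet", 3, 300_000, 100_000),
    SoftwareItem("trading algorithm", 3, 0, 400_000_000),
]

for item in rank_by_risk(inventory):
    print(f"{item.name}: {item.value_at_risk:,.0f}")
```

Keeping the two failure modes separate matters: the trading algorithm costs nothing if it stays silent, while the account update is expensive precisely when it does nothing.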

Lesson 3: Risk cannot be outsourced

You've probably got some outsource contracts already. If any of your high-risk software is outsourced, either as software maintenance or operation, then compare the penalty clauses in your contract with your VAR. You will probably find at least one order of magnitude difference, if not two or three.

In general you cannot pay a supplier enough to take on your risk. If you are not careful you will find that your supplier has all the power to control the service while you have all the responsibility for their failures.

Consider running your own acceptance tests on external software development. Yes, the supplier has already run a bunch of tests, but you still need to check that it works in your context before it goes live. If that means you need a whole duplicate IT set-up for testing then so be it. I haven't seen an analysis of the Knight Capital Group failure, but I'll bet dollars to doughnuts that lack of test infrastructure was an important element.

Take a look at Section 2.2.4 in Nancy Leveson's book Engineering A Safer World. On one level, the Bhopal disaster was about engineering and procedural failure; an essential safety step was omitted during a routine procedure. But behind that was a long history of cost reduction and outsourcing; skilled staff were replaced by external contractors and training programs were cut back over several years until a serious accident became inevitable.

(Aside: the system safety world has faced similar issues, and tends to have better documented case histories because loss of life generally triggers public scrutiny: you could learn a lot from reading about system safety).

IT has many of the same properties as a chemical plant; it is complicated, it requires ongoing maintenance and operation by skilled personnel, and a small mistake can cause a disaster. Companies manage IT as a cost centre, and cost centres exist to be minimised. Hence there is always pressure to replace expensive staff with cheaper, lower skilled replacements, or to outsource the whole thing to someone who promises to do it cheaper, maybe in another country. Over time an accident becomes more and more likely, and eventually inevitable.

Lesson 4: Pay attention to process

Process, meaning the steps your people go through to carry out part of the operation, needs to be considered as a component of the overall system, and treated almost as if it were a piece of software. The difference is that any complicated process will, sooner or later, include a mistake. This was probably a significant component in the RBS debacle.

Complicated manual processes can be automated. This often means creating another bit of remora software, but that is actually preferable to a manual process.

Systematise your processes; make sure they are written down and followed. Keep them up to date.

Pay attention to change control and configuration management. Lots of mistakes stem from the wrong version of some file being used. You should know the version of every piece of software you are using, but equally any kind of configuration file should also be under the same kind of control.
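
One cheap way to notice that the wrong version of a file has crept in is to record a checksum for every controlled file and compare against it before each run. The following Python sketch assumes a hypothetical manifest (a mapping from file name to expected SHA-256 digest); the function names are invented here and this is not a substitute for a proper version control system.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    # SHA-256 of the file contents, as a hex string
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of files that are missing or differ from the manifest."""
    drifted = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_digest(path) != expected:
            drifted.append(name)
    return drifted
```

A job that refuses to start when `find_drift` returns a non-empty list turns a silent configuration error into a loud one.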

Lesson 5: Listen to your engineers

Neither managers nor engineers can see the whole story; if you listen only to the managers then you will sleepwalk into disaster. If you listen only to engineers then you will wind up commissioning another Concorde or Advanced Gas Cooled Reactor (both of which were created by senior engineers who wouldn't listen to the accountants). The trick is to listen to both.

The managers will tell you about how to trim costs. The engineers will tell you why that is a bad idea. They will also tell you about the current problems and hazards.

Do not trust your management reporting chain to tell you this stuff. No manager wants to take a problem to his boss, so the instinctive response to an engineer or lower level manager with a problem is to solve it quickly, or failing that, put a lid on it. An engineer who says "this is fragile" will usually be told "We haven't got time for that now, but we'll fix it when the current urgent project is finished". Of course there is always another urgent project, and so the fix is postponed indefinitely, and no information about the problem percolates upwards.

A related issue is "technical debt". This is incurred whenever a project makes an expedient short-term decision (usually to meet a project deadline) that has long-term costs. Examples include engineering kludges (such as adding one of those remora boxes I talked about earlier) or skimping on documentation. The analogy is with a financial loan; you get a short-term productivity boost (the loan), but a long-term productivity cost (the interest) until you go back and fix the original short-cut (paying off the loan). Imagine the financial chaos if every software project in your company was allowed to borrow money off the books. Then imagine the technical chaos caused by uncontrolled accumulation of technical debt across all your different systems.

Conclusion

You can stop your corporate IT blowing up in your face, but it takes attention to the details. If you treat IT as a dumb cost centre (like the staff canteen or building maintenance) then you won't just have a slightly shabby IT service, you will have an unstable foundation for your company that could collapse at any moment.

Sunday, June 24, 2012

A secure bitcoin device

Bitcoins seem to be here to stay. They are being used more and more, in ways both legitimate and illegitimate. But security is a problem for any system where irrevocable and (almost) untraceable transactions can move significant value. The conventional banking system has evolved a system of traceability that lets it wind back fraudulent or simply erroneous transactions, but bitcoins lack such safeguards. Added to this is the fact that any bitcoin wallet system has to be connected to other untrusted computers in order to be useful (and in practice that usually means the Internet). Malware exists that automatically hunts for bitcoin wallets and empties them.

In short, keeping BTC on your home PC is about as secure as keeping physical cash in a pot on the mantelpiece while having your house redecorated.

So what would it take to make a bitcoin wallet secure? The answer is, quite a lot.

Threat Analysis

Step one of security is a threat analysis: what are you protecting, who is the threat, and how well funded are they?

What? In this case let's assume that we want to protect a bitcoin wallet for common transactions, but the user has conventional bank accounts, pension fund and so forth holding the majority of their non-physical wealth. So the wallet typically only has the equivalent of $100-$200 in it, enough for a week's groceries. Very occasionally it may have enough for a bigger purchase, say $20,000 to buy a car. Let's also assume that bitcoins are in widespread use (suppose Amazon accepted them) and hence pretty much anyone with sense will have done the usual things to protect their wallets. (If that means buying our solution, then our solution is going to be protecting a lot of money; more on this later.) This is also not going to protect people who want to keep large amounts of cash outside of a bank: they will need to take stronger measures; the bitcoin equivalent of a safe bolted to a wall rather than a cashbox in a drawer.

This leaves out a lot of use-cases: Amazon, in particular, are going to need to keep a float worth many thousands of dollars, if not millions. And behind them are going to be financial institutions with substantial holdings. But at that point custom security becomes feasible. This post is about protecting Joe Sixpack's wallet.

Who? Let's assume that Joe and Jane Sixpack know enough to keep their wallet physically protected, and can trust the people they let into the house, at least to the point of not picking their pockets. That's not always the case of course, but it's a good starting assumption. Similarly we are not going to try to prevent them from transferring money to confidence tricksters. So that limits the threat to the digital equivalent of burglary or pick-pocketing; an untrusted outsider gains access to the wallet and steals the coins from it. In this case that would be various forms of digital intruder, either using real-time hacking or malware.

How well funded? Not all crime is rational, but it can still be a useful starting point to assume that the threat is a hypothetical rational criminal willing to invest resources in the expectation of a return on their investment. In other words we can assume that the resources available to the attacker roughly match the rewards on offer.

The two strategies available to an attacker are to take whatever cash happens to be in the wallet at the time, or to wait until a substantial sum is transferred in and take that. Given the likely time to wait for Joe and Jane to buy a new car (and even assuming that they pay for it using BTC instead of a debit card), it's probably better to take the available cash immediately.

So the most lucrative form of theft would be a "class break" against all wallets of a particular type, followed by a swift emptying of those wallets before countermeasures could be taken. That would be very lucrative indeed. If you could compromise a million wallets with $100 worth of BTC each, you could take $100,000,000. The actual yield would be smaller due to the need to hide, launder and extract value from the cash. But clearly Joe and Jane Sixpack are going to have to be protected against some extremely well-funded adversaries.
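
The arithmetic behind that figure, and the effect of laundering losses on it, can be written down in a few lines of Python. The 10% recovery rate below is an invented number used only to show the shape of the calculation; the post does not give one.

```python
def class_break_yield(wallets: int, average_balance: float, recovery_rate: float) -> float:
    """Gross takings scaled by the fraction that survives hiding and laundering."""
    return wallets * average_balance * recovery_rate


# A million wallets holding $100 each, with nothing lost along the way
gross = class_break_yield(1_000_000, 100, 1.0)

# The same break if only a tenth of the value can be extracted (assumed rate)
net = class_break_yield(1_000_000, 100, 0.1)

print(f"gross: ${gross:,.0f}  net: ${net:,.0f}")
```

Under the rational-criminal assumption, the net figure is a rough ceiling on what an attacker would be willing to spend on developing the break.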

A Dedicated Device

Rootkits that compromise virtual machines are already available and doing the rounds. So trying to wall off the wallet from the rest of a PC is not going to work. A secure bitcoin wallet has to be based on a dedicated platform. For the same reason this platform is going to need its own physical user interface: having it take orders to transfer money from an untrusted PC is as bad as having the wallet on the PC. So we need a device with enough computing power to send and receive BTC, plus a screen and a numeric keyboard for entering PINs and confirming transactions. When you want to transfer BTC to someone your computer sends the amount and ID of the destination wallet to your device, and the device then asks for independent confirmation of the transaction on its screen. As long as the device security is not compromised it is impossible to extract BTC from the device without a human being agreeing to it.
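
The request-and-confirm flow described above might be sketched as follows. This is a toy model, not a real bitcoin implementation: the class and field names are invented, the `confirm` callback stands in for the device's own screen and keypad, and amounts are held as whole satoshi (1 BTC is 100,000,000 satoshi) to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransferRequest:
    destination_id: str  # wallet ID supplied by the untrusted PC
    amount_satoshi: int


class WalletDevice:
    def __init__(self, balance_satoshi: int, confirm: Callable[[str], bool]):
        self.balance_satoshi = balance_satoshi
        # Stand-in for the physical screen and keypad on the device
        self._confirm = confirm

    def handle_request(self, request: TransferRequest) -> bool:
        # Reject nonsensical or unaffordable requests without bothering the user
        if request.amount_satoshi <= 0 or request.amount_satoshi > self.balance_satoshi:
            return False
        prompt = f"Send {request.amount_satoshi} satoshi to {request.destination_id}?"
        # Nothing leaves the wallet unless the human agrees on the device itself
        if not self._confirm(prompt):
            return False
        self.balance_satoshi -= request.amount_satoshi
        return True
```

The important design point is that the PC can only ask; the decision is taken through an input path the PC cannot reach.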

This implies a small device with a modest processor, a couple of gigabytes of flash, a keypad, a low-resolution LCD screen and a USB port. This is about the same specification as a cheap mobile phone, suggesting that such a device could be mass produced and sold for a few tens of dollars.

Clearly such a device is going to need a very high degree of internal security, but, given a well defined protocol for transaction requests from outside, this should not be a problem. There will also need to be a secure path for updated software and corresponding upstream security: the digital signature for software updates in particular would be a very tempting target for an attacker.

Backups

Clearly the wallet device may be damaged or suffer corruption. One solution would simply be to accept the risk, in the same way we accept that money is lost if a physical wallet gets destroyed in a fire. But computers fail rather more often than that, so a backup is probably necessary.

The problem is that a backup is also an attack avenue: because of the way bitcoin works, if you can get hold of a copy of someone's wallet then you can empty it using any PC. So any backup has to be just as secure, and yet kept reasonably up to date at the same time.

One option would be to keep the wallet on two independent SD flash cards configured for RAID 1: if one card fails it can be securely destroyed and replaced, and if the device fails then the cards can be moved to a new one. That just leaves data corruption and physical damage as risks. Corruption risk can be minimised by careful design of the software, such as keeping a known-good copy of the wallet as backup during a transaction and running validity checks before committing to the new one. Physical damage is a sufficiently remote possibility to be tolerable in this application.
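
The "known-good copy plus validity check" idea can be sketched in Python as below. The JSON file and the `is_valid` check are placeholders assumed for this illustration; a real wallet file would be encrypted and would need far stricter checks.

```python
import json
import os
import shutil
from pathlib import Path


def is_valid(wallet: dict) -> bool:
    # Placeholder check: the balance must be a non-negative integer
    return isinstance(wallet.get("balance"), int) and wallet["balance"] >= 0


def commit_wallet(path: Path, new_wallet: dict) -> bool:
    if not is_valid(new_wallet):
        return False
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(new_wallet))
    # Re-read what was actually written before trusting it
    if json.loads(tmp.read_text()) != new_wallet:
        tmp.unlink()
        return False
    if path.exists():
        # Keep the previous wallet as the known-good copy
        shutil.copy2(path, path.with_suffix(".bak"))
    # Swap the new file into place in one step (atomic on POSIX filesystems)
    os.replace(tmp, path)
    return True
```

At every point during the update there is at least one complete, validated wallet file on the card, so a power failure mid-transaction should cost at most the transaction in progress.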

Other Attacks

The kind of device described so far can be used to verify that money is being sent to a particular wallet, identified by a string of digits. As long as the user knows the destination wallet ID they can be sure that they have sent the right amount to the right person. But this creates a possible attack: malware on the user's PC could systematically replace receiving wallet IDs with those of the attacker. Thus when Joe Sixpack buys a laptop on Ebay he would see payment details specifying a wallet ID and, not realising that this was not the wallet of the vendor, unwittingly send payment to the thief.

In theory this can be avoided by a digital certificate tying the wallet to a particular person, to be verified and displayed by the wallet device as part of the transaction authorisation. But that requires a wider public key infrastructure that has so far proven expensive and fragile; it might work for large vendors, but not for small ones.

In the Meantime

For now most of these measures are not necessary. Bitcoin wallets are sufficiently rare that simple measures, such as keeping your wallet on a dedicated virtual machine, are probably sufficient. But if bitcoins become a widespread and popular form of payment then standardised security solutions will become necessary, and any standardised security will be a target for class breaks that will enable many users to be attacked at once.

If I were to use bitcoins today I would probably put them on a dedicated Raspberry Pi set to make regular encrypted backups to a share on my regular PC, and write down the (very long and random) key somewhere safe. But not everybody is going to be able to set up such a system themselves.

Tuesday, May 15, 2012

Online lectures, early movies, and the Open University

There is a lot of excitement floating around the blogosphere right now about various experiments in remote learning. Some of the big name US universities are putting substantial amounts of material online: lectures, course material, basically everything except the diploma. This post describes how an 11 year old boy took a Stanford University course in game theory (written by his proud Dad, but still...)

One thing the boy said to his father was "I think the concepts are interesting but the presentation is dull. Couldn’t they have done animations and things to make it better?". The presentation in question was Powerpoint slides with an occasional lecturer's head.

This reminds me of the early days of movie making. When the movie camera was first invented it seemed obvious how to film a drama: put some actors on a stage to perform a play, put a camera about where the best seat would be, and film the action. Of course this combined the worst features of film (monochrome, no sound, low resolution) with the worst features of theatre (action a long way away, small stage, static scenery). It didn't take movie makers long to realise that taking the actors out of the theatre and putting the camera in amongst them gave better results. The same goes for video lectures.

The sad thing is, none of this is new. When I was young there was (and still is) a UK institution called the Open University. It was created back in 1969 by the Labour government as part of its anti-elitist education-for-all vision. From the beginning it was designed as a mass-market concept: very few students would be at its physical campus; almost all its work would be done by distance learning, with lectures delivered by television, audio cassette (link for those too young to know what they were) and any other form of high technology communications medium. So for many years you could get up at 6:00 am and watch a couple of hours of half-hour lectures on anything from Mathematical Modelling to Sociology 101. In my teenage years I finally got a TV in my bedroom and set up a timeswitch to turn it on at 6:00 am. I still remember the theme tune to Sociology 101: "We socialise and we vandalise, We lock the sick awaaayy, Politicians promises, keep changing every daaayy...".

The presentation was similar to a normal documentary, albeit with a drastically reduced budget; a combination of talking heads, pictures of the thing being described (Sociology 101 showed lots of deprived multi-storey housing estates of the kind the Americans call "projects"), and various visual aids. Here are some examples from 1989 (Youtube).

One I remember in particular; it was explaining trigonometry. First it showed a circle with a radius line revolving around it, and the right angle triangle that resulted. Then it turned the circle end on, so all you could see was a vertical line with the blob at the end of the radius moving up and down. Then it moved the line to the right, and the blob traced out a sine wave. And the lightbulb went on in my head. I knew about sine and cosine for right-angled triangles, and I'd seen that shape called a "sine wave", but until that point I hadn't understood the connection. Now I understood perfectly.
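
For anyone who wants to replay that animation in numbers, here is a small Python sketch (the function names are mine, not the Open University's). The height of the blob at the end of the radius is the opposite side of the right-angled triangle, which is the radius times the sine of the angle; sweeping the angle through one revolution and sliding sideways as you go gives the sine wave.

```python
import math


def tip_height(radius: float, angle_radians: float) -> float:
    # Opposite side of the right-angled triangle formed by the radius
    return radius * math.sin(angle_radians)


def trace(radius: float, steps: int):
    # One full revolution: each sample pairs the angle (the sideways sweep)
    # with the height of the blob at that angle
    return [
        (2 * math.pi * k / steps, tip_height(radius, 2 * math.pi * k / steps))
        for k in range(steps + 1)
    ]


for angle, height in trace(1.0, 8):
    print(f"{math.degrees(angle):5.0f} degrees  height {height:+.3f}")
```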

MIT, Stanford and the rest need to get back to the future (although they can leave out the flares and corduroys).
