I work in IT, computers, geeky stuff. Having been a programmer I now manage a team of them, building new systems and supporting old ones. Having a science PhD is quite a common background in my line of business. Not the best use of my scientific training perhaps, but at least I don’t work for an investment bank.
Computer programs are sets of instructions written by humans, who get things wrong. The instructions are then followed by computers, which are stupid. Throw into the mix the fact that the people who pay for software often don’t understand IT and sometimes don’t really know what they want, and you get an explanation for the high cost of the average IT project. [Or so I’ve heard; obviously my current and any future employers are exceptions to this depressing picture *cough*]. A big part of the cost of software is testing it to remove bugs. Removing them all is nearly impossible (you’ve used software, so you know what I mean). Simply removing the embarrassing ones is hard enough. Many bugs arise because the person writing the code has made an obvious mistake. These are easy to diagnose and fix.
The bugs that the manager in me most fears (and the geek in me quite enjoys) are the hard ones. The ones where the fault is intermittent, strange and where you don’t even know where the fault lies.
In the old days, computers were big humming boxes in the corner of the room. There was only one set of code to go wrong. This old mainframe code is fantastically successful and is still running in many companies. If you perform a financial transaction it is likely that some of the code involved was written in the 1960s or 70s by a quietly spoken man with a beard and flared trousers.
Now life is more interesting. Imagine this web-page has gone wrong in some way. Where is the fault? It could be the blogging software, or the database software the blogging software uses, or the operating system of the server hosting this site. Or it could be a router, DNS server, piece of cable, piece of wire, internet cache or network accelerator anywhere between the server and your computer. Or it could be your browser, or your operating system, or you. Yes you! You might raise a bug saying this post should be more interesting: clearly nonsense, user error!
You take my point. Most of the applications my team supports are web-based. Often if a user raises a bug we don’t know where the problem lies. I work for a big company and our systems are much more complicated than my example.
Inexperienced developers, when confronted by an urgent bug, often dive into the code. They feel safe there, they feel like they are doing something. Wiser heads take a step back and start formulating hypotheses and collecting data to falsify them. In order to fix a complex problem you must first understand it. This requires you to engage with a large and complicated situation, decide what you think is going on and then collect more information to see if you are right or not. A geological training is a good preparation for this. We know that there are a variety of ways of gathering more information: a literature search, mapping, thin-sections, microprobe, isotopic analysis and so on. Deciding between more logging, sniffing the network, finding out how many users the problem affects or getting details of a user’s desktop configuration is a very similar process.
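For the geeks, here is a minimal sketch of that "hypothesise, then try to falsify" loop in Python. Everything in it is made up for illustration: the suspect names, the two observations (`server_log_shows_error`, `fails_in_every_browser`) and the function `surviving_hypotheses` are my assumptions, not anything from a real tool. Each suspect predicts what the evidence should look like, and a suspect is only struck off when data we have actually collected contradicts its prediction.

```python
# Hypothetical suspects, each with the evidence it predicts.
# The names and observations are illustrative assumptions only.
HYPOTHESES = {
    "our application code": {"server_log_shows_error": True},
    "network between server and user": {
        "server_log_shows_error": False,
        "fails_in_every_browser": True,
    },
    "user's browser or desktop": {
        "server_log_shows_error": False,
        "fails_in_every_browser": False,
    },
}


def surviving_hypotheses(hypotheses, observations):
    """Return the names of hypotheses not yet falsified by the data."""
    survivors = []
    for name, predictions in hypotheses.items():
        # Only evidence we have actually collected can falsify a
        # hypothesis; missing observations are simply ignored.
        falsified = any(
            key in observations and observations[key] != expected
            for key, expected in predictions.items()
        )
        if not falsified:
            survivors.append(name)
    return survivors


# No data yet: every suspect is still in the frame.
print(surviving_hypotheses(HYPOTHESES, {}))

# The server log is clean, so our own code is ruled out.
print(surviving_hypotheses(HYPOTHESES, {"server_log_shows_error": False}))
```

Real bugs are never this tidy, of course, but the shape is the same: each new piece of evidence is worth collecting only if it can rule something out.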
With geology, understanding something is the goal. When fixing a bug, this is only the first stage, but it is usually the hardest. If the problem is in your code, a fix may be easy or it may be hard, but at least the solution is in your own hands. If you need a fix quickly, then there is always some dirty hack (sorry, ‘tactical enhancement’) available. If you are unlucky and the problem is in someone else’s code then having an explanation complete with evidence is vital. Getting someone else to take responsibility for ‘your’ problem is basically against human nature. The trick is to have compelling evidence and to use it to make your case clearly and concisely. Again, my geo soft skills come into play.
Just to close the conceptual loop, consider the ‘faster than light’ neutrino situation. The last I read, the ‘impossible’ result has been traced back to a fault in a cable in a GPS device. Surely this is a case of scientists using the scientific method to find bugs in systems? To be fair though, I’ve never yet invoked Einstein’s theory of relativity to solve one of mine.