The day my world stood still

Error lies at the heart of science; but there are a number of different kinds of “wrong”. An error in hindsight, where a past hypothesis of yours – perfectly reasonable at the time – is disproven by more accurate or complete measurements, or the discovery of unappreciated complexities in the system you’re studying, is just part of the scientific process, even if many scientists have trouble acknowledging such errors. An interpretive error – making questionable inferences from your experimental data, or wrongly using extant knowledge from other sources – is more serious, but is something that peer reviewers are usually only too happy to point out to you (there is, of course, a fine and fairly hazy line between a truly erroneous conclusion and a disagreement over how particular data should be interpreted). By far the worst breed of wrong, however, is the data error, where some sort of experimental or analytical error invalidates your results, and undermines any conclusions you draw from them. Discovering such an error means that corrections have to be issued, or in the worst cases, entire papers have to be withdrawn. It’s Not A Good Thing.
So you can imagine my horror last Friday, when I was confronted with the serious possibility that a correction error had invalidated all of the palaeomagnetic data I collected for my PhD, and hence the work which represented the bulk of my current publication record.

It all started with a puzzling discrepancy. As I’ve discussed before, a number of corrections are required to put directional palaeomagnetic data into the appropriate spatial reference frame. Last week, as I was starting to play around with the preliminary data from my first South African samples, I noticed that the software installed on the lab computer here and the software I’d used during my PhD produced significantly different corrected directions from the same drill core orientations. Both programs work their geometric magic by rotating the sample x, y and z axes to coincide with their original orientations, derived from your measurements in the field.


In both cases I was inputting the z-axis direction – the trend and plunge of the original drill holes – but getting completely different answers. Rather worryingly, this indicated that one of the two programs thought that I was inputting the direction of one of the other sample axes instead. At first I assumed that I was somehow mishandling the new and unfamiliar software, but at the end of last week – after lots of stereonet sketching – I realised that whilst the Jo’burg lab software was correctly rotating the z-axis to align with the measured core orientation, the software that I had used during my PhD was rotating the x-axis to align with it instead. This resulted in a direction which was a large angular distance from where it should have been – and I had used exactly the same correction routine for all of the samples I analysed during my PhD. If I was right, this was no simple systematic error, either; not only were all my samples in the wrong reference frame, but each sample was in its own wrong reference frame. This would invalidate all my mean directions, all my statistical analyses, everything.
It was at about this point that an entire PhD’s worth of stress hormones decided to dump themselves into my bloodstream. It looked like all the cool and exciting conclusions from my PhD research were built on an exceedingly unsafe foundation. Also, if I was honest with myself, such a glaring mistake galloped far beyond the unfortunate, deep into the realm of the toweringly incompetent. Writing and publishing the necessary corrections and retractions would be equivalent to standing up in a room packed with everyone who might ever hire me and yelling “Hey! I’m a dumbass!”
It was a good job, then, that I turned out to be mistaken – although it took me until Monday to realise it. My salvation lay in the fact that samples were not placed in my old magnetometer in the ‘right’ orientation: what the machine measured as the x-axis was in fact the z-axis of the sample. This substitution meant that when the correction software rotated what it thought was the x-axis, it was actually rotating the z-axis of the sample, which meant that I was supplying the right direction and everything had ended up in the appropriate reference frame. ‘Phew’ is an understatement.
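For readers who prefer to see the geometry spelled out, here is a minimal Python sketch of why the axis relabelling cancels out. Everything in it is an illustrative assumption rather than the behaviour of either program: the north-east-down coordinate convention, the way the two remaining core axes are chosen, the cyclic relabelling (machine x = sample z, machine y = sample x, machine z = sample y), and all of the function names are hypothetical.

```python
import math

def direction(trend_deg, plunge_deg):
    """Unit vector (north, east, down) for a trend and plunge in degrees."""
    t, p = math.radians(trend_deg), math.radians(plunge_deg)
    return (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), math.sin(p))

def core_frame(trend_deg, plunge_deg):
    """Right-handed frame whose third axis lies along the drill hole.

    The choice of the first two axes is an assumed convention: the first lies
    in the vertical plane containing the hole, the second is horizontal.
    """
    t, p = math.radians(trend_deg), math.radians(plunge_deg)
    first = (math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), -math.cos(p))
    second = (-math.sin(t), math.cos(t), 0.0)
    third = direction(trend_deg, plunge_deg)
    return first, second, third

def combine(components, axes):
    """Sum of components[i] * axes[i], expressed in geographic coordinates."""
    return tuple(sum(c * a[i] for c, a in zip(components, axes)) for i in range(3))

def rotate_z_onto_core(measured, trend_deg, plunge_deg):
    """Hypothetical 'program A': treats the supplied direction as the z-axis."""
    x, y, z = core_frame(trend_deg, plunge_deg)
    return combine(measured, (x, y, z))

def rotate_x_onto_core(measured, trend_deg, plunge_deg):
    """Hypothetical 'program B': treats the supplied direction as the x-axis."""
    y, z, x = core_frame(trend_deg, plunge_deg)  # same frame, cyclically relabelled
    return combine(measured, (x, y, z))

def relabel_for_old_magnetometer(sample_xyz):
    """Assumed relabelling: the machine reports the sample z-axis as its x-axis."""
    sx, sy, sz = sample_xyz
    return (sz, sx, sy)

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

if __name__ == "__main__":
    sample = (0.6, 0.0, 0.8)    # made-up magnetisation in sample coordinates
    trend, plunge = 40.0, 25.0  # made-up drill hole orientation
    right = rotate_z_onto_core(sample, trend, plunge)
    feared = rotate_x_onto_core(sample, trend, plunge)
    actual = rotate_x_onto_core(relabel_for_old_magnetometer(sample), trend, plunge)
    print("feared error (degrees):", angle_deg(right, feared))
    print("actual error (degrees):", angle_deg(right, actual))
```

Under these assumptions, feeding program B the sample-frame components directly puts the result a large angle away from program A's answer, and because the rotation differs for every core, the misplaced direction differs from sample to sample. Feeding it the relabelled components gives the same geographic direction as program A.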
Stressful as it was for me, this whole little escapade brings up some interesting issues regarding the nature of the error-correcting machinery of science. It is often said that individual mistakes don’t particularly matter in the long run, because repeated experiments and testing of conclusions will progressively weed out bad data, mistaken hypotheses, and flawed models. But it has occurred to me that the sort of error that I briefly thought had scuppered my data would be extremely difficult to identify from what is in the public record. The data as presented in a paper are always cleaned up, processed, and corrected; ideally, then, the steps which culminated in the nice clean plots you reveal to the world are also described, so that people can confirm that you’ve treated the data properly. But this is generally a confirmation in principle, rather than taking your raw data and undertaking a time-consuming replication of each processing step. In other words, whereas people might notice if you don’t perform standard procedure x, provided that your numbers don’t look too spectacularly odd they are far less likely to notice if you perform procedure x wrongly.
It’s somewhat sobering to think how much everyone takes on trust that your experimental procedures have been performed correctly and as described. Of course, because it is still exceedingly rare for anyone outside of the author list to have easy access to all the raw data or code from a particular study, people have to trust that this is the case. By choosing to place these things away from the scrutiny of the wider scientific community (the pros and cons of which are a whole other argument), the authors have an increased responsibility to audit their own data and procedures, and where mistakes are found, publicise corrections as widely as possible – however personally embarrassing that may be. I’m just glad that in this case, I don’t have to.

Categories: academic life

Comments (12)

  1. Bob O'H says:

    There was a case recently where someone in biology had a bug in their programme, so their published results were screwed.
    There was also a paper in Phytopathology about 10 years ago where the authors pointed out that the odd and interesting behaviour of a model in a previous paper was entirely due to rounding errors.
    Ah well, on the first line of my first ever paper, I got the name of my organism wrong.

  2. Kim says:

    It sounds as though the software and the magnetometer were set up to work together – as if someone corrected for a problem by redefining Z to be recorded as X?
    I would say that, in field-based geology at least, there are other ways of finding errors than re-crunching all the data. If the conclusions don’t make sense given other things that are known about the area, and if enough people care about the area, then it’s possible that someone will collect new paleomag data (or will re-map an area, or will date the rocks again, or analyze the geochemistry again, or…). And the paper with the data that didn’t make sense might be forgotten, even if it is never corrected.
    (I’m glad to hear that the data and your conclusions are ok, though!)

  3. Natalia S. says:

    Knowing how I myself sometimes handle experimental work, it makes me scared for all of science. So much of it is taken on faith. Maybe too much?
    Natalia S.

  4. Monado says:

    It’s a corollary of Murphy’s law that, when the new document comes back from the printer, you will see a typographical error in the first place you look.

  5. andy.s says:

    “The Immortal Cell” by Michael West has an example of this; West found an error that invalidated some of his own experiments when he was doing his thesis.
    Since his work was based on the work of his thesis advisor, he looked up the raw data for his advisor’s, too (without permission and when his advisor was out of town). Same error. It invalidated his advisor’s work and that of the entire research program based on it. The upshot of it was…
    …well, just read the book.
    Glad to hear yours was a near miss.

  6. Hank Roberts says:

    We’ve got Google Space, Google Moon, Google Mars and Google Superficial Earth …. isn’t it time you paleo scientists invited Google to take copies of your images and data sets so you all can look up anything anyone has done?
    Gavin at RC just mentioned this lack, on coming back from a China paleo meeting.

  7. travc says:

    I spent some time in a lab (as an undergrad) which did paleomagnetic studies (I was using the SQUID to look at biologically produced magnetic crystals)… It did bother me a great deal even as a naive undergrad how much of the measurements were “plug and pray” with respect to the equipment and the software. Biology is if anything worse, way too many black boxes where the researcher doesn’t really know what is going on.
    A quick example: I rewrote a mildly popular spatial statistics program because it was way way too slow (just intending to parallelize it). I had the source code and the original papers, and went to work… what should have taken a day took a week because the code (written by the authors of the paper no less) was horribly convoluted and ended up not actually implementing the math / algorithms from the paper. Cleaning up the code made the actual mistakes glaringly obvious, and actually sped up the program by better than a factor of 10 on reasonably large data sets without even parallelizing it. Anyway, fortunately the error introduced a relatively minor second-order effect for most data sets, but everything published using it is still suspect. Worse, I communicated with the original author (much more of a ‘big name’ than little ol me) and got no response.
    I guess my overall point (other than taking the opportunity to tell stories) is that software is not to be ‘trusted’. Hell, even ‘standard methods’ shouldn’t be employed without seriously thinking about what they actually do and why. Personally, I tend to just write my own analysis software and double check with off-the-shelf stuff. Occasionally for truly hard stuff, it isn’t worth the effort to reinvent the wheel, but double checking with other methods / software is critical… of course, maybe this is why I don’t publish very often ;)

  8. Kim says:

    isn’t it time you paleo scientists invited Google to take copies of your images and data sets so you all can look up anything anyone has done?

    That would be really fantastic.
    I know there has been some work on compiling various sorts of geologic data into databases (maybe spatial databases?). I sent a bunch of my old geochronology to the state of Alaska this summer – apparently they’re trying to compile everything that’s been done in the state. It’s a start, at least.
    (I hope that nobody takes my data as straightforward ages, though, without reading all the caveats from the paper on it. It would be really easy to overinterpret/misinterpret my old data.)

  9. Chris says:

    It’s always vaguely frightening to me just how crufty expensive scientific equipment tends to be. My brief attempt to learn HPLC was mostly spent fighting with a truly awful software interface and retightening fiddly little pressure joins that didn’t really look like they were meant to be re-used.
    It’s a shame you can’t get funding to write software, because it would make an enormous amount of sense for someone to develop an open source “analytical interface” that we could force manufacturers of fancy trinkets to use, instead of their proprietary in-house rubbish. Same with statistics packages (I think you could argue that we should all be using open source stats programs if we’re serious about peer reviewed methods).

  10. BrianR says:

    Chris…I read this post knowing that it ended up good (or else you might not have blogged about it so nonchalantly), but still felt a deep nervousness as you described your situation…that horrible, sinking feeling of being not just wrong, but, as you said, incompetent.
    I’m glad it worked out…what a week it must’ve been!
    Hank, you say “isn’t it time you paleo scientists invited Google to take copies of your images and data sets so you all can look up anything anyone has done?”
    In principle, yes, but which images and datasets do you speak of? For example, if I made a paleogeographic map showing the position and configuration of an ancient river system, it would be extremely difficult to separate this map from all the underlying evidence and interpretations that led to drawing. Not only would all the observations and data have to be embedded in such an image, but there would have to be an explanation of how the conclusions were reached. Essentially, you’d have to attach the paper to the image. If it were done like that, I would be very excited to be able to access such a database…but only if ALL the foundation were attached.

  11. Bob O'H says:

    …it would make an enormous amount of sense for someone to develop an open source “analytical interface”… Same with statistics packages…

    We’re trying to get funding to continue the BUGS project. It’s the main software for Bayesian analysis, but academic funding isn’t set up for things like this. I agree though, common interfaces and packages would improve matters immensely (the R crowd are trying that for bioinformatics, with the Bioconductor project). In practice, such a package would have to be open source to allow people to check the results.
    And an email came round this week saying that Excel 2007 can’t even multiply now!

  12. Mr. Gunn says:

    This doesn’t really hit at the heart of peer-review validation, at least not in biomedical science, because of the replication of work. If you say you isolated cells using surface markers X, Y, and Z, someone else had damn well be able to do the same. If you reported a different isolation strategy than what you actually used, they won’t be able to reproduce your results, and that will keep your mistake from propagating. Of course, much time can go by between your publication and the external validation.