
Recording the fiddly bits of experimental and data analysis work

9 December 2008

We are in the slow process of gearing up within my group at RAL to adopt the Chemtools LaBLog system and, in the process, to move properly to an Open Notebook status. This has taken much longer than I had hoped, but there have been some interesting lessons along the way. Here I want to think a bit about a problem that has been troubling me for a while.

I haven’t done a very good job of recording what I’ve been doing in the times that I have been in a lab over the past couple of months. Anyone who has been following along will have seen small bursts of apparently unrelated activity where nothing much ever seems to come to a conclusion. This has been divided up mainly into a) a SANS experiment we did in early November which has now moved into a data analysis phase, b) some preliminary, and thus far fairly unconvincing, experiments attempting to use a very new laser tweezers setup at the Central Laser Facility to measure protein-DNA interactions at the single molecule level, and c) other random odds and sods that have come by. None of these have been very well recorded, for a variety of reasons.

Data analysis, particularly when it uses a variety of specialist software tools, is something I find very challenging to record. A common approach is to take some relatively raw data, run it through some software, and repeat, while fiddling with parameters to get a feel for what is going on. Eventually the analysis is run “for real” and the finalised (at least for the moment) structure/number/graph is generated. The temptation is obviously just to formally record the last step, and while this might be ok as a minimum standard if only one person is involved, when more people are working through data sets it makes sense to try and keep track of exactly what has been done and which data has been partially processed in which ways. This helps us both to quickly track where we are with the process and to reduce the risk of replicating effort.
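
As a rough illustration of the sort of tracking I have in mind, here is a minimal sketch (the file names, tool name, and parameters are all made up for the example) of a thin Python wrapper that appends each analysis run, its input data, and its parameters to a shared log:

```python
import csv
import hashlib
import json
from datetime import datetime
from pathlib import Path

LOG_FILE = Path("analysis_runs.csv")  # hypothetical shared run log


def checksum(path):
    """Fingerprint the input file so the log records exactly which data was processed."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()[:12]


def record_run(tool, input_file, output_file, parameters):
    """Append one analysis step (tool, data, parameters, output) to the run log."""
    new_file = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "tool", "input", "input_sha1",
                             "output", "parameters"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"), tool,
                         input_file, checksum(input_file), output_file,
                         json.dumps(parameters)])


# Example: note a single (invented) SANS fitting run and the parameters fiddled with.
Path("raw").mkdir(exist_ok=True)
Path("raw/run042.dat").write_text("fake SANS data\n")  # dummy file so the sketch runs
record_run("sasview-fit", "raw/run042.dat", "processed/run042_fit.txt",
           {"model": "sphere", "radius_guess_A": 30, "background": 0.01})
```

Even a crude log like this answers the two questions that matter when several people share the data: which files have already been through which steps, and with what parameters.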

The laser tweezers experiment involves a lot of optimising of buffer conditions, bead loading levels, instrumental parameters, and whatnot. Essentially a lot of fiddling, rapid shifts from one thing to another, and not always being too sure exactly what is going on. We are still at the stage of getting a feel for things rather than stepping through a well-ordered experiment. Again the recording tends to be haphazard as you try one thing and then another. We’re not even absolutely sure what we should be recording for each “run”, or indeed really what a “run” is yet.

The common theme here is “fiddling” and the difficulty of recording it efficiently, accurately, and usefully. What I would prefer to be doing is somehow capturing the important aspects of what we’re doing as we do it. What is less clear is what the best way to do that is. In the case of data analysis we have a good model for how to do this well. Good use of repositories and of versioned scripts for handling data conversions, in the way that Michael Barton in particular has talked about, provides an example of good practice. Unfortunately it is good practice that is almost totally alien to experimental biochemists and is also not easily compatible with a lot of the software we use.
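
To make that concrete (purely as a sketch of the practice, not of anything Michael actually uses), a data-conversion script kept in a group repository might carry an explicit version and calibration constant, and stamp them into its output, so every processed file records which conversion produced it:

```python
"""convert_raw.py — reduce a raw detector trace to a calibrated CSV.

Hypothetical example: the script lives in the group repository and the version
string below is bumped (and the change committed) whenever the conversion
logic changes, so every output file names the exact conversion that made it.
"""
import csv
import sys
from datetime import date

CONVERTER_VERSION = "0.3"      # bumped with every committed change (invented value)
CALIBRATION_FACTOR = 1.042     # invented instrument calibration constant


def convert(raw_path, out_path):
    with open(raw_path) as raw, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Stamp provenance into the output itself.
        writer.writerow([f"# converted {date.today()} by convert_raw.py "
                         f"v{CONVERTER_VERSION}, calibration={CALIBRATION_FACTOR}"])
        writer.writerow(["channel", "counts_calibrated"])
        channel = 0
        for line in raw:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments in the raw file
            writer.writerow([channel, float(line) * CALIBRATION_FACTOR])
            channel += 1


if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

The point is less the script itself than the habit: the script, the raw file, and the converted file all end up in the repository, so the conversion can be rerun or audited later.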

The ideal would be a workbench using a graphical representation of data analysis tools and data repositories that would automatically generate scripts and deposit these, along with versioned data files, into an appropriate repository. This would enable the “docking” of arbitrary web services, software packages and whatever, as well as connection to shared data stores. The purpose of the workbench would be to record what is done, although it might also provide some automation tools. In many ways this is what I think of when I look at workflow engines like Taverna and platforms for sharing workflows like MyExperiment.
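
Purely to illustrate what such a workbench might deposit alongside each versioned output (none of these field names come from Taverna or MyExperiment; the whole record is invented), the provenance could be as simple as a small JSON document written next to the data file:

```python
import json
from datetime import datetime
from pathlib import Path

# Invented example of the kind of record the imagined workbench could deposit
# alongside each versioned output file; all field names and values are illustrative.
provenance = {
    "workflow": "sans_background_subtraction",
    "workflow_version": "v7",
    "executed_at": datetime.now().isoformat(timespec="seconds"),
    "inputs": [{"file": "raw/run042.dat", "sha1": "a3f1c2d4e5b6"}],
    "steps": [
        {"service": "normalise_to_monitor", "parameters": {"monitor_channel": 2}},
        {"service": "subtract_background", "parameters": {"background_run": "run041"}},
    ],
    "outputs": [{"file": "processed/run042_sub.dat", "deposited_in": "group-repo"}],
}

Path("processed").mkdir(exist_ok=True)
with open("processed/run042_sub.provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```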

It’s harder in the real world. Here the workbench is, well, the workbench, but the idea of recording everything along with contextual metadata is pretty similar. The challenge lies in recording enough different aspects of what is going on to capture the important stuff without generating a huge quantity of data that can never be searched effectively. It is possible to record multiple video streams, audio, and screencasts of any control computers, but it will be almost impossible to find anything in these data streams.

A challenge that emerges over and over again in laboratory recording is that you always seem not to be recording the thing that you now really need to have. Yet if you record everything you still won’t have it, because you won’t be able to find it. Video, image, and audio search will one day make a huge difference to this, but in the meantime I think we’re just going to keep muddling on.


Comments

  • Jonathan said:

    I think there’s the danger of going to the other extreme and recording too much. Take data analysis: many programs such as FISH (SAS analysis software) record a log file of every action taken, including all fiddling. Yet no-one’s going to wade through all of that to find the few useful steps. It requires the person carrying out the analysis to use their discretion as to when it’s worth recording a step. This smaller amount of information could be a much more useful record for other people wishing to carry out similar analysis.

  • Cameron Neylon said:

    Hi Jonathan, that’s kind of what I was trying to say, but approaching it from a slightly different direction. My thinking was that, as you say, there is a need to find a balance between not recording anything and recording so much that you can’t find anything. The real beauty would lie in recording everything in such a cleverly contextually marked-up way that in fact you could find things easily. Now that would be very nice. But I think a lot of it is down to adopting good habits like regularly checking in code.

  • Steve Koch said:

    Hi Cameron,

    I’m using the winter break to get on board with RSS and stuff. Found your blog again, and in particular this post. We’re getting our OTs online here at U. New Mexico (coincidentally for studying protein-DNA interactions) and with our interest in open notebook science, we share these issues you’re talking about. We’ll be doing our data acquisition and analysis in LabVIEW, so, we have a good way of storing “scripts” (so to speak). It would be easy to store all analysis attempts…but like you say, it would require very good habits as far as recording comments. So, I like your post a lot, but no good ideas yet.

  • Cameron Neylon said:

    Steve, great to hear from you and good luck with having a go at this. I’ll be very interested to see how it works out for you. Currently working up a new post on connecting all of this stuff up, but it may not get written properly until I get on my next plane ride…

  • Jim Procter said:

    Hope the next post comes soon! It seems that there’s a usable interface design for a new app lurking somewhere beyond the periphery at the moment.

  • Cameron Neylon said:

    Jim, thanks for the reminder. I’m well and truly behind on writing at the moment. Maybe a little will come tomorrow!
