More on the discussion of structured versus unstructured experiment descriptions. Frank has put up a description of the Minimal Information about a Neuroscience Investigation standard, which comes out of the CARMEN project, at Nature Precedings. Neil Saunders has also made some comments on the resistance among the lab monkeys to thinking about structure. Lots of good points here. I wanted to pick out a couple in particular:
From Neil:
My take on the problem is that biologists spend a lot of time generating, analysing and presenting data, but they don’t spend much time thinking about the nature of their data. When people bring me data for analysis I ask questions such as: what kind of data is this? ASCII text? Binary images? Is it delimited? Can we use primary keys? Not surprisingly this is usually met with blank stares, followed by “well… I ran a gel…”.
Part of this is a language issue. Computer scientists and biologists mean quite different things when they refer to ‘data’. For a comp sci person, data implies structure. For a biologist, data is something that requires structure to be made comprehensible. So don’t ask ‘what kind of data is this?’; ask ‘what kind of file are you generating?’. Most people don’t even know what a primary key is, including me, as demonstrated by my misuse of the term when talking about CAS numbers, which led to significant confusion.
I do believe that any experiment [CN – my emphasis] can be described in a structured fashion, if researchers can be convinced to think generically about their work, rather than about the specifics of their own experiments. All experiments share common features such as: (1) a date/time when they were performed; (2) an aim (“generate PCR product”, “run crystal screen for protein X”); (3) the use of protocols and instruments; (4) a result (correct size band on a gel, crystals in well plate A2). The only free-form part is the interpretation.
Here I disagree, but only at the level of detail. The results of any experiment can probably be structured after the event, but not all experiments can be clearly structured either in advance or as they happen. Many can, and here Neil’s point is a good one: by making some slight changes in the way people think about their experiments, much more structure can be captured. I have said before that the process of using our ‘unstructured’ lab book system has made me think about and plan my experiments more carefully. Nonetheless I still frequently go off piste; things happen. What started as an SDS-PAGE gel turns into something else (say a quick column on the FPLC).
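Neil’s four common features could be sketched as a minimal record, with appendable lists to accommodate the kind of drift I describe above. This is purely illustrative Python, assuming nothing about any actual system discussed here; all field names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sketch of Neil's four common features plus the one
# free-form part; not a real standard or any system mentioned above.
@dataclass
class ExperimentRecord:
    performed: datetime                                  # (1) date/time
    aim: str                                             # (2) aim
    protocols: list = field(default_factory=list)        # (3) protocols/instruments
    results: list = field(default_factory=list)          # (4) results
    interpretation: str = ""                             # the only free-form part

record = ExperimentRecord(
    performed=datetime(2008, 1, 15, 14, 30),
    aim="run SDS-PAGE on purified protein",
)

# The experiment drifts: what started as a gel becomes an FPLC run.
# Appendable lists at least capture the drift, even though the
# schema was fixed in advance.
record.protocols.append("quick column on the FPLC")
record.results.append("single peak at expected elution volume")
print(record.aim)
```

The point of the appendable lists is that the record can grow after the fact without the schema having anticipated every turn the experiment takes.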
Without wishing to pick a fight, most people with a computer science background who lean towards the heavily semantic end of the spectrum deal with wet lab scientists after the data has been taken and partially processed. I don’t disagree that it would help the comp sci people if experimenters worked harder at structuring data as they generate it, and I do think this is in general a good thing. The problem is that it doesn’t map well onto how the work is actually carried out. The solution, I think, is a mixture of the free-form approach combined with useful tools and widgets that do two things: firstly, they make capturing the process easier; secondly, they encourage the collection and structuring of data as it comes off the instrument. This is what the templates in our system do, and there is no reason in principle why they couldn’t be driven by agreed data models.
Actually the Frey group (who developed the LaBLog system) already have a highly semantic lab book system, developed during the MyTea project. One of our future aims is to take the best of both forward into a ‘semi-semantic’ or ‘freely semantic’ system. One of the main problems with implementing the MyTea notebook is that it requires data models. It was developed for synthetic chemistry, but in expanding it into the biochemistry/molecular biology area it would make sense to utilise existing data models, with FuGE as the obvious main source.
One more point: we need to teach students that every activity leading to a result is an experiment. From my time as a Ph.D. student in the wet lab, I remember feeling as though my day-to-day activities (PCR reactions, purifications, cloning) weren’t really experiments […] Experiments were clever, one-shot procedures performed by brilliant postdocs to answer big questions […] Break your activities into steps and ways to describe them as structured data should suggest themselves.
This is very true, and harks back to my comment about language. A lot of the issues here arise because we mean very different things by ‘experiment’. We should probably use better words, although I think ‘procedure’ and ‘protocol’ are similarly loaded with conflicting meanings. Control of language is important, and agreement on meaning is, after all, at the root of semantics (or is that semiotics? I’m never sure…).
Excellent discussion.
One of the problems with the construct of “the experiment” is that it is really difficult to know when it is “done”. I try to get my students to bring experiments to completion as quickly as possible, but new information can come many weeks or months later. This can be new characterization data (an x-ray crystal structure, for example) or the discovery of an error (the starting material turned out to be bad, for example).
Also the most interesting result may have nothing to do with the stated objective – so the hypothesis model of science doesn’t always play out neatly.
This is why I favor a results-oriented approach, where each result can be plucked out and archived on its own merits, completely independent of what happens later in the experiment. In my lab that is typically an image of the reaction in progress, but it could also be an NMR spectrum, the mass of product isolated, etc. That way results can be used for any type of analysis, without waiting for other parts of the experiment to be completed.
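Jean-Claude’s results-oriented approach could be sketched, again purely hypothetically, as an archive in which each result is a standalone record with its own identifier, linkable to an ‘experiment’ only later, if at all:

```python
import uuid
from datetime import datetime

# Hypothetical sketch of a results-oriented archive: each result is
# stored on its own merits, independent of whatever "experiment" it
# later turns out to belong to. All names here are illustrative.
archive = {}

def archive_result(kind, description):
    """Store a single result as a standalone record; return its id."""
    result_id = str(uuid.uuid4())
    archive[result_id] = {
        "kind": kind,                  # e.g. "image", "NMR", "mass"
        "description": description,
        "recorded": datetime.now(),
    }
    return result_id

rid = archive_result("image", "reaction in progress, 30 min")
archive_result("mass", "product isolated: 120 mg")

# Each result can be plucked out and analysed on its own, without
# waiting for the rest of the experiment to be completed.
print(archive[rid]["kind"])
```

The design choice is simply that the result, not the experiment, is the unit of storage; grouping into experiments is a later, optional act.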
Jean-Claude, I think you’ve hit the nail on the head. That has really crystallised something for me. The point is that we can’t actually define the boundaries of the experiment until after it is done. We don’t know where it stops so applying a data model in advance is very challenging (although not impossible as long as it is sufficiently extendable).
I think this maps onto what I wrote last night. I imagine a ‘LabStream’ which the experimentalist then structures into a description of the ‘experiment’ after they’ve decided how to frame it.
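The ‘LabStream’ idea could be sketched, hypothetically, as a flat timestamped stream of entries that is only framed into an ‘experiment’ after the fact; the function name and entries below are illustrative, not part of any real system:

```python
from datetime import datetime

# A flat, timestamped stream of lab activity, captured as it happens
# with no experiment boundaries imposed in advance.
stream = [
    (datetime(2008, 1, 15, 9, 0),  "set up PCR"),
    (datetime(2008, 1, 15, 11, 0), "ran gel; band at wrong size"),
    (datetime(2008, 1, 15, 14, 0), "quick column on the FPLC"),
    (datetime(2008, 1, 16, 10, 0), "crystal screen for protein X"),
]

def frame_experiment(stream, start, end):
    """Retrospectively frame a slice of the stream as one experiment."""
    return [entry for entry in stream if start <= entry[0] <= end]

# Only after the fact do we decide that the first day's entries were
# one experiment, and the crystal screen the start of another.
experiment = frame_experiment(
    stream,
    datetime(2008, 1, 15, 0, 0),
    datetime(2008, 1, 15, 23, 59),
)
print(len(experiment))
```

The stream is append-only and structure-free at capture time; the structured ‘experiment’ is a view imposed on it once the experimenter has decided how to frame it.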