How best to do the open notebook thing…a nice specific example

Peter Murray-Rust is going to take an Open Notebook Science approach to a project on checking whether NMR spectra match the molecules they are asserted to represent. The question he poses is how best to organise this. The form of an open notebook seems to be a theme at the moment, with discussions between myself and Jean-Claude Bradley (see also the ONS session at SFLO and associated comments) as well as an initiative on OpenWetWare to develop their Wiki notebook platform with more features. There are many ideas around, so Peter’s question is a good specific example to think about.

As I understand Peter’s project, the plan is as follows:

  1. Obtain NMR spectra from a public database and carry out a high-level QM calculation to see whether each spectrum appears consistent with the molecule it is supposed to represent.
  2. Expose the results of this analysis in a useful form.
  3. Identify and prioritise examples where the spectrum appears to be ‘wrong’. The spectrum could be misassigned, the actual molecule could be wrong, or the calculation could be wrong.
  4. Obtain feedback on the ‘wrong’ cases and attempt to correct them through a process of discussion and refinement
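To make steps 1 and 3 concrete, here is a minimal sketch of how calculated and reported shifts might be compared and the worst cases ranked. Everything in it is an assumption made for illustration rather than a description of Peter's actual pipeline: the `Entry` class, the idea that observed and calculated shifts are already paired atom by atom, the use of RMSD as the score, and the 3 ppm threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Entry:
    """One database entry (hypothetical structure, for illustration only)."""
    entry_id: str
    observed: list[float]    # reported shifts in ppm, one per atom
    calculated: list[float]  # QM-calculated shifts in ppm, same atom order


def rmsd(entry: Entry) -> float:
    """Root-mean-square deviation between observed and calculated shifts."""
    if len(entry.observed) != len(entry.calculated):
        raise ValueError(f"{entry.entry_id}: peak counts do not match")
    squared = [(o - c) ** 2 for o, c in zip(entry.observed, entry.calculated)]
    return sqrt(sum(squared) / len(squared))


def prioritise(entries: list[Entry], threshold_ppm: float = 3.0) -> list[tuple[float, str]]:
    """Return (rmsd, entry_id) for entries above the threshold, worst first.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    scored = [(rmsd(e), e.entry_id) for e in entries]
    return sorted((s for s in scored if s[0] > threshold_ppm), reverse=True)


entries = [
    Entry("good", [128.5, 77.0, 21.3], [129.0, 76.5, 21.0]),
    Entry("swapped", [170.0, 20.0], [25.0, 168.0]),  # two peaks assigned the wrong way round
]
flagged = prioritise(entries)
```

A large score only says that something disagrees. As noted in step 3, it cannot by itself distinguish a misassigned spectrum from a wrong structure or a poor calculation, which is why the feedback in step 4 matters.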

So there are several requirements. The raw data needs to be presented in a coherent and organised fashion. Specific examples need to be ‘pushed out’ or ‘alerted’ so that knowledgeable and interested people are made aware of them and can comment. Separately from commenting, further detailed discussion needs to be enabled and recorded. In addition there are the usual requirements for a notebook or a scientific record: the raw data must remain inviolate, and any modifications must be recorded along with the process that generated the data. There will also presumably be a requirement to record thought processes and realisations as the work goes forward.
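One way to picture the ‘inviolate raw data’ requirement is an append-only log in which each record carries a fingerprint of the data, the process that produced it, and a link to the previous record. The sketch below is only an illustration under my own assumptions: the function names, the record fields and the choice of SHA-256 hash chaining are hypothetical and not something any of the notebook platforms mentioned here is known to do.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """Hash a record body deterministically (keys sorted before hashing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def log_entry(log: list[dict], raw: bytes, process: str, note: str = "") -> dict:
    """Append a record describing `raw` and the process that generated it."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data_sha256": hashlib.sha256(raw).hexdigest(),  # fingerprint of the raw data
        "process": process,
        "note": note,
        "previous": log[-1]["entry_hash"] if log else "",  # chain to the prior record
    }
    body["entry_hash"] = _digest(body)
    log.append(body)
    return body


def verify(log: list[dict]) -> bool:
    """Check that no record has been altered or removed from the chain."""
    previous = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["previous"] != previous or entry["entry_hash"] != _digest(body):
            return False
        previous = entry["entry_hash"]
    return True


log: list[dict] = []
log_entry(log, b"peaks v1", process="downloaded from database")
log_entry(log, b"peaks v2", process="re-referenced", note="modification, original kept")
intact = verify(log)
```

The original data is never overwritten here. A modification is a new record pointing back at the old one, so the history of what was done, and how, stays readable.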

My suggestion is as follows:

  • The raw data is generated by a computational and repetitive process so I imagine it is highly structured. I would use a template web page, possibly sitting within a Wiki but not editable, to expose these. This would include details of what was run and how and when. This would be machine generated as part of the analysis. Obviously appropriate tagging will play an important role in allowing people to parse this data.
  • A blog to provide two things: an informal running commentary on what is going on, what the current thought processes are, and what is being run; and ‘alerts’ of specific examples which are interesting (or ‘wrong’). This is largely human generated, although the ‘alerts’ could be automated.
  • A wiki to enable discussion of specific examples and detailed comparisons by outside and inside observers. As Peter suggests in his draft paper, specific groups, both functional and academic, may show up as problems but predicting these in advance is challenging. A wiki provides a free form way of letting people identify and collate these. It may be appropriate to (automatically or manually) post comments from the blog into the wiki (which would also provide reliable time stamps and histories, not available in most standard blog engines).

So my answer to Peter’s question, which might be paraphrased as ‘Which engine is the best to use?’, is all of them. They all provide functionality that is important for the project as I understand it, but none of them provides enough functionality on its own. An interesting question which arises from this combination of approaches is ‘where is the notebook?’, to which I will admit I don’t have an answer. But I’m not sure that it matters.

This doubling up mirrors current practice, both in Jean-Claude’s group, where the UsefulChem wiki is the core notebook but the Blog is used for high level discussion, and in my own: I am moving towards using this Blog for higher level discussion of results and the chemtools blog as more of a data repository. At Southampton we are thinking about the notion of ‘publishing’ from the Blog to a Wiki once a protocol or set of results is sufficiently established, as Step 1 on the way to the paper.

Finally, a throwaway suggestion. Peter, if you want to get a lot of spectra with a lot of associated molecules, without any concerns about publisher copyrights, then consider opening this up as a service for graduate students to check their NMR assignments. I bet you get inundated…

6 Replies to “How best to do the open notebook thing…a nice specific example”

  1. In a “dry lab” the appearance of the notebook will surely be different – the key point of ONS is transparency and “no insider information”. It will certainly be interesting to see how all this evolves.
    Concerning the experimental data, in this case I think Peter is using self-reported lists of C NMR peaks. He will be able to check for plausibility of peak assignments, but in the cases where there are problems it will be hard to know why. Hopefully chemists will start publishing their entire spectra in an expandable format like JCAMP. That way anomalous spectra can be investigated by anyone – to check for the reference peak and how much noise there is. Quaternary carbons can be tricky to distinguish from noise.

  2. I am not sure that the difference is between wet lab and dry lab; I think it is down to differences in how structured the data are. I want to follow up on this, but I am thinking that one of the big differences between synthetic chemistry and molecular biology is that in synthetic chemistry the data (and procedures) are much more structured. Essentially there are inputs (molecules) and outputs (molecules and data), linked by a procedure where you can (and do) use a template for the majority of procedures.

    In Mol Biol the actual procedures are more stereotyped but there are more of them. I think this is what is making our blog harder to read and organise than UsefulChem, and why it is more important for us to have a ‘higher level’ description or visualisation.
