PMR’s Open Notebook Project continued

This is a reply continuing the conversation with Peter Murray-Rust on his plans for an Open Notebook Science based project. I have cut a lot of the context to keep the post to a manageable size, so if you want to track back, see the original two posts from Peter, my response, and Peter’s response to that in full.

I should add that I am not a coder in any form so where this gets technical I am proposing things in principle (or hand waving as some might put it :).

PMR: We’ll create plots for ALL molecules and spectra. However it may not always be possible to identify what is “wrong”. Thus a bad TMS value (e.g. if the solvent is wrong) will shift all the values. So we may give a revised line (y = x –> y = x + c).

…and…

PMR: Yes. We’ll probably do this by RMS deviation and we could colour the table of contents or something similar. It may not be easy to make generic corrections over several thousand files. (Hang on – the files are in CML so it’s trivial).
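To make the two ideas above concrete (a constant shift of the line, and ranking by RMS deviation), here is a minimal sketch. It assumes x is the calculated shift and y the observed one, which the quotes do not actually specify, and the numbers are invented purely for illustration. The function names are hypothetical, not anything from Peter’s code.

```python
import math

def rms(calc, obs, offset=0.0):
    """RMS deviation between observed shifts and (calculated + offset)."""
    total = sum((o - (c + offset)) ** 2 for c, o in zip(calc, obs))
    return math.sqrt(total / len(calc))

def best_offset(calc, obs):
    """Constant c in y = x + c that minimises the RMS: the mean residual."""
    return sum(o - c for c, o in zip(calc, obs)) / len(calc)

# Hypothetical shifts in ppm, made up for this example only.
calculated = [20.0, 35.0, 128.0, 170.0]
observed = [22.1, 36.9, 130.0, 172.0]

c = best_offset(calculated, observed)
rms_before = rms(calculated, observed)      # against the line y = x
rms_after = rms(calculated, observed, c)    # against the line y = x + c
```

A large drop from `rms_before` to `rms_after` would suggest a systematic problem such as a bad reference value, rather than a wrong assignment of individual peaks.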

…and…

PMR: Yes. It may not be trivial to correct them – we shan’t have a chemical editor in the Wiki, so it may be an idea to have a molecule upload. However the details often bite hard.

The most practical approach may simply be to let people flag things or suggest other solutions, either through comments on a blog or as additions on the wiki (which could all be aggregated into an RSS feed), and then to re-run the spectra with the new molecule/assignment/solvent by hand (or rather, put them in the queue by hand). I imagine having a ‘Wrong Spectra Blog’ which has both a conventional comment button and a ‘propose correction’ button. The latter still posts a comment but flags it for easy aggregation and possibly prompts people to drop in an InChI/CML/SMILES code. Aggregation from multiple RSS feeds and filtering is doable, e.g. via Yahoo Pipes: grab the RSS feed(s) from the comments and then filter for a specific tag code. The comment is then in effect an RDF triple.
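As a rough illustration of the “grab the feed and filter for a tag” step, here is a small sketch using only the Python standard library. The feed content, the `[propose-correction]` tag and the function name are all assumptions made up for the example; a real comment feed would be fetched from the blog rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# A made-up comment feed; a real one would be downloaded from the blog.
FEED = """<rss version="2.0"><channel>
<title>Wrong Spectra Blog comments</title>
<item><title>Comment on spectrum 42</title>
<description>Looks fine to me.</description></item>
<item><title>Comment on spectrum 42</title>
<description>[propose-correction] solvent should be CDCl3</description></item>
</channel></rss>"""

def flagged_comments(rss_text, tag="[propose-correction]"):
    """Return (title, comment) pairs for items whose text carries the tag."""
    root = ET.fromstring(rss_text)
    hits = []
    for item in root.iter("item"):
        body = item.findtext("description", default="")
        if tag in body:
            title = item.findtext("title", default="")
            hits.append((title, body.replace(tag, "").strip()))
    return hits

corrections = flagged_comments(FEED)
```

The same function could be run over several feeds and the results concatenated, which is roughly what the aggregation step would amount to.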

Although if there is an entry page then presumably someone could run an existing spectrum against a new assignment/molecule. It is mainly a question of providing a link to make this easy for people; this could be in the template. Then how do you associate an attempted correction with the original ‘wrong’ spectrum? Well, that is where the details bite, I suspect.

There is also the issue of implementation of all of this:

PMR: Yes. I am not yet sure how to insert machine-generated pages into a Wiki and we’ll value help here. The pages will certainly NOT be editable. Any refinement of the protocol or correction will generate a NEW job, not overwrite the last one.
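One way to picture “a NEW job, not overwrite the last one” is an append-only log in which each corrected run points back at the job it is meant to replace. This is only a sketch of the principle; the class names, fields and identifiers are hypothetical and say nothing about how Peter’s system actually stores jobs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)          # frozen: a recorded job can never be edited
class Job:
    job_id: int
    spectrum_id: str
    molecule: str                # e.g. a SMILES or InChI string
    solvent: str
    supersedes: Optional[int] = None   # id of the job this one corrects

class JobLog:
    """Append-only list of jobs; corrections add entries, never replace them."""

    def __init__(self):
        self._jobs = []

    def submit(self, spectrum_id, molecule, solvent, supersedes=None):
        job = Job(len(self._jobs) + 1, spectrum_id, molecule, solvent, supersedes)
        self._jobs.append(job)
        return job

    def history(self, spectrum_id):
        """All jobs ever run for one spectrum, oldest first."""
        return [j for j in self._jobs if j.spectrum_id == spectrum_id]
```

The `supersedes` field is also one possible answer to the question of how an attempted correction gets associated with the original ‘wrong’ spectrum: the link is recorded at the moment the new job is queued.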

…and…

PMR: I think we are clearly going to have a new blog. What I’m not clear is how we post comments from the blog to the Wiki and alert the Wiki from the blog.

This is not a world away from the blogging instruments developed in Jeremy Frey’s group at Southampton Uni. In that case the instruments themselves post a blog entry each time a sample is analysed. Here we are talking about a computational analysis but the principle is the same. Both WordPress and MediaWiki have (I believe – this is where I get out of my depth) quite sophisticated APIs that could enable automated posting.
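For what it is worth, WordPress accepts posts over XML-RPC using the MetaWeblog `metaWeblog.newPost` call, so an automated job could in principle announce itself much as the blogging instruments do. The sketch below only builds the request body and does not contact any server; the user name, password, titles and helper function name are placeholders invented for the example.

```python
import xmlrpc.client

def build_new_post_call(username, password, title, body, blog_id=1):
    """Build the XML-RPC request body for a metaWeblog.newPost call."""
    post = {"title": title, "description": body}
    # The final True asks the blog to publish immediately rather than save a draft.
    params = (blog_id, username, password, post, True)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")

# Placeholder credentials and text, for illustration only.
payload = build_new_post_call(
    "robot", "secret", "Job 17 finished", "Results for spectrum s1 are ready."
)
# To actually send it, one would use xmlrpc.client.ServerProxy pointed at the
# blog's xmlrpc.php address and call the same method with the same arguments.
```

MediaWiki has its own web API for edits, which works differently (it needs an edit token first), so the wiki side would need a separate small script.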

There is some information at OpenWetWare (see particularly Julius Luck’s comments) about interfacing with MediaWiki, which came up during the recent discussion on lab books. I believe there was an intention at some point to attempt to integrate the OpenWetWare blogs (like this one) with the OpenWetWare wiki at some level, but I don’t think this is currently a priority for them.

I imagine it ought to be possible to write plugins for both MediaWiki and WordPress that would provide a button for each post/comment to ‘Publish to Wiki/Blog’. I don’t think this needs to be automated as I would see this as precisely the point at which human intervention is helping things along. The researcher may wish to publish a set of Blog comments to the Wiki to encourage more detailed discussion or conversely may wish to post a good solution from the Wiki to the Blog to alert people that something interesting has popped up.

CN: An interesting question which would arise from this combination of approaches is ‘where is the notebook?’ to which I will admit I don’t have an answer. But I’m not sure that it matters.

PMR: I am not worried about where the notebook is (though it could be difficult to “lift it up” by a single root).

I think this shows that by expanding the functionality of the ‘lab notebook’ we are starting to break our underlying idea of what that notebook is. Experimental scientists think very much in terms of a monolithic object bound in nice hard covers (even though this bears very little resemblance to reality), and the idea that it can become a diffuse object distributed through a series of different repositories and journals is a bit discomforting. What is becoming clear to me is that we are starting to capture much more than just the raw data or procedures that go into the lab book itself. Keeping an electronic lab notebook is more work than a paper one, primarily because most of us don’t use paper notebooks very effectively.

Postscript: I’ve edited this slightly as of 10:36 GMT 14-October because some things were unclear. I’ve used the tag pmr-ons to collect these posts.