Reflections from a parallel universe

On Wednesday and Thursday this week I was lucky enough to attend a conference on Electronic Laboratory Notebooks run by an organisation called SMI. Lucky because the registration fee was £1500 and I got a free ticket. Clearly this was not a conference aimed at academics. It was a discussion of the capabilities and implications of Electronic Laboratory Notebooks as used in industry, primarily in big pharma.

For me it was very interesting to see these commercial packages. I am often asked how what we do compares to them, and I have always had to answer that I simply don’t know; I’ve never had the chance to look at one because they are far too expensive. Having now seen them I can say that they have very impressive user interfaces with lots of integrated tools and widgets. They are fundamentally built around specific disciplines, which allows them to be reasonably structured in their presentation and organisation. I think we would break them in our academic research setting, although it might take a while. More importantly, we wouldn’t be able to afford the customisation that it appears you need to get a product that does just what you want. Deployment costs of around £10,000 per person were being bandied about, with total contract costs clearly in the millions of dollars.

Coming out of various recent discussions, I would say that the overall software design of these products is flawed going forward. The vendors are being paid a lot by companies who want things integrated into their existing systems, so there is little motivation for them to develop open platforms with data portability, easy integration of web services, and so on. All of these systems run as thick clients against a central database. Going forward these have to move into web portals as a first step, before working towards a fully customisable interface with easily collectable widgets to enable end-user configured integration.

But these were far from the most interesting things at the meeting. We commonly assume that keeping, preserving, and indexing data is a clear good, and indeed many of the attendees assumed the same thing. Then we got a talk on ‘Compliance and ELNs’ by Simon Coles of Amphora Research Systems. The talk can be found here. In it was an example of just how bizarre the legal process of patent protection can make industrial practice. In preparing for a patent suit you will need to pay your lawyers to go through all the relevant data and paperwork. Indeed, if you lose you will probably pay for the opposition’s lawyers to go through all the relevant paperwork as well. These are not just lawyers, they are expensive lawyers. If you have a whole pile of raw data floating around, this is not just going to give the lawyers a field day finding something to pin you to the wall on; it is going to burn through money like nobody’s business. The simple conclusion: it is far cheaper to re-do the experiment than to risk the need for lawyers to go through raw data. Throw the raw data away as soon as you can afford to! Like I said, a parallel universe where you think things are normal until they suddenly go sideways on you.

On a more positive note, there were some interesting talks on big companies deploying ELNs. At some level we can look at this as a model of a community adopting open notebooks: at least within the company (in most cases) everyone can see everyone else’s notebook. A number of speakers mentioned that this had caused problems, and a couple said that it had been necessary to develop and promulgate standards of behaviour. This is interesting in the light of the recent controversy over the naming of a new dinosaur (see commentary at Blog around the Clock) and Shirley Wu’s post on One Big Lab. It reinforces the need for generally accepted standards of behaviour and their growing importance as data becomes more open.

The rules? The first two came from the talk; the rest are my suggestions. Basically they boil down to ‘Be Polite’.

  1. Always ask before using someone else’s data or results
  2. User beware: if you rely on someone else’s results, it’s your problem if it blows up in your face (especially if you didn’t ask them about it)
  3. If someone asks if they can use your data or results you say yes. If you don’t want them to, give them a clear timeline on which they can or specific reasons why you can’t release the data. Give clear warnings about any caveats or concerns
  4. If someone asks you not to use their results (whether or not they are helpful or reasonable about it) think very carefully about whether you should ignore their request. If having done this you still feel you are being reasonable in using them, then think again.
  5. Any data that has not been submitted for peer review after 18 months is fair game
  6. If you incorporate someone else’s data within a paper, discuss your results with them. Then include them as an author.
  7. Always, without fail and under any circumstances, acknowledge any source of information, and do so generously and without conditions.

Sharing is caring…and not sharing can be reprehensible

Sometimes you read things that just make you angry. I’m not sure I can add much to this eloquent article written by Andrew Vickers in the New York Times (via Neil Saunders and the 23andme blog).

Shirley Wu has recently written on the fears and issues around being scooped and whether these are field dependent. Her discussion, and the NYT article, seem to suggest that these fears are greatest in precisely those disciplines where sharing could lead to advances with direct implications for people, their survival, and their quality of life.

I have, to be honest, been getting more and more depressed that this keeps coming back as the focus of any discussion about Open Notebooks or Open Science. Why is the assumption that by sharing we are going to be cheated? Surely we should be debating the balance between benefits and risks, and how this compares to the balance of benefits and risks in not being open. Particularly when those risks relate to people’s chances of survival.

Picture this…

There has been a bit of discussion recently about identifying and promoting ‘wins’ for Open Science and Open Notebook Science. I was particularly struck by a comment made by Hemai Parthasarathy at the ScienceBlogging Meeting that she wasn’t aware of any really good examples that illustrate the power of open approaches. I think sometimes we miss the most powerful examples right under our nose because they are such a familiar part of the landscape that we have forgotten they are there. So let us imagine two alternate histories; I have to admit I am very ignorant of the actual history of these resources, but I am not sure that matters for making my point.

History the first…

In the second half of the twentieth century scientists developed methods for sequencing proteins and DNA. Not long after, the decades of hard work on developing methods for macromolecular structure determination started to bear fruit and the science of protein crystallography was born. There was a great feeling that, by understanding the molecular detail of biological systems, disease would become a beatable problem; it was simply a matter of understanding the systems well enough to know how to treat any disease. Scientists, their funders, pharmaceutical companies, and publishers could see this was an important area for development, both in terms of the science and in terms of significant commercial potential.

There was huge excitement and a wide range of proprietary databases containing this information proliferated. Later there came suggestions that the NIH and EMBL should fund public databases with mandated deposition of data, but a broad coalition of scientists, pharmaceutical companies, and publishers objected, saying that this would hamper their ability to exploit their research effort and would reduce their ability to turn research into new drugs. Besides, the publishers said, all the important information is in the papers… By the mid-noughties a small group of scientists calling themselves ‘bioinformaticians’ started to appear and began to look at the evolution of genetic sequences using those pieces of information they could legally scrape from the, now electronically available, published literature. One scientist was threatened with legal action for taking seven short DNA sequences from a published paper…

Imagine a world with no GenBank, no PDB, no SwissProt, and no culture, growing out of these, of publicly funded, freely available databases of biological information like Brenda, KEGG, and so on. Would we still be living in the 90s, the 80s, or even the 70s compared to where we have got to?

History the second…

In the second half of the twentieth century synthetic organic chemistry went through an enormous technical revolution. The availability of modern NMR and mass spectrometry radically changed the whole approach to synthesis. Previously the challenging problem had been figuring out what it was you had made: careful degradation, analysis, and induction were required to understand what a synthetic procedure had generated. NMR and MS made this part of the process much easier, shifting the problem to developing new synthetic methodology. Organic chemistry experienced a flowering as creative scientists flocked to develop new approaches that might, if they were lucky, bear their names.

There was tremendous excitement as people realised that virtually any molecule could be made, if only the methodology could be figured out. Diseases could be expected to fall as synthetic methodology was developed to match the advances in biological understanding. The new biological databases were providing huge quantities of information that could guide the targeting of synthetic approaches. However it was clear that quality control was critical, and that sharing of quality control data would make a huge difference to the rate of advance: so many new compounds were being generated that it was impossible for anyone to check the quality and accuracy of characterisation data. So, in the early 80s, taking inspiration from the biological community, a coalition of scientists, publishers, government funders, and pharmaceutical companies developed public databases of chemical characterisation data with mandatory deposition policies for any published work. Agreed data formats were a sticking point, but workable solutions were found quickly enough.

The availability of this data kick-started the development of a ‘chemoinformatics’ community in the mid 80s, leading to sophisticated prediction tools that aided the synthetic chemists in identifying and optimising new methodology. By 1990, large natural products were falling to the synthetic chemists with such regularity that new academics moved into developing radically different methodologies targeted at entirely new classes of molecules. New databases containing information on the activity of compounds as substrates, inhibitors, and activators (with mandatory deposition policies for published data) provided the underlying validation datasets that meant that, by the mid 90s, structure-based drug discovery was a solved problem. By the late 90s the chemoinformatic tools available made the development of tools for identifying test sets of small molecules to selectively target any biological process relatively straightforward.

Ok. Possibly a little utopian, but my point is this. Imagine how far behind we would be without GenBank, the PDB, and the culture of publicly available databases that these embedded in the biological sciences. And now imagine how much further ahead chemical biology, organic synthesis, and drug discovery might have been with NMRBank, the Inhibitor Data Bank…

Some New Year’s resolutions

I don’t usually do New Year’s resolutions, but in the spirit of the several posts from people looking back and looking forward I thought I would offer a few. This being an open process, there will be people to hold me to these, so there will be a bit of encouragement. This promises to be a year in which Open issues move much further up the agenda. These are small ways in which we can take that forward and help build the momentum.

  1. I will adopt the NIH Open Access Mandate as a minimum standard for papers submitted in 2008. Where possible we will submit to fully Open Access journals, but where there is no appropriate journal in terms of subject area or status we will only submit to journals that allow us to deposit a complete version of the paper in PubMed Central within 12 months.
  2. I will get more of our existing (non-ONS) data online and freely available.
  3. Going forward, all members of my group will be committed to an Open Notebook Science approach unless this is prohibited or made impractical by the research funders. Where this is the case, these projects will be publicly flagged as non-ONS and I will apply the principle of the NIH OA Mandate (12 months maximum embargo) wherever possible.
  4. I will do more to publicise Open Notebook Science. Specifically I will give ONS a mention in every scientific talk and presentation I give.
  5. Regardless of the outcome of the funding application I will attempt to get funding to support an international meeting focussed on developing Open Approaches in Research.

Beyond the usual (write more papers, write more grants) I think that covers things. These should even be practical.

I hope all of those who have had a holiday have enjoyed it and that all those who have not are looking forward to one in the near future. I am looking forward to the New (Western, Calendar) Year. It promises to be an exciting one!

I am now off to cook lots of lovely Chinese food (and yes I know that is calendarically inappropriate – but it will still taste good!). Happy New Year!

e-science for open science – an EPSRC research network proposal

The UK Engineering and Physical Sciences Research Council currently has a call out for proposals to fund ‘Network Activities’ in e-science. This seems like an opportunity both to publicise and to support the ‘Open Science’ agenda, so I am planning to write a proposal asking for ~£150-200k to fund workshops, meetings, and visits between different people and groups. The money could fund people to come to meetings (including from outside the UK and Europe) but could not be used to directly support research activities. The rationale for the proposal would be as follows.

  • ‘Open Science’ has the potential to radically increase the efficiency and effectiveness of research world wide.
  • The community is disparate and dispersed with many groups working on different approaches that do not currently interoperate – agreeing some interchange or tagging standards may enable significant progress
  • Many of those driving the agenda are early career scientists including graduate students and postdocs who do not have independent travel funds and whose PI may not have resources to support attending meetings where this agenda is being developed
  • There is significant interest from academics, some publishers, software and tool developers, and research funders in making more data freely available, but limited consensus on how to take this forward and, thus far, an insufficient commitment of resources to make this possible in practice

The proposal would be to support 2-3 meetings over three years, including travel costs, and provide funds for exchange visits. What I would like from the community is an expression of interest, specifically the commitment to write a letter of support saying you would like to be involved. It would be great to get these from tenured academics, early career academics, graduate students and PDRAs, publishers (NPG? PLoS?), library and repository people (UKOLN, Simile, others?) and anyone else who is relevant.

The timeline is tight (due Tuesday next week) but if there is enough interest I will push through to get this done. I propose to write the grant in the open and online so will post a Google Doc or OpenWetWare page as soon as I have something to put up. Any help people can offer on the writing would be appreciated. In the meantime please drop comments below. I will be pointing to this page in the grant proposal.

An experiment in open notebook science – Sortase mediated protein-DNA ligation

In a recent post I extolled the possible virtues of Open Notebook Science in avoiding or ameliorating the risk of being scooped. I also made a virtue of the fact that being open encourages you to take a more open approach; that there is a virtuous circle, a positive feedback. However much of this is very theoretical. We don’t have good case studies to point at that show Open Notebook Science generating positive outcomes in practice. To take a more cynical perspective, where is the evidence that I am willing to take risks with valuable data? My aim with this post is to do exactly that: put something out there that is (as far as I know) new and exciting, and kick off a process that may help us generate a positive example.

I mentioned in the previous post that we have been scooped not once, but twice, on this project. I will come back to the second scooping later but my object here is to try and avoid getting scooped a third time. As I mentioned in the previous post we are using the S. aureus Sortase enzyme to attach a range of molecules to proteins. We have found that this provides a clean, easy, and most importantly general method for attaching things to proteins. Labelling of proteins, attaching proteins to solid supports, and generating various hybrid-protein molecules has a very wide range of applications and new and easy to use methods are desperately needed. We have recently published[1] the use of this to attach proteins to solid supports and others have described the attachment of small molecules[2], peptides[3], PNA[4], PEG[5] and a range of other things.

One type of protein-conjugate that is challenging to generate is one in which a protein is linked to a DNA molecule. Such conjugates have a wide range of potential applications particularly as analytical tools where the very strong and selective binding that can often be found in a protein is linked to the wide range of extremely sensitive techniques available for DNA detection and identification[6]. Such techniques have been limited because it is difficult to find a general and straightforward technique for making such conjugates.

We have used our Sortase mediated ligation to successfully attach oligonucleotides to proteins and I have put up the data we have that supports this in my lab book (see here for an overview of what we have and here for some more specific examples with conditions). I should note that some of this is not strictly open notebook science because this is data from a student which I have put up after the event.

We are confident that it is possible to get reasonable yields of these conjugates and that the method is robust and easy to apply. This is an exciting result with some potentially exciting applications. However to publish we need to generate some data on applications of these conjugates. One obvious target here is to use a DNA array and differently coloured fluorescent proteins attached to different oligonucleotides to form an image on the array. The problem is that we are not well set up to do this in my lab and don’t have the expertise or resources to do this experiment efficiently. We could do it but it seems to me that it would be quicker and more efficient for someone else with the expertise and experience to do this. In return they obviously get an authorship on the paper.

Other experiments we are interested in doing:

  • Analytical experiment using the binding of a protein-DNA conjugate that utilises the DNA part for detection.
  • Pull down of peptide-DNA conjugates onto an array after exposure of the peptides to a protease
  • Attachment of proteins to a full length PCR product containing the gene for the protein. Select one of the proteins and then re-amplify the desired gene. (I had a quick go at this but it didn’t work)

So what I am asking is this:

  • If any reader of this blog is interested in doing these (or any other) experiments to aid us in getting the published paper then get in touch
  • If you feel so inclined, publicise this call more widely on your own blog and let’s see whether using the blogosphere to make contacts can really aid the science

We will send the reagents to anyone who would like to do the experiments, along with any further information required. In principle people ought to be able to figure out everything they need from the lab book, but this will probably not be the case in practice. The idea here is to see whether this notion of a loose collaboration of groups with different resources and expertise, driven by the science, can work and whether it is a competitive way of doing science.

My criteria in accepting collaborators will be as follows:

  1. Willingness to adopt an Open Notebook Science approach for this experiment (ideally using our lab book system but not necessarily)
  2. Interest in and willingness to engage in the development of the published paper (including proposing and/or carrying out any new experiments that would be cool to include)
  3. Ability to actually carry out the experiment in reasonable time (ideally looking for a couple of months here)

So this is notionally a win-win situation for me. We will be getting on and doing our own thing as well but by working with other groups we may be able to get this paper out more efficiently and effectively. Maybe others will come up with clever experiments that would add to the value of the paper. The worst case scenario is that someone comes along and sees this, copies the results, and publishes ahead of us. The best case scenario is that someone else already working in a similar direction may come across this and propose working together on this.

In any case, the results promise to be interesting…

References:

[1] Chan et al, 2007, Covalent attachment of proteins to solid supports via Sortase-mediated ligation, PLoS ONE, e1164

[2] Popp et al, 2007, Sortagging: a versatile method for protein labelling, Nat Chem Biol, 3:707

[3] Mao et al, 2004, Sortase-mediated protein ligation: a new method for protein engineering, J Am Chem Soc, 126:2670

[4] Pritz et al, 2007, Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation, J Org Chem, 72:3909

[5] Parthasarathy et al, 2007, Sortase A as a novel molecular “stapler” for sequence specific protein conjugation, Bioconj Chem, 18:469

[6] Burbulis et al, 2005, Using protein-DNA chimeras to detect and count small numbers of molecules, Nature Methods, 2:31

Getting scooped…

I have been waiting to write this post for a while. The biggest concern expressed when people consider taking on an Open Notebook Science approach is that of being ‘scooped’. I wanted to talk about this potential risk using a personal example where my group was scooped but I didn’t want to talk about someone else’s published paper until the paper on our work was available for people to compare. Our paper has just gone live at PLoS ONE so you will be able to compare the two sets of results.

Attaching proteins in a site selective manner to solid supports is a challenging problem. A general approach to attaching proteins to resin beads or planar surfaces while retaining function would have applications in chemical catalysis, analytical devices, and the generation of protein microarrays.

In about March 2006 we established in my laboratory that the Sortase enzyme of S. aureus provided an effective way of attaching functional proteins to solid supports. This was before I took up ONS and, as the student is finishing up, the project has not been moved onto an ONS basis, so the data was not made available when we had it. We delayed publishing while we attempted to generate a ‘pretty picture’ in which we would create the Southampton University logo in fluorescent protein on a glass surface. The idea was to make it more likely that we would get the paper into a higher ranked journal, but ultimately we were unsuccessful.

In March 2007 we were scooped by a paper in Bioconjugate Chemistry (1). This paper, amongst other things, included an experiment that was very similar to the core experiment in our data (2). I should emphasise that there is absolutely no suggestion that this group ‘stole’ our data. They were working independently and were probably doing their experiments at about the same time as we did ours.

The first point here is that in the vast majority of cases being scooped is not about theft but about the fact that a good idea is an idea that is likely to occur to more than one person. It is essentially about not being first to get to publication. I can argue that I had the idea some years ago – but we didn’t get on to the work until 2006 and we’ve only just managed to get it published.

The second point is that our work is clearly different enough from Parthasarathy et al to be published. This is often the case. Indeed we have recently been scooped again on a different aspect of this project (3) but I expect we will still be able to publish as our data is again complementary to that reported.

So, from the perspective of traditional publication we were scooped because we didn’t publish fast enough. We can’t claim any precedence because we weren’t taking an ONS approach that would support such a claim. But let us consider what would have happened if we had taken an ONS approach. I think there are a series of possible outcomes:

  1. It is possible, or even likely, that the other group may not have noticed our results at all. Under these circumstances we would at least be able to claim precedence.
  2. The other group may have seen our results and been spurred into more rapid publication. Again we would have been able to claim precedence but also there would be a record of the visit. I suspect this is the most common route to being scooped. In most cases results are not ‘copied’ from e.g. conference presentations but much more often the fact that someone is close to publication spurs another group to get their work published first.
  3. The most positive outcome is that, having seen we had some similar results, the other group may have got in contact and we could have put the results together to make a better paper.

Outcome 3) may seem unlikely but it really is the best outcome for everyone. Parthasarathy et al published in Bioconjugate Chemistry and we will publish in PLoS ONE after chasing around a number of other journals. If we had combined the results and, possibly more importantly, the resources to hand, we probably could have put together a much better paper. This could possibly have gone to a significantly higher ranked journal. Apart from possible arguments over first and corresponding authorship, everyone would have been better off.

This is the promise of being open as well as practising Open Notebook Science. By cooperating we can do a lot better. Being open has its risks but equally there are significant potential benefits including doing better science, better publications, and better career prospects as a result.

But let us now put the shoe on the other foot. What if the other group had made their data available? Would I have rushed out our paper to prevent them getting in first? It is one thing to advocate openness, but would I really have got in touch with them myself? The answer is that 12 months ago I probably wouldn’t have. I would have pushed the student to work 24 hours a day and got our own paper out as fast as possible with whatever data we had to hand. I probably would not have contacted the other group. And we may have cut corners to get the data together, missed out controls that we knew would work but didn’t have time to do, and glossed over any possible issues.

But today, faced with the same dilemma, I would get in touch with them and propose combining our data. Why the change? Partly because I have spent the past 12 months considering the issues around being open. But a strong contributor is that if I didn’t I would be exposing myself to criticism as a hypocrite. I have come to think that one of the real benefits of ‘being open’ is that exposure makes you hold yourself to higher standards, precisely because being out in the open means people have the evidence on which to judge you.

I find that as I do my experiments and record them I take more care: I describe them more clearly, and I preserve and index the data properly. More generally I feel more inclined to share my ideas and preliminary results with others. Part of this is because I am aware that double standards will be obvious to anyone who is looking. Standards, and the discipline to maintain them, make for better science and for better people. Anyone who is honest with themselves knows that sometimes, somewhere, there is a temptation to cut corners. We all need help in maintaining discipline, and being open is a very effective way of getting it.

It may sound a bit over the top but I actually feel like a better person for taking this approach. So for all the sceptics out there, and particularly for those academics with blood pressure issues, I recommend you try throwing the doors open. The fresh air is a bit bracing but it will do you the world of good.

  1. Parthasarathy R, Subramanian S, Boder ET (2007) Sortase A as a novel molecular “stapler” for sequence-specific protein conjugation. Bioconjug Chem 18:469-76
  2. Chan L, Cross HF, She JK, Cavalli G, Martins HFP, Neylon C (2007) Covalent attachment of proteins to solid supports and surfaces via Sortase-mediated ligation, PLoS ONE 2(11): e1164 doi:10.1371/journal.pone.0001164
  3. Popp et al., (2007) Sortagging: a versatile method for protein labelling. Nat Chem Biol Sep 23 (Epub ahead of print)

Sourceforge for science

I got to meet Jeremiah Faith this morning and we had an excellent wide-ranging discussion which I will try to capture in more detail later. However I wanted to get down some thoughts we had at the end of the discussion. We were talking about how to publicise and generate more interest and activity around Open Notebook Science. Jeremiah suggested the idea of a Sourceforge for science: a central clearing house somewhere on the web where projects could be described and people could opt in to contribute. There have been some ideas in this direction, such as Totally retrosynthetic, but I don’t think there has been a lot of uptake there.

This was all tied into the idea of making lab books findable and indexed in places where people might look for them. I have been taken with the way PostGenomic and ChemicalBlogSpace aggregate blogs, particularly blog posts on the peer reviewed literature, and in the case of ChemicalBlogSpace aggregate comments on molecules, based on trawling for InChI Keys (I think). So can we propose that one (or both?) of these sites start aggregating online notebook posts? If we could make these point at peer reviewed papers online, it would also be possible to use a modified version of the Blue Obelisk Greasemonkey script that would pop up whenever you were looking at a paper for which there was raw data online.

It wouldn’t be necessary, or perhaps even advisable, to limit these to people strictly practising Open Notebook Science. People could put up data once a paper was published, or after a delay. Perhaps we need not even require that all the raw data be put up. If the barriers are lowered, more people may do it. A range of appropriate tags (‘Partial raw data is available for this paper’, ‘Full raw data is available for this paper’, ‘Full raw data and associated data is available as an open notebook’) would distinguish between what people are making available. Data could be dropped anywhere online, and by aggregation it gains more visibility, encouraging people to move from making specific data available towards making all their data available.
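As a very rough sketch of what the consuming side might look like (nothing here exists: the aggregator service, its response format, and the tag slugs are all invented for illustration), the lookup a Greasemonkey-style helper could make against such an aggregator might be as simple as:

```python
# Hypothetical lookup: given the DOI of the paper being read, ask an
# (invented) aggregator service which raw-data availability tags it knows about.
import json
import urllib.request

AVAILABILITY_LABELS = {
    "partial-raw-data": "Partial raw data is available for this paper",
    "full-raw-data": "Full raw data is available for this paper",
    "open-notebook": "Full raw data and associated data is available as an open notebook",
}

def data_available(doi, aggregator="http://example.org/open-data/api"):
    """Return human-readable availability labels for a paper, or [] if none are known."""
    with urllib.request.urlopen(f"{aggregator}?doi={doi}") as response:
        record = json.load(response)  # assumed shape: {"tags": ["full-raw-data", ...]}
    return [AVAILABILITY_LABELS[t] for t in record.get("tags", []) if t in AVAILABILITY_LABELS]

# e.g. data_available("10.1371/journal.pone.0001164")
```

The point is only that, once the tags exist in an aggregator somewhere, the popup side becomes a one-line lookup.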

Any thoughts?

PMR’s Open Notebook Project continued

This is a reply continuing the conversation with Peter Murray-Rust on his plans for an Open Notebook Science based project. I have cut a lot of the context to keep the post to a manageable size, so if you want to track back see the original two posts from Peter, my response, and Peter’s response to that in full.

I should add that I am not a coder in any form so where this gets technical I am proposing things in principle (or hand waving as some might put it :).

PMR: We’ll create plots for ALL molecules and spectra. However it may not always be possible to identify what is “wrong”. Thus a bad TMS value (e.g. if the solvent is wrong) will shift all the values. So we may give a revised line (y = x –> y = x + c).

…and…

PMR: Yes. We’ll probably do this by RMS deviation and we could colour the table of contents or something similar. It may not be easy to make generic corrections over several thousand files. (Hang on – the files are in CML so it’s trivial).

…and…

PMR: Yes. It may not be trivial to correct them – we shan’t have a chemical editor in the Wiki, so it may be an idea to have a molecule upload. However the details often bite hard.
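As an aside on the offset-and-RMS idea above, the arithmetic is simple enough to sketch (purely illustrative, nothing to do with the actual CML-based tooling; the 0.5 ppm threshold is invented):

```python
# Fit the constant 'c' in y = x + c as the mean difference between observed and
# calculated shifts, then compute the RMS of the residuals after removing that
# offset, and flag the spectrum if the RMS exceeds a threshold.
import math

def offset_and_rms(observed, calculated):
    residuals = [obs - calc for obs, calc in zip(observed, calculated)]
    offset = sum(residuals) / len(residuals)   # the constant 'c' in y = x + c
    rms = math.sqrt(sum((r - offset) ** 2 for r in residuals) / len(residuals))
    return offset, rms

def flag_spectrum(observed, calculated, rms_threshold=0.5):
    offset, rms = offset_and_rms(observed, calculated)
    return {"offset": offset, "rms": rms, "suspect": rms > rms_threshold}
```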

The most practical approach may simply be to let people flag things or suggest other solutions, either through comments on a blog or as additions on the wiki (which could all be aggregated into an RSS feed), and then to re-run the spectra with the new molecule/assignment/solvent by hand (or rather put them in the queue by hand). I imagine having a ‘Wrong Spectra Blog’ which has both a conventional comment button and a ‘propose correction’ button; the latter still posts a comment but flags it for easy aggregation and possibly prompts people to drop in an InChI/CML/SMILES code. Aggregation from multiple RSS feeds and filtering is doable, e.g. via Yahoo Pipes – grab the RSS feed(s) from the comments and then filter for a specific tag; the comment is then in effect an RDF triple.
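The Pipes step itself could equally be mocked up in a few lines; the feed URLs and the ‘propose-correction’ tag below are again invented for illustration:

```python
# Sketch: aggregate comment feeds from the hypothetical 'Wrong Spectra Blog'
# and keep only the comments carrying the (made up) 'propose-correction' tag.
import feedparser

def proposed_corrections(feed_urls, tag="propose-correction"):
    corrections = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            terms = {t["term"].lower() for t in entry.get("tags", [])}
            if tag in terms:
                corrections.append({
                    "post": entry.get("link"),        # which spectrum page the comment refers to
                    "comment": entry.get("summary"),  # hopefully contains the InChI/CML/SMILES
                    "author": entry.get("author"),
                })
    return corrections

# e.g. proposed_corrections(["http://example.org/wrong-spectra/comments/feed"])
```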

Although if there is an entry page then presumably someone could run an existing spectrum against a new assignment/molecule. Mainly a question of providing a link to make this easy for people. This could be in the template. Then how do you associate an attempted correction with the original ‘wrong’ spectrum? Well, that is where the details bite, I suspect.

There is also the issue of implementation of all of this:

PMR: Yes. I am not yet sure how to insert machine-generated pages into a Wiki and we’ll value help here. The pages will certainly NOT be editable. Any refinement of the protocol or correction will generate a NEW job, not overwrite the last one.

…and…

PMR: I think we are clearly going to have a new blog. What I’m not clear is how we post comments from the blog to the Wiki and alert the Wiki from the blog.

This is not a world away from the blogging instruments developed in Jeremy Frey’s group at Southampton Uni. In that case the instruments themselves post a blog entry each time a sample is analysed. Here we are talking about a computational analysis, but the principle is the same. Both WordPress and MediaWiki have (I believe – this is where I get out of my depth) quite sophisticated APIs that could enable automated posting.
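To make that concrete, here is a minimal sketch of what cross-posting might look like, assuming a MediaWiki installation with its write API enabled and a WordPress blog with XML-RPC turned on; all URLs, page names, and credentials are placeholders:

```python
# Hypothetical cross-posting helpers. Endpoints and credentials are placeholders.
import requests
import xmlrpc.client

def append_to_wiki(api_url, page, text, token="+\\"):
    """Append text to a MediaWiki page via api.php. The anonymous edit token is
    shown; a real deployment would log in and fetch a CSRF token first."""
    return requests.post(api_url, data={
        "action": "edit",
        "title": page,
        "appendtext": "\n\n" + text,
        "token": token,
        "format": "json",
    }).json()

def post_to_blog(xmlrpc_url, user, password, title, body):
    """Create a WordPress post via the long-standing metaWeblog XML-RPC API."""
    wp = xmlrpc.client.ServerProxy(xmlrpc_url)
    return wp.metaWeblog.newPost(0, user, password,
                                 {"title": title, "description": body}, True)

# e.g. append_to_wiki("http://wiki.example.org/api.php",
#                     "Spectrum_1234/Discussion", "Comment copied from the blog ...")
```

Plugins providing the ‘Publish to Wiki/Blog’ buttons described below would essentially just wrap calls like these.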

There is some information at OpenWetWare (see particularly Julius Lucks’ comments) about interfacing with MediaWiki that came up during the recent discussion on lab books. I believe there was an intention at some point to attempt to integrate the OpenWetWare blogs (like this one) with the OpenWetWare wiki at some level, but I don’t think this is currently a priority for them.

I imagine it ought to be possible to write plugins for both MediaWiki and WordPress that would provide a button for each post/comment to ‘Publish to Wiki/Blog’. I don’t think this needs to be automated as I would see this as precisely the point at which human intervention is helping things along. The researcher may wish to publish a set of Blog comments to the Wiki to encourage more detailed discussion or conversely may wish to post a good solution from the Wiki to the Blog to alert people that something interesting has popped up.

CN: An interesting question which would arise from this combination of approaches is ‘where is the notebook?’ to which I will admit I don’t have an answer. But I’m not sure that it matters.

PMR: I am not worried about where the notebook is (though it could be difficult to “lift it up” by a single root).

I think this shows that by expanding the functionality of the ‘lab notebook’ we are starting to break our underlying idea of what that notebook is. Experimental scientists think very much in terms of a monolithic object bound in nice hard covers (even though this bears very little resemblance to reality), and the idea that it can become a diffuse object distributed through a series of different repositories and journals is a bit discomforting. What is becoming clear to me is that we are starting to capture much more than just the raw data or procedures that go into the lab book itself. Keeping an electronic lab notebook is more work than a paper one, primarily because most of us don’t use paper notebooks very effectively.

Postscript: I’ve edited this slightly as of 10:36 GMT 14-October because some things were unclear. I’ve used the tag pmr-ons to collect these posts.