Open methods vs open data – might the former be even harder?

Continuing the discussion set off by Black Knight, and carried on here and by Peter Murray-Rust, I was interested in the following comment in Black Knight’s follow-up post (my emphasis, and I have quoted slightly out of context to make my point).

But all that is not really what I wanted to write about now. The OpenWetWare (have you any idea how difficult it is to type that?) project is a laudable effort to promote collaboration within the life sciences. And this is cool, but then I realize that the devil is in the details.

Share my methods? Yeah! Put in some technical detail? Yea–hang on.

A lot of the debate has been about posting results and the risk of someone stealing them or otherwise using them. But in bioscience the competitive advantage that a laboratory has can lie in the methods. Little tricks that don’t necessarily make it into the methods sections of papers, that sometimes researchers aren’t even entirely aware of, but which form part of the culture of the lab.

The case for sharing methods is, at least on the surface, easier to make than sharing data. A community can really benefit from having all those tips and tricks available. You put yours up and I’ll put mine up means everyone benefits. But if there is something that gives you a critical competitive advantage then how easy is that going to be to give up? An old example is the ‘liquid gold’ transformation buffer developed by Doug Hanahan (read the story in Sambrook and Russell, third edition, p1.105 or online here – I think; it’s not open access). Hanahan ‘freely and generously distributed the buffer to anyone whose experiments needed high efficiencies…’ (Sambrook and Russell) but he was apparently less keen to make the recipe available. And when it was published (Hanahan, 1983) many labs couldn’t achieve the same efficiencies, again because of details like a critical requirement for absolutely clean glassware (how clean is clean?). How many papers these days even include or reference the protocol used for transformation of E. coli? Yet this could, and did, give a real competitive advantage to particular labs in the early 1980s.

So, if we are to make a case for making methodology open we need to tackle this. I think it is clear that making this knowledge available is good for the science community. But it could be a definite negative for specific groups and people. The challenge lies in making sure that altruistic behaviour that benefits the community is rewarded. And this won’t happen unless metrics of success and community stature are widened to include more than just publications.

A great example of ‘fun’

I wrote the other day about the idea of fun being a motivating factor to taking up open notebook science. Sometimes something is just cool and you want to share it. Then along comes a great example.

Via petermr’s blog:

At ‘Life of a Lab Rat’:

This has got to be in the running for the coolest cloning experiment ever.

Last Tuesday a grad student in the reciprocal space cadet lab, let’s call him Fu Manchu, asked me if I had any GFP. ‘GFP’ expands to ‘green fluorescent protein’……[]

As petermr says, this is just very cool. The molecular biology is fairly conventional. But that’s not the point. The point is that Black Knight did a fun experiment and felt it was worth sharing with the world. We might argue that there isn’t enough methodological detail to tell us exactly what was done here but that’s a minor quibble. The important thing is that it was fun and it’s out there!

Incidentally, I am writing this in the lab while waiting for PCR primers to melt so I can set up some PCR reactions (here if you want to look – though you may not be able to access this at the moment, still need to get this one to public access). I am beginning to think that one of the main issues of open notebook science for biochemistry/molecular biology may be the difficulty in using a track pad with nitrile gloves on!

The case for open notebook science

The reasons for pursuing more openness in science from the perspective of the science and funding communities have been well rehearsed and described elsewhere (see 3 Quarks Daily 1,2, and 3 for an excellent overview). There are excellent discussions of where this might take us in terms of capability and in terms of the efficient re-use of government or charity funded research. These are the reasons why many funding organisations and government bodies are beginning to mandate open access publication and making data publicly available.

Most scientists are, I think, reasonably happy with the concept of making raw data publicly available after publication. The main reasons for resistance are more to do with the hassle involved than with an in-principle objection. However, people are much less happy with making data available before publication. The reasons why many people are worried about the push towards making data publicly available have also been discussed (see for instance comments on Corie Lok’s Nature Network blog and the discussion at WYDIRD). The primary one is the fear of ‘being scooped’ or being ‘uncompetitive’. I think this is mostly (but not entirely) a fallacy and will come back to this in the future (after we get a paper accepted, oh the irony).

All the reasons for Open Science that others have described are good and noble. But much of this involves extra work. In particular making open notebook science work requires investment in time and tool development to get it off the ground. So what is the motivation for getting over these hurdles? Why is it worth the effort? What’s in it for me?

Well, I think the answer to this may be quite simple. It could be fun. I got into science because I like talking to people about science. I worked in two great groups as an undergraduate and as a postgraduate where we argued over the details of results, the literature, and anything else of interest constantly. This was great fun. Wouldn’t it be so much more fun to talk with a global community of people who are interested in what we are doing? Alright, in practice no-one may be reading, but if it is up there and available then it’s surely more likely that someone might read it than if it’s stuck in a notebook on my desk.

There are lots of other good self-serving reasons why making stuff available could be good. Giving people the raw data makes procedures more repeatable. Methods papers are highly cited so getting your methods out there means your papers will get cited more (certainly putting the data out does). It will probably get you media coverage and help in profile building. This makes your papers more likely to be accepted, your grants more likely to be funded, your promotion more likely and all those things that make up the core of how science works in practice. But to be frank, it is worrying about all these things that I find takes a lot of the fun out of doing science. So I think it would be good to put some of it back in and this might be a good way to start to do it.

Open (adjective)

Open [oh-puhn ] (adjective) not closed…having no means of closing or barring…relatively free of obstruction…without restrictions as to who may participate…undecided; unsettled… (from Dictionary.com)

There is a great deal of confusion out there as to what ‘Open’ means, especially in science. The definitions above seem particularly apposite: ‘…relatively free of obstruction…’. Certainly undecided or unsettled seems appropriate in some cases. The claims of a journal to be ‘Open Access’ can set off a barrage of comment in the blogosphere. Whether this makes any difference to the journal is unclear but definitions are clearly important. If my aim here is to talk about Open Science then it is sensible to be clear what I mean.

So the following stand as definitions until they need to be changed:

Open Access (of journals, data, or anything else really): Means freely available and accessible to use, re-use, re-distribute, and re-mix, subject only to a requirement to attribute the work. Essentially as described in the Berlin and Bethesda declarations. Well summarised by Chris Surridge on his blog at PLoS ONE.

Freely accessible: On the web, indexed by search engines, in a useable format, with no requirement to pay for access and no exclusion of any potential users (except perhaps for antisocial behaviour).

Open Notebook Science: This is Jean-Claude Bradley’s term which I think encompasses much of what I am interested in doing and has been pretty clearly defined (see here and here). To summarise, this means that every experiment that is done and every piece of data that is collected is placed online in a freely accessible repository in a timely manner. I would add to this something which I don’t think is explicit in previous definitions but I think is implicit in the way his group works and makes its data available. That is that there must be space for interaction, comments, and questions from the outside world.

Open Science is really too woolly a term to mean anything much but it encompasses the movement that is working towards more of the above throughout the science community. It’s a good phrase, it captures the imagination, is evocative, and memorable. It’s just too big to be pinned down. But it’s a big set of ideas, so let’s see where it leads us.