Open methods vs open data – might the former be even harder?

17 August 2007 13 Comments

Continuing the discussion set off by Black Knight and continued here and by Peter Murray-Rust, I was interested in the following comment in Black Knight’s follow-up post (the emphasis is mine, and I have quoted slightly out of context to make my point).

But all that is not really what I wanted to write about now. The OpenWetWare (have you any idea how difficult it is to type that?) project is a laudable effort to promote collaboration within the life sciences. And this is cool, but then I realize that the devil is in the details.

Share my methods? Yeah! Put in some technical detail? Yea–hang on.

A lot of the debate has been about posting results and the risk of someone stealing them or otherwise using them. But in bioscience the competitive advantage that a laboratory has can lie in the methods. Little tricks that don’t necessarily make it into the methods sections of papers, that sometimes researchers aren’t even entirely aware of, but which form part of the culture of the lab.

The case for sharing methods is, at least on the surface, easier to make than the case for sharing data. A community can really benefit from having all those tips and tricks available. If you put yours up and I put mine up, everyone benefits. But if there is something that gives you a critical competitive advantage, how easy is that going to be to give up? An old example is the ‘liquid gold’ transformation buffer developed by Doug Hanahan (read the story in Sambrook and Russell, third edition, p. 1.105, or online here – I think; it’s not open access). Hanahan ‘freely and generously distributed the buffer to anyone whose experiments needed high efficiencies…’ (Sambrook and Russell) but he was apparently less keen to make the recipe available. And when it was published (Hanahan, 1983) many labs couldn’t achieve the same efficiencies, again because of details like a critical requirement for absolutely clean glassware (how clean is clean?). How many papers these days even include or reference the protocol used for transformation of E. coli? Yet this could, and did, give a real competitive advantage to particular labs in the early 1980s.

So, if we are to make a case for making methodology open we need to tackle this. I think it is clear that making this knowledge available is good for the science community. But it could be a definite negative for specific groups and people. The challenge lies in making sure that altruistic behaviour that benefits the community is rewarded. And this won’t happen unless metrics of success and community stature are widened to include more than just publications.


13 Comments »

  • Keith said:

    At least one advantage of posting your science online is that the internet has a good memory. If Darwin and Wallace had conveyed their ideas on a blog or through gmail, there would be no question as to who came up with the ideas first- Google doesn’t forget.

    Likewise, if you post the details to some novel strategy or technique online, you will have a solid record showing that you were the first to discuss those ideas.

  • Cameron Neylon said:

    Very much so, and even better, if someone does steal your idea you may well have a record of them visiting. Stealing someone’s ideas or methods and getting away might actually become harder.

  • Bill said:

    Stealing someone’s ideas or methods and getting away might actually become harder.

    Yes, absolutely! Jean-Claude’s been making this point for a while about the third-party timestamps in his wikispaces notebook. I think it’s a crucial answer to one of the most common objections to opening up science (“what if some bastard robs my stuff?”), and I don’t know why it hasn’t got more traction. Maybe it’s just a question of critical intellectual mass, so I’m very pleased to see others coming up with the same idea.

  • Cameron Neylon said:

    I don’t think I can claim to have come up with the idea independently of Jean-Claude. I’m pretty sure I read it in something he has posted. But I agree it is a crucial point to keep making. If science is about establishing priority then having a proper public datestamp has got to be a good thing.
    Another question I don’t know the answer to is whether such a datestamp could be used to establish priority for the purpose of a patent (at least a US one). That would answer the other comment I made over at ‘What you’re doing is…’ (http://nsaunders.wordpress.com/2007/08/12/a-post-that-says-it-all/).

  • Science in the open » Followup on ‘open methods’ said:

    […] Open methods vs open data – might the former be even harder? […]

  • Alethea said:

    One limiting factor for me in putting up open methods has just been the time needed to make them wiki-friendly (with respect to OWW) or even to type up a protocol to begin with. It’s often easier to work for years from an ancient photocopy that has been manually marked up, and then to photocopy that for the colleague down the hall who comes to ask you how you do it. Few journals – even methodological ones – allow you the room to get to the level of detail that specifies just how clean the glassware must be (for example). We found that out to our detriment in our application of Serial Analysis of Gene Expression. And now that we have finally been successful, we’re just tired. There is, indeed, little incentive to go back over three years of work and remember the failed attempts and how we got past them.

    On the other hand, I’ve definitely come across scientists who, when you say “it doesn’t work like in the methods section of your paper” spend a lot of time sending e-mails back and forth to help trouble-shoot.

    In my field (covering some aspects of genomics), open data will come once our first paper describing the database and one of its applications is published. There is so much one could do with it that it would be downright silly not to make it publicly available as soon as possible. I know another genomics consortium that generated so much data on dozens of expression banks that they were disappointed that no one else had taken the ball and run with it – even though, after four years, the original scientists who made the banks had exploited only one of them, for one application.

    I’ve often noticed that most people would rather generate their own data (even if possibly inferior) than mine what is already publicly available.

  • Cameron Neylon said:

    Alethea, I can definitely see this would be an issue. I think a partial answer to this is that if you are already recording your methodology electronically then it would be easier. I guess you probably have a folder of pieces of paper that it would be a huge effort to convert? I know that personally I would lose the pieces of paper and having it online means I can find it.

    The other answer is that by putting the methods up it is more likely that people will go back to your papers and then cite you. On the other hand, you mention SAGE, so people are more likely to cite the original paper. Now if people were to cite your method on OpenWetWare (or Nature Protocols), would that be a benefit to you?

    On open data, yes, absolutely. I think there are two issues. One is actually knowing exactly what people have done. Microarray data is often given as an example, but what is made available is often not actually the raw data (as it happens, we are trying to get some real raw data to analyse at the moment, and it is not so easy). If you’re not sure exactly how the experiment was done then you may not feel comfortable using the data. The other is that I think there is a sense for some people, both researchers and reviewers, that if you’re not generating the data yourself you’re not doing ‘proper’ science.
