Picture this…

There has been a bit of discussion recently about identifying and promoting ‘wins’ for Open Science and Open Notebook Science. I was particularly struck by a comment made by Hemai Parthasarathy at the ScienceBlogging Meeting that she wasn’t aware of any really good examples that illustrate the power of open approaches. I think sometimes we miss the most powerful examples right under our nose because they are such a familiar part of the landscape that we have forgotten they are there. So let us imagine two alternate histories; I have to admit I am very ignorant of the actual history of these resources but I am not sure that matters in making my point.

History the first…

In the second half of the twentieth century scientists developed methods for sequencing proteins and DNA. Not long after this the decades of hard work on developing methods for macromolecular structure determination started to bear fruit and the science of protein crystallography was born. There was a great feeling that, with an understanding of the molecular detail of biological systems, disease was a beatable problem; that it was simply a matter of understanding the systems to know how to treat any disease. Scientists, their funders, pharmaceutical companies, and publishers could see this was an important area for development, both in terms of the science and also with significant commercial potential.

There was huge excitement and a wide range of proprietary databases containing this information proliferated. Later there came suggestions that the NIH and EMBL should fund public databases with mandated deposition of data but a broad coalition of scientists, pharmaceutical companies, and publishers objected, saying that this would hamper their ability to exploit their research effort and would reduce their ability to turn research into new drugs. Besides, the publishers said, all the important information is in the papers… By the mid-noughties a small group of scientists calling themselves ‘bioinformaticians’ started to appear and began to look at evolution of genetic sequences using those pieces of information they could legally scrape from the, now electronically available, published literature. One scientist was threatened with legal action for taking seven short DNA sequences from a published paper…

Imagine a world with no GenBank, no PDB, no SwissProt, and none of the culture that grew out of these: publicly funded, freely available databases of biological information like Brenda, KEGG, and so on. Would we still be living in the 90s, the 80s, or even the 70s compared to where we have got to?

History the second…

In the second half of the twentieth century synthetic organic chemistry went through an enormous technical revolution. The availability of modern NMR and mass spectrometry radically changed the whole approach to synthesis. Previously the challenging problem had been figuring out what it was you had made. Careful degradation, analysis, and induction were required to understand what a synthetic procedure had generated. NMR and MS made this part of the process much easier, shifting the problem to developing new synthetic methodology. Organic chemistry experienced a flowering as creative scientists flocked to develop new approaches that might bear their names if they were lucky.

There was tremendous excitement as people realised that virtually any molecule could be made, if only the methodology could be figured out. Diseases could be expected to fall as the synthetic methodology was developed to match the advances in the biological understanding. The new biological databases were providing huge quantities of information that could aid in the targeting of synthetic approaches. However it was clear that quality control was critical and sharing of quality control data was going to make a huge difference to the rate of advance. So many new compounds were being generated that it was impossible for anyone to check on the quality and accuracy of characterisation data. So, in the early 80s, taking inspiration from the biological community a coalition of scientists, publishers, government funders, and pharmaceutical companies developed public databases of chemical characterisation data with mandatory deposition policies for any published work. Agreed data formats were a problem but relatively simple solutions were found fast enough to solve these problems.

The availability of this data kick started the development of a ‘chemoinformatics’ community in the mid 80s leading to the development of sophisticated prediction tools that aided the synthetic chemists in identifying and optimising new methodology. By 1990, large natural products were falling to the synthetic chemists with such regularity that new academics moved into developing radically different methodologies targeted at entirely new classes of molecules. New databases containing information on the activity of compounds as substrates, inhibitors, and activators (with mandatory deposition policies for published data) provided the underlying datasets for validation that meant by the mid 90s structure based drug discovery was a solved problem. By the late 90s the chemoinformatic tools available made the development of tools for identifying test sets of small molecules to selectively target any biological process relatively straightforward.

Ok. Possibly a little utopian, but my point is this. Imagine how far behind we would be without GenBank, PDB, and without the culture of publicly available databases that these embedded in the biological sciences. And now imagine how much further ahead chemical biology, organic synthesis, and drug discovery might have been with NMRBank, the Inhibitor Data Bank…

More on the PSB proposal

Shirley Wu has followed up on her original proposal to submit a session proposal for PSB. She asks a series of important questions about going forward on this and I thought I would reply to these here to widen exposure.

I think it is worth going for a session and I am happy to lead the application, but there may well be better people (Jean-Claude, Antony Williams, Peter Murray-Rust, Egon Willighagen) to lead it, depending on focus. I think the important question to ask is whether we can generate enough research papers to justify a session. I believe we can and should, and I will commit to generating one if we go ahead, but I think we need at least another 3-4 to go ahead.

So, to answer Shirley’s questions:

1. What should be the focus of this session on Open Science? (first, frame it as a traditional PSB session, then perhaps as a “creative” session)
2. What kind of substantial/technical/research papers can be written about Open Science?
3. Who are the major players in the field? Who would the session chair invite to submit a paper?
4. Who is willing to help write/organize the actual proposal and session?

Given it is a computing symposium I would say that it should focus on tools and standards and how they affect what we can, or would like to do. This also gives us a chance to provide research type papers describing such tools and standards and investigating their implementation. So we could write papers describing different implementations of Open Notebooks and critical analysis of the differences, the organisation of Open Data, standards for describing data, and social and cultural aspects of what is happening etc etc.

People to invite to write papers include Jean-Claude Bradley, OpenWetWare group, Egon, Peter MR, Deepak Singh (willing to write a review/scoping type paper?), Antony Williams (ChemSpider), Simile Group (www.simile.mit.edu), other repository, data archival groups, Nature Publishing/PLoS/PMC/UK-PMC to describe systems, Heather Piwowar to analyse what happens, and social sciences groups that are becoming interested in what is going on.

Finally, as I say, I am willing to help, but as you can see time becomes a constraint for me and things have a habit of getting left to the last minute. If anyone else would like to step in to lead then I am more than happy to be a co-chair. If no one else is available I am happy to lead. I at least have the advantage that I can probably source the resources so that I can get there!

I am going to tag this “Open Science PSB09” if that seems a good tag to aggregate around.

Open Science Session at PSB 2009?

Shirley Wu from Stanford left a comment on my New Year’s resolutions post suggesting the possibility of a session on Open Science at the PSB meeting in Hawaii in 2009, which I wanted to bring to the front for people’s attention.

[…] Since you mentioned organizing an international meeting on the subject and publicizing open science, I’m curious what your thoughts (and anyone else’s who reads this!) would be on participating in a session on Open Science at the Pacific Symposium on Biocomputing at PSB. They don’t traditionally cover non-primary research/methods tracks, but they do pride themselves on being at the cutting edge of biology and biocomputing, so I am hoping they will be amenable to the idea. If there was support from, shall we say, the founders of this movement, I think it would help a great deal towards making it happen. […]

She also has a post on her new blog One Big Lab where she fleshes out the idea in a bit more detail and which is probably the best place to continue the discussion.

Hi Shirley! Great to have more people out there blogging and commenting. I am not sure whether I really qualify as a ‘founder of the movement’. I know things are moving fast, but I don’t think having been around for nine months or so makes me that venerable!

This sounds broadly like a good idea to me. I was considering trying to organise a meeting in the UK towards October – November this year but the timelines are tight and really dependent on money coming through. I would be happy to push back to Jan 2009 in Hawaii if people felt this was a good idea; if the grant comes through we could use this as the first annual meeting. My only concern is that Hawaii probably increases average costs for people as more people have to come further and book accommodation than if it is either Western Europe or East Coast US. The other issue is how and whether to focus such a session. I also don’t see a problem with having two meetings ~6 months apart. What do people think?

Open Science and the developing world: Good intentions, bad implementation?

I spent last week in Cuba. I was there on holiday but my wife (who is a chemistry academic) was on a work trip to visit collaborators. This meant I had the opportunity to talk to a range of scientists and to see the conditions they work under. One of the strong arguments for Open Science (literature access, data, methods, notebooks) is that it provides access for scientists in less privileged countries both to peer reviewed research and to the details of methodology that can enable them to carry out their science. I was therefore interested to see both what was available to them and whether they viewed our efforts in this area as useful or helpful. I want to emphasise that these people were doing good science in difficult circumstances by playing to their strengths and focussing on achievable goals. This is not second rate science, just science that is limited by access to facilities, reagents, and information.

Access to the literature

There is essentially no access to the subscriber-only literature.  Odd copies of journal issues are highly valued and many people get by by having visiting positions at institutes in the developed world. I talked to a few people about our protein ligation work and they were immensely grateful that this was published in an open access journal. However they were uncertain about publishing in open access journals due to the perceived costs.  While it is likely that they could get such costs waived I believe there is an issue of pride here in not wishing to take ‘charity’. Indeed, in the case of Cuba it may be illegal for US based open access publishers to provide such assistance. It would be interesting to know whether this is the case.

Overall though, it is clear that access to the peer reviewed literature is a serious problem for these people. Open Access publishing provides a partial solution to this problem. I think to be effective it is important that this not be limited to self-archiving as, for reasons I will come back to, it is difficult for them to find such self-archived papers. It is clear that mandating archival on a free access repository can help.

Access to primary data

Of more immediate interest to me was whether people with limited access to the literature saw value in having free access to the primary data in open notebooks. Again, people were grateful for the provision of access to information as this has the potential to make their life easier. When you have limited resources it is important to make sure that things work and that they produce publishable results. Getting detailed information on methodology of interest is therefore very valuable. Often the data that we take for granted is not available (fluorescence spectra, NMR, mass spectrometry) but details like melting points, colours, retention times can be very valuable.

There were two major concerns; one is a concern we regularly see, that of information overload. I think this is less of a concern as long as search engines make it possible to find information that is of interest. Work needs to be done on this but I think it is clear that some sort of cross between Google Scholar and Amazon’s recommendation system/Delicious etc. (original concept suggested by Neil Saunders) can deal with this. The other concern, relating to them adopting such approaches, was one that we have seen over and over again, that of ‘getting scooped’. Here though the context is subtly different and there is a measure of first world-developing world politics thrown in. These scientists are, understandably, very reluctant to publicise initial results because the way they work is methodical and slow. Very often the key piece of data required to make up a paper can only be obtained on apparatus that is not available in house or requires lengthy negotiations with potential overseas collaborators. By comparison it would often be trivially easy for a developed world laboratory to take the initial results and turn out the paper.

The usual flip side argument holds here; by placing an initial result in the public domain it may be easier for them to find a collaborator who can finish off the paper, but I can understand their perspective. These are people struggling against enormous odds to stake out a place for themselves in the scientific community. The first world does not exactly have an outstanding record on acknowledging or even valuing work in developing countries so I can appreciate a degree of scepticism on their part. I hope that this may be overcome eventually but given that the assumption of most people in my own community is that by being open we are bound to be shafted I suspect we need to get our own house in order first.

The catch…

All of this is well and good. There are many real and potential benefits for scientists in the developing world if we move to more open styles of science communication. This is great, and I think it is a good argument for more openness. However there is a serious problem with the way we present this information and our reliance on modern web tools to do it. It’s a very simple problem: bandwidth.

All of our blogs, our data, and indeed the open access literature are very graphics heavy. I actually tried to load up the front page of openwetware.org while sitting at the computer of the head of the department my wife was visiting (the department has two networked computers). Fifteen minutes later it was still loading. The PLoS One front page was similarly sluggish. I get irritated if my download speeds drop below 500K/second, at home, and I will give up if they go down to 100K. We were seeing download rates of 44 bytes/second at the worst point. In some cases this can even make search engines unusable, making it near impossible to track down the self-archived versions of papers. Cuba is perhaps a special case because the US embargo means they have no access to the main transatlantic and North American cables; in effect the whole country is on a couple of bundles of phone lines. But I suspect that even while access is becoming more pervasive the penetration of reasonable levels of bandwidth is limited in the developing world.
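
To put those rates in perspective, here is a rough back-of-the-envelope sketch. The 500 kB page weight is purely an assumption for illustration (I did not measure the actual pages), as is reading ‘K’ as kilobytes; latency and protocol overhead are ignored.

```python
def load_time_seconds(page_bytes: float, rate_bytes_per_s: float) -> float:
    """Seconds to transfer page_bytes at a constant rate (no latency or overhead)."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return page_bytes / rate_bytes_per_s


def describe(seconds: float) -> str:
    """Render a duration in the largest sensible unit."""
    if seconds < 60:
        return f"{seconds:.1f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.1f} h"


# Hypothetical page weight: an assumption for illustration, not a measurement.
PAGE_BYTES = 500_000

# Rates quoted in the text, reading "K" as kilobytes (also an assumption).
RATES = {
    "500 kB/s (home, comfortable)": 500_000,
    "100 kB/s (where I give up)": 100_000,
    "44 B/s (worst point in Cuba)": 44,
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: {describe(load_time_seconds(PAGE_BYTES, rate))}")
```

Under these assumptions a page that arrives in a second at home would take a little over three hours at 44 bytes/second, which fits with a front page still loading after fifteen minutes.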

The point of this is that access is about more than just putting stuff up, it is also about making it accessible. If we are serious about providing access, and expanding our networks to include scientists who do not have the advantages that we have, then this necessarily includes thinking about low bandwidth versions of the pages that provide information. I looked through PLoS One, openwetware, BioMedCentral, and couldn’t find a ‘text only version’ button on any of them (to be fair there isn’t one on our lab blog either). I appreciate the need to present things in an appealing and useful format, and indeed the need to place advertising to diversify revenue streams. I guess the main point is not to assume that by making it available you are necessarily making it accessible. If universal accessibility is an important goal then some thought needs to go into alternative presentations.

Overall I think there are real benefits for these scientists when we make things available. The challenges shouldn’t put us off doing it but perhaps it is advisable to bear in mind the old saw: if you want to help people, make sure you find out what they need first.

Some New Year’s resolutions

I don’t usually do New Year’s resolutions. But in the spirit of the several posts from people looking back and looking forwards I thought I would offer a few. This being an open process there will be people to hold me to these so there will be a bit of encouragement there. This promises to be a year in which Open issues move much further up the agenda. These things are little ways that we can take this forward and help to build the momentum.

  1. I will adopt the NIH Open Access Mandate as a minimum standard for papers submitted in 2008. Where possible we will submit to fully Open Access journals but where there is not an appropriate journal in terms of subject area or status we will only submit to journals that allow us to submit a complete version of the paper to PubMed Central within 12 months.
  2. I will get more of our existing (non-ONS) data online and freely available.
  3. Going forward all members of my group will be committed to an Open Notebook Science approach unless this is prohibited or made impractical by the research funders. Where this is the case these projects will be publicly flagged as non-ONS and I will apply the principle of the NIH OA Mandate (12 months maximum embargo) wherever possible.
  4. I will do more to publicise Open Notebook Science. Specifically I will give ONS a mention in every scientific talk and presentation I give.
  5. Regardless of the outcome of the funding application I will attempt to get funding to support an international meeting focussed on developing Open Approaches in Research.

Beyond the usual (write more papers, write more grants) I think that covers things. These should even be practical.

I hope all of those who have had a holiday have enjoyed it and that all those who have not are looking forward to one in the near future. I am looking forward to the New (Western, Calendar) Year. It promises to be an exciting one!

I am now off to cook lots of lovely Chinese food (and yes I know that is calendarically inappropriate – but it will still taste good!). Happy New Year!

A big few weeks for open (notebook) science

So while I have been buried in the paper- and lab-work there has been quite a lot of interesting stuff going on. Pedro Beltrao has started an Open Notebook style project at Google Code which he describes in a post on Public Ramblings. This is interesting, because once again someone is using a different system as an Open Notebook. We have wikis, blogs, TeX based documents, and now, software version repositories being used. As Jean-Claude Bradley has said and we have discussed, we have a lot to learn from exploring different systems, both in terms of understanding the benefits and limitations of specific systems on the way to designing and implementing better ones, but also from the perspective of what this tells us about how we do our science, and how this differs from discipline to discipline. Indeed, there already seems to be a place where this discussion has started in Pedro’s system. It is great to see this going forward and also great to see other members of the community, including Bill Hooker and Michael Barton, already getting in and getting their hands dirty. I only wish I could contribute a bit more on the science itself.

Also good is the publicity that Open Notebooks and Open Notebook Science are getting. An article in Chemistry World, the members’ journal of the Royal Society of Chemistry, features UsefulChem, and discussion from Peter Murray-Rust, Steve Bachrach and others. Our efforts at Southampton even get a mention! What is good about this is not so much the personal publicity but that the mainstream ‘industry’ journals are increasingly starting to pick up the story. Not so long ago there was the article in Wired; Chemistry World has also recently discussed the issues associated with openness in a reasonably balanced manner (see also Peter Suber and Peter Murray-Rust’s commentaries).

In addition there is good coverage on the web. Rosie Redfield’s lab pages got featured by David Ng on World’s Fair on Science Blogs which was also picked up at BoingBoing (thanks to Neil Saunders for bringing this to my attention). Momentum is building as Neil says. The issues are becoming mainstream and the benefits are starting to flow through in specific cases. This is how things start to change. The challenge is in maintaining this forward momentum as it builds.

e-science for open science – an EPSRC research network proposal

The UK Engineering and Physical Sciences Research Council currently has a call out for proposals to fund ‘Network Activities’ in e-science. This seems like an opportunity to both publicise and support the ‘Open Science’ agenda so I am proposing to write a proposal to ask for ~£150-200k to fund workshops, meetings, and visits between different people and groups. The money could fund people to come to meetings (including from outside the UK and Europe) but could not be used to directly support research activities. The rationale for the proposal would be as follows.

  • ‘Open Science’ has the potential to radically increase the efficiency and effectiveness of research world wide.
  • The community is disparate and dispersed with many groups working on different approaches that do not currently interoperate – agreeing some interchange or tagging standards may enable significant progress
  • Many of those driving the agenda are early career scientists including graduate students and postdocs who do not have independent travel funds and whose PI may not have resources to support attending meetings where this agenda is being developed
  • There is significant interest from academics, some publishers, software and tool developers, and research funders in making more data freely available but limited consensus on how to take this forward and thus far an insufficient commitment of resources to make this possible in practice

The proposal would be to support 2-3 meetings over three years, including travel costs, and provide funds for exchange visits. What I would like from the community is an expression of interest, specifically the commitment to write a letter of support saying you would like to be involved. It would be great to get these from tenured academics, early career academics, graduate students and PDRAs, publishers (NPG? PLoS?), library and repository people (UKOLN, Simile, others?) and anyone else who is relevant.

The timeline is tight (due Tuesday next week) but if there is enough interest I will push through to get this done. I propose to write the grant in the open and online so will post a Google Doc or OpenWetWare page as soon as I have something to put up. Any help people can offer on the writing would be appreciated. In the meantime please drop comments below. I will be pointing to this page in the grant proposal.

An experiment in open notebook science – Sortase mediated protein-DNA ligation

In a recent post I extolled the possible virtues of Open Notebook Science in avoiding or ameliorating the risk of being scooped. I also made a virtue of the fact that being open encourages you to take a more open approach; that there is a virtuous circle or positive feedback. However much of this is very theoretical. We don’t have good case studies to point at that show that Open Notebook Science generates positive outcomes in practice. To take a more cynical perspective where is the evidence that I am willing to take risks with valuable data? My aim with this post is to do exactly that, put something out there that is (as far as I know) new and exciting, and kick off a process that may help us to generate a positive example.

I mentioned in the previous post that we have been scooped not once, but twice, on this project. I will come back to the second scooping later but my object here is to try and avoid getting scooped a third time. As I mentioned in the previous post we are using the S. aureus Sortase enzyme to attach a range of molecules to proteins. We have found that this provides a clean, easy, and most importantly general method for attaching things to proteins. Labelling of proteins, attaching proteins to solid supports, and generating various hybrid-protein molecules has a very wide range of applications and new and easy to use methods are desperately needed. We have recently published[1] the use of this to attach proteins to solid supports and others have described the attachment of small molecules[2], peptides[3], PNA[4], PEG[5] and a range of other things.

One type of protein-conjugate that is challenging to generate is one in which a protein is linked to a DNA molecule. Such conjugates have a wide range of potential applications particularly as analytical tools where the very strong and selective binding that can often be found in a protein is linked to the wide range of extremely sensitive techniques available for DNA detection and identification[6]. Such techniques have been limited because it is difficult to find a general and straightforward technique for making such conjugates.

We have used our Sortase mediated ligation to successfully attach oligonucleotides to proteins and I have put up the data we have that supports this in my lab book (see here for an overview of what we have and here for some more specific examples with conditions). I should note that some of this is not strictly open notebook science because this is data from a student which I have put up after the event.

We are confident that it is possible to get reasonable yields of these conjugates and that the method is robust and easy to apply. This is an exciting result with some potentially exciting applications. However to publish we need to generate some data on applications of these conjugates. One obvious target here is to use a DNA array and differently coloured fluorescent proteins attached to different oligonucleotides to form an image on the array. The problem is that we are not well set up to do this in my lab and don’t have the expertise or resources to do this experiment efficiently. We could do it but it seems to me that it would be quicker and more efficient for someone else with the expertise and experience to do this. In return they obviously get an authorship on the paper.

Other experiments we are interested in doing:

  • Analytical experiment using the binding of a protein-DNA conjugate that utilises the DNA part for detection.
  • Pull down of peptide-DNA conjugates onto an array after exposure of the peptides to a protease
  • Attachment of proteins to a full length PCR product containing the gene for the protein. Select one of the proteins and then re-amplify the desired gene. (I had a quick go at this but it didn’t work)

So what I am asking is this:

  • If any reader of this blog is interested in doing these (or any other) experiments to aid us in getting the published paper then get in touch
  • If you feel so inclined then publicise this call wider on your own blog and let’s see whether using the blogosphere to make contacts can really aid the science

We will send the reagents to anyone who would like to do the experiments along with any further information required. In principle people ought to be able to figure out everything they need from the lab book but this will probably not be the case in practice. The idea here is to see whether this notion of a loose collaboration of groups with different resources and expertise that is driven by the science can work and whether it is a competitive way of doing science.

My criteria in accepting collaborators will be as follows:

  1. Willingness to adopt an Open Notebook Science approach for this experiment (ideally using our lab book system but not necessarily)
  2. Interest in and willingness to engage in the development of the published paper (including proposing and/or carrying out any new experiments that would be cool to include)
  3. Ability to actually carry out the experiment in reasonable time (ideally looking for a couple of months here)

So this is notionally a win-win situation for me. We will be getting on and doing our own thing as well but by working with other groups we may be able to get this paper out more efficiently and effectively. Maybe others will come up with clever experiments that would add to the value of the paper. The worst case scenario is that someone comes along and sees this, copies the results, and publishes ahead of us. The best case scenario is that someone else already working in a similar direction may come across this and propose working together on this.

In any case, the results promise to be interesting…

References:

[1] Chan et al, 2007, Covalent attachment of proteins to solid supports via Sortase-mediated ligation, PLoS ONE, e1164

[2] Popp et al, 2007, Sortagging: a versatile method for protein labelling, Nat Chem Biol, 3:707

[3] Mao et al, 2004, Sortase-mediated protein ligation: a new method for protein engineering, J Am Chem Soc, 126:2670

[4] Pritz et al, 2007, Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation, J Org Chem, 72:3909

[5] Parthasarathy et al, 2007, Sortase A as a novel molecular “stapler” for sequence specific protein conjugation, Bioconj Chem, 18:469

[6] Burbulis et al, 2005, Using protein-DNA chimeras to detect and count small numbers of molecules, Nature Methods, 2:31

Getting scooped…

I have been waiting to write this post for a while. The biggest concern expressed when people consider taking on an Open Notebook Science approach is that of being ‘scooped’. I wanted to talk about this potential risk using a personal example where my group was scooped but I didn’t want to talk about someone else’s published paper until the paper on our work was available for people to compare. Our paper has just gone live at PLoS ONE so you will be able to compare the two sets of results.

Attaching proteins in a site selective manner to solid supports is a challenging problem. A general approach to attaching proteins to resin beads or planar surfaces while retaining function would have applications in chemical catalysis, analytical devices, and the generation of protein microarrays.

In about March 2006 we established in my laboratory that the Sortase enzyme of S. aureus was an effective way of attaching functional proteins to solid supports. This was before I started taking up ONS and, as the student is finishing up, the project has not been moved onto an ONS basis, so the data was not made available when we had it. We delayed publishing this as we attempted to generate a ‘pretty picture’ in which we would create the Southampton University logo in fluorescent protein on a glass surface. The idea of this was to make it more likely that we would get the paper into a higher ranked journal but ultimately we were unsuccessful.

In March 2007 we were scooped by a paper in Bioconjugate Chemistry (1). This paper, amongst other things, included an experiment that was very similar to the core experiment in our data (2). I should emphasise that there is absolutely no suggestion that this group ‘stole’ our data. They were working independently and were probably doing their experiments at about the same time as we did ours.

The first point here is that in the vast majority of cases being scooped is not about theft but about the fact that a good idea is an idea that is likely to occur to more than one person. It is essentially about not being first to get to publication. I can argue that I had the idea some years ago – but we didn’t get on to the work until 2006 and we’ve only just managed to get it published.

The second point is that our work is clearly different enough from Parthasarathy et al to be published. This is often the case. Indeed we have recently been scooped again on a different aspect of this project (3) but I expect we will still be able to publish as our data is again complementary to that reported.

So, from the perspective of traditional publication we were scooped because we didn’t publish fast enough. We can’t claim any precedence because we weren’t taking an ONS approach that would support this claim. But let us consider what would have happened if we had taken an ONS approach. I think there are a series of possible outcomes:

  1. It is possible, or even likely, that the other group may not have noticed our results at all. Under these circumstances we would at least be able to claim precedence.
  2. The other group may have seen our results and been spurred into more rapid publication. Again we would have been able to claim precedence but also there would be a record of the visit. I suspect this is the most common route to being scooped. In most cases results are not ‘copied’ from e.g. conference presentations but much more often the fact that someone is close to publication spurs another group to get their work published first.
  3. The most positive outcome is that, having seen we had some similar results, the other group may have got in contact and we could have put the results together to make a better paper.

Outcome 3 may seem unlikely but it really is the best outcome for everyone. Parthasarathy et al published in Bioconjugate Chemistry and we have published in PLoS ONE after chasing around a number of other journals. If we had combined the results and, possibly more importantly, the resources to hand, we probably could have put together a much better paper. This could possibly have gone to a significantly higher ranked journal. Apart from possible arguments over first and corresponding authorship, everyone would have been better off.

This is the promise of being open as well as practising Open Notebook Science. By cooperating we can do a lot better. Being open has its risks but equally there are significant potential benefits including doing better science, better publications, and better career prospects as a result.

But let us now put the shoe on the other foot. What if the other group had made their data available? Would I have rushed out our paper to prevent them getting in first? It is one thing to advocate openness but would I really have got in touch with them myself? The answer is that 12 months ago I probably wouldn’t have. I would have pushed the student to work 24 hours a day and got our own paper out as fast as possible with whatever data we had to hand. And we may have cut corners to get the data together, missed out controls that we knew would work but didn’t have time to do, and glossed over any possible issues.

But today, faced with the same dilemma, I would get in touch with them and propose combining our data. Why the change? Partly because I have spent the past 12 months considering the issues around being open. But a strong contributor is that if I didn’t I would be exposing myself to criticism as a hypocrite. I have come to think that one of the real benefits of ‘being open’ is that you hold yourself to higher standards, precisely because being out in the open means that people have the evidence to judge you on.

I find that as I do my experiments and record them I take more care, I describe them more clearly, and I make more effort to preserve and index the data properly. More generally I feel more inclined to share my ideas and preliminary results with others. And part of this is because I am aware that double standards will be obvious to anyone who is looking. Standards and discipline in maintaining them make for better science and for better people. Anyone who is honest with themselves knows that sometimes, somewhere, there is a temptation to cut corners. We all need help in maintaining discipline and being open is a very effective way of doing it.

It may sound a bit over the top but I actually feel like a better person for taking this approach. So for all the sceptics out there, and particularly for those academics with blood pressure issues, I recommend you try throwing the doors open. The fresh air is a bit bracing but it will do you the world of good.

  1. Parthasarathy R, Subramanian S, Boder ET (2007) Sortase A as a novel molecular “stapler” for sequence-specific protein conjugation. Bioconjug Chem 18:469-76
  2. Chan L, Cross HF, She JK, Cavalli G, Martins HFP, Neylon C (2007) Covalent attachment of proteins to solid supports and surfaces via Sortase-mediated ligation, PLoS ONE 2(11): e1164 doi:10.1371/journal.pone.0001164
  3. Popp et al., (2007) Sortagging: a versatile method for protein labelling. Nat Chem Biol Sep 23 (Epub ahead of print)