An abstract for the International Meeting on E-social Sciences

I have said before that I think we could benefit from the involvement of social scientists in understanding the possible cultural issues involved in the move towards more open practices. To this end we are submitting an abstract for the 4th International Meeting on e-social Science to present a ‘short paper’. I’ve put the abstract below: the deadline is next Monday (4th February). If you have any comments and/or would like to be included as an author on the paper, please let me know. I am a bit pressed for time this week and Google services seem slow this morning so I will probably stick to using comments from here rather than using Google Docs. Any/all comments welcome.

The Effect of Network Size and Connectivity on Open Notebook Approaches to Scientific Research: The view from the inside

Cameron Neylon with contributions from the Open Notebook Science Collective

A small but growing group of researchers in the physical and biological sciences are interested in developing and applying open approaches to their research practice. The logical extreme of this approach is ‘Open Notebook Science’, a term coined by Jean-Claude Bradley to refer to the practice of making the raw data from an experimental laboratory available as soon as practicable after it is generated. The promise of such open approaches is that loose coalitions of scientists can aggregate around specific problems according to interest, expertise, and resource availability and that such an approach can allow significantly more rapid solutions to problems to be developed. Specific recent examples of such approaches include the aggregation of a group of significant size to rapidly (five days) prepare a full scale grant application; attempts, successful and unsuccessful, to identify collaborators to provide specific experimental capabilities to allow the completion of experimental results; and requests for experts to examine specific chemical datasets to identify potential errors. We will describe the experience of these different examples from the inside as well as the tools and resources used, their usefulness and their limitations. The key observation is that successful application of these approaches requires a critical mass of interested scientists with sufficient time to provide a large enough pool of resources to solve the problem, and that the network be sufficiently well connected for requests to be routed to those best suited to help. In most cases the record of these efforts is fully publicly available and may provide useful data for social science research in this area.

Open Science and the developing world: Good intentions, bad implementation?

I spent last week in Cuba. I was there on holiday but my wife (who is a chemistry academic) was on a work trip to visit collaborators. This meant I had the opportunity to talk to a range of scientists and to see the conditions they work under. One of the strong arguments for Open Science (literature access, data, methods, notebooks) is that it gives scientists in less privileged countries access both to peer reviewed research and to the details of methodology that can enable them to carry out their science. I was therefore interested to see both what was available to them and whether they viewed our efforts in this area as useful or helpful. I want to emphasise that these people were doing good science in difficult circumstances by playing to their strengths and focussing on achievable goals. This is not second rate science, just science that is limited by access to facilities, reagents, and information.

Access to the literature

There is essentially no access to the subscriber-only literature.  Odd copies of journal issues are highly valued and many people get by by having visiting positions at institutes in the developed world. I talked to a few people about our protein ligation work and they were immensely grateful that this was published in an open access journal. However they were uncertain about publishing in open access journals due to the perceived costs.  While it is likely that they could get such costs waived I believe there is an issue of pride here in not wishing to take ‘charity’. Indeed, in the case of Cuba it may be illegal for US based open access publishers to provide such assistance. It would be interesting to know whether this is the case.

Overall though, it is clear that access to the peer reviewed literature is a serious problem for these people. Open Access publishing provides a partial solution to this problem. I think to be effective it is important that this not be limited to self archiving as, for reasons I will come back to, it is difficult for them to find such self archived papers. It is clear that mandating archival on a free access repository can help.

Access to primary data

Of more immediate interest to me was whether people with limited access to the literature saw value in having free access to the primary data in open notebooks. Again, people were grateful for the provision of access to information as this has the potential to make their life easier. When you have limited resources it is important to make sure that things work and that they produce publishable results. Getting detailed information on methodology of interest is therefore very valuable. Often the data that we take for granted is not available (fluorescence spectra, NMR, mass spectrometry) but details like melting points, colours, and retention times can be very valuable.

There were two major concerns. One is a concern we regularly see, that of information overload. I think this is less of a concern as long as search engines make it possible to find information that is of interest. Work needs to be done on this but I think it is clear that some sort of cross between Google Scholar and Amazon’s recommendation system/Delicious etc. (original concept suggested by Neil Saunders) can deal with this. The other concern, relating to them adopting such approaches, was one that we have seen over and over again, that of ‘getting scooped’. Here though the context is subtly different and there is a measure of first world-developing world politics thrown in. These scientists are, understandably, very reluctant to publicise initial results because the way they work is methodical and slow. Very often the key piece of data required to make up a paper can only be obtained on apparatus that is not available in house or requires lengthy negotiations with potential overseas collaborators. By comparison it would often be trivially easy for a developed world laboratory to take the initial results and turn out the paper.

The usual flip side argument holds here: by placing an initial result in the public domain it may be easier for them to find a collaborator who can finish off the paper, but I can understand their perspective. These are people struggling against enormous odds to stake out a place for themselves in the scientific community. The first world does not exactly have an outstanding record on acknowledging or even valuing work in developing countries so I can appreciate a degree of scepticism on their part. I hope that this may be overcome eventually but given that the assumption of most people in my own community is that by being open we are bound to be shafted I suspect we need to get our own house in order first.

The catch…

All of this is well and good. There are many real and potential benefits for scientists in the developing world if we move to more open styles of science communication. This is great, and I think it is a good argument for more openness. However there is a serious problem with the way we present this information and our reliance on modern web tools to do it. It’s a very simple problem: bandwidth.

All of our blogs, our data, and indeed the open access literature are very graphics heavy. I actually tried to load up the front page of openwetware.org while sitting at the computer of the head of the department my wife was visiting (the department has two networked computers). Fifteen minutes later it was still loading. The PLoS One front page was similarly sluggish. I get irritated if my download speeds drop below 500K/second, at home, and I will give up if they go down to 100K. We were seeing download rates of 44 bytes/second at the worst point. In some cases this can even make search engines unusable, making it near impossible to track down the self-archived versions of papers. Cuba is perhaps a special case because the US embargo means they have no access to the main transatlantic and North American cables (in effect the whole country is on a couple of bundles of phone lines), but I suspect that even while access is becoming more pervasive the penetration of reasonable levels of bandwidth is limited in the developing world.
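To put those rates in perspective, here is a rough back-of-the-envelope sketch. The page sizes are illustrative assumptions, not measurements of any of the sites mentioned, and the ‘K’ in the rates quoted above is read as 1000 bytes, which is also an assumption.

```python
def download_seconds(size_bytes, rate_bytes_per_second):
    """Transfer time at a constant rate, ignoring latency and protocol overhead."""
    if rate_bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_second


# Hypothetical page weights in bytes (assumptions, not measurements).
PAGE_SIZES = {"text-only page": 20_000, "graphics-heavy page": 500_000}

# Rates quoted in the text, with "K" read as 1000 bytes (an assumption).
RATES = {"worst rate seen": 44, "where I give up": 100_000, "comfortable": 500_000}

for page, size in PAGE_SIZES.items():
    for label, rate in RATES.items():
        minutes = download_seconds(size, rate) / 60
        print(f"{page} at {rate} bytes/s ({label}): {minutes:.1f} minutes")
```

Under these assumptions the graphics-heavy page would need just over three hours at 44 bytes/second, while the text-only page would need about seven and a half minutes.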

The point of this is that access is about more than just putting stuff up, it is also about making it accessible. If we are serious about providing access, and expanding our networks to include scientists who do not have the advantages that we have, then this necessarily includes thinking about low bandwidth versions of the pages that provide information. I looked through PLoS One, openwetware, and BioMedCentral, and couldn’t find a ‘text only version’ button on any of them (to be fair there isn’t one on our lab blog either). I appreciate the need to present things in an appealing and useful format, and indeed the need to place advertising to diversify revenue streams. I guess the main point is not to assume that by making it available you are necessarily making it accessible. If universal accessibility is an important goal then some thought needs to go into alternative presentations.

Overall I think there are real benefits for these scientists when we make things available. The challenges shouldn’t put us off doing it but perhaps it is advisable to bear in mind the old saw: if you want to help people, make sure you find out what they need first.

Some New Year’s resolutions

I don’t usually do New Year’s resolutions. But in the spirit of the several posts from people looking back and looking forwards I thought I would offer a few. This being an open process there will be people to hold me to these so there will be a bit of encouragement there. This promises to be a year in which Open issues move much further up the agenda. These things are little ways that we can take this forward and help to build the momentum.

  1. I will adopt the NIH Open Access Mandate as a minimum standard for papers submitted in 2008. Where possible we will submit to fully Open Access journals but where there is not an appropriate journal in terms of subject area or status we will only submit to journals that allow us to submit a complete version of the paper to PubMed Central within 12 months.
  2. I will get more of our existing (non-ONS) data online and freely available.
  3. Going forward all members of my group will be committed to an Open Notebook Science approach unless this is prohibited or made impractical by the research funders. Where this is the case these projects will be publicly flagged as non-ONS and I will apply the principle of the NIH OA Mandate (12 months maximum embargo) wherever possible.
  4. I will do more to publicise Open Notebook Science. Specifically I will give ONS a mention in every scientific talk and presentation I give.
  5. Regardless of the outcome of the funding application I will attempt to get funding to support an international meeting focussed on developing Open Approaches in Research.

Beyond the usual (write more papers, write more grants) I think that covers things. These should even be practical.

I hope all of those who have had a holiday have enjoyed it and that all those who have not are looking forward to one in the near future. I am looking forward to the New (Western, Calendar) Year. It promises to be an exciting one!

I am now off to cook lots of lovely Chinese food (and yes I know that is calendarically inappropriate – but it will still taste good!). Happy New Year!

Following up on data storage issues

There were lots of helpful comments on my previous post as well as some commiseration from Peter Murray-Rust. Also Jean-Claude Bradley’s group is starting to face some similar issues with the combi-Ugi project ramping up. All in the week that the Science Commons Open Data protocol is launched. I just wanted to bring out a few quick points:

The ease with which new data types can be incorporated into UsefulChem, such as the recent incorporation of a crystal structure (see also JC’s Blog Post), shows the flexibility and ease provided by an open ended and free form system in the context of the Wiki. The theory is that our slightly more structured approach provides more implicit metadata, but I am conscious that we have yet to demonstrate the extraction of the metadata back out in a useful form.

Bill comments:

…I think perhaps the very first goal is just getting the data out there with metadata all over it saying “here I am, come get me”.

I agree that the first thing is to simply get the data up there but the next question out of this comment must be how good is our metadata in practice? So for instance, can anyone make any sense out of this in isolation? Remember you will need to track back through links to the post where this was ‘made’. Nonetheless I think we need to see this process through to its end. The comparison with UsefulChem is helpful because we can decide whether the benefits of our system outweigh the extra fiddling involved, or conversely how much less challenging we have to make the fiddling for it to be worthwhile. At the end of the day, these are experiments in the best approaches to doing ONS.

One thing that does make our life easier is an automatic catalogue of input materials. This, and the ability to label things precisely for storage, is making a contribution to the way the lab is running. In principle something similar can be achieved for data files. The main distinction at the moment is that we generate a lot more data files than samples so handling them is more logistically difficult.

Jean-Claude and Jeremiah have commented further on Jeremiah’s Blog on some of the fault lines between computational and experimental scientists. I just wanted to bring up a comment made by Jeremiah;

It would be easier to understand however, if you used more common command-based plotting programs like gnuplot, R, and matlab.

This is quite a common perception. ‘If you just used a command line system you could simply export the text file’. The thing is, and I think I speak for a lot of wet biologists and indeed chemists, that we simply can’t be bothered. It is too much work to learn these packages and fighting with command lines isn’t generally something we are interested in doing – we’d rather be in the lab.

One of the very nice things about the data analysis package I use, Igor Pro, is that it has a built-in GUI but it also translates menu choices and mouse actions into a command line at the bottom of the screen. What is more it has a quite powerful programming language which uses exactly the same commands. You start using it by playing with the mouse, you become more adept at repeating actions by cutting and pasting stuff in the command line and then you can (almost) write a procedure by pasting a bunch of lines into a procedure file. It is, in my view, the outstanding example of a user interface that not only provides functionality for the novice and expert user in an easily accessible way but also guides the novice into becoming a power user.

But for most applications we can’t be bothered (or more charitably don’t have the time) to learn MATLAB or Perl or R or gnuplot (and certainly not TeX!). Perhaps the fault line lies on the division between those who prefer to use Word and those who prefer TeX. One consequence of this is that we use programs that have an irritating tendency to have proprietary file formats. Usually we can export a text file or something a bit more open. But sometimes this is not possible. It is almost always an extra step, an extra file to upload, so even more work. Open document formats are definitely a great step forward and XML file types are even better. But we are a bit stuck in the middle of a slowly changing process.

None of this is to say that I think we shouldn’t put the effort in, but more to say, that from the perspective of those of us who really don’t like to code, and particularly those of us generating data from ‘beige box’ instruments the challenge of ‘No insider information’ is even harder. As Peter M-R says, the glueware is both critical, and the hardest bit to get right. The problem is, I can’t write glueware, at least not without sticking my fingers to each other.

The problem with data…

Our laboratory blog system has been doing a reasonable job of handling protocols and simple pieces of analysis thus far. While more automation in the posting would be a big benefit, this is more a mechanical issue than a fundamental problem. To recap, our system is that every “item” has its own post. Until now these items have been samples, or materials. The items are linked by posts that describe procedures. This system provides a crude kind of triple: Sample X was generated using Procedure A from Material Z. Where we have some analytical data, like a gel, it was generally enough to drop that in at the bottom of the procedure post. I blithely assumed that when we had more complicated data, that might for instance need re-processing, we could treat it the same way as a product or sample.
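As a rough illustration of that ‘crude kind of triple’, the item-and-procedure structure could be sketched as below. This is not the blog system’s actual data model; the class and function names are hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One post per item: a material, a sample, or potentially a data file."""
    name: str
    kind: str  # e.g. "material", "sample", "data"


@dataclass
class Procedure:
    """A procedure post linking input items to the items it generated."""
    name: str
    inputs: list
    outputs: list


def triples(procedure):
    """Yield (output, procedure, input) triples implied by one procedure post."""
    for output in procedure.outputs:
        for source in procedure.inputs:
            yield (output.name, procedure.name, source.name)


material_z = Item("Material Z", "material")
sample_x = Item("Sample X", "sample")
procedure_a = Procedure("Procedure A", inputs=[material_z], outputs=[sample_x])

for output, proc, source in triples(procedure_a):
    print(f"{output} was generated using {proc} from {source}")
```

Treating a data file as just another `Item` with `kind="data"` is the assumption being tested in the rest of this post.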

By coincidence both Jenny and I have generated quite a bit of data over the last few weeks. I did a Small Angle Neutron Scattering (SANS) experiment at the ILL on Sunday 10 December, and Jenny has been doing quite a lot of DNA sequencing for her project. To deal with the SANS data first: the raw data is a non-standard image format. This image needs a significant quantity of processing which uses at least three different background measurements. I did a contrast variation series, which means essentially repeating the experiment with different proportions of H2O and D2O, each of which requires its own set of backgrounds.

Problem one is just that this creates a lot of files. Given that I am uploading these by hand you can see here, here and here (and bearing in mind that I still have these ones and five others to do), that this is going to get a bit tiring. Ok, so this is an argument for some scripting. However what I need to do is create a separate post for all 50-odd data files. Then I need to describe the data reduction, involving all of these files, down to the relatively small number of twelve independent data files (each with their own post). All of this ‘data reduction’ is done on specially written software, and is generally done by the instrument scientist supporting the experiment, so describing it is quite difficult.

Then I need to actually start on the data analysis. Describing this is not straightforward. But it is a crucial part of the Open Notebook Science programme. Data is generally what it is – there is not much argument about it. It is the analysis where the disagreement comes in – is it valid, was it done properly, was the data appropriate? Recording the detail of the analysis is therefore crucial. The problem is that the data analysis for this involves fiddling. Michael Barton put it rather well in a post a week or so ago;

It would be great, every week, to write “Hurrah! I’ve discovered to this new thing to do with protein cost. Isn’t it wonderful?”. However, in the real world it’s “I spent three days arguing with R to get it to order the bars in my chart how I want”.

Data analysis is largely about fiddling until we get something right. In my case I will be writing some code (desperate times call for desperate measures) to deconvolute the contributions from various things in my data. I will be battling, not with R but with a package called Igor Pro. How do I, or should I, record this process? SVN/Sourceforge/Google Code might be a good plan but I’m no proper coder – I wouldn’t really know what to do with these things. And actually this is a minor part of the problem, I can at least record the version of the code whenever I actually use it.

The bigger problem is actually capturing the data analysis itself. As I said, this is basically fiddling with parameters until they look right. Should I attempt to capture the process by which I refine the parameters? Or just the final values? How important is it to capture the process? I think there is at core here the issue that divides the experimental scientist from the computational scientist. I’ve never met a primarily computer based scientist who kept a notebook in a form that I recognised. Generally there is a list of files, perhaps some rough notes on what they are, but there is a sense that the record is already there in those files and that all that is really required is a proper index. I think this difference was at the core of the disagreement over whether the Open NMR project is ONS – we have very different views of what we mean by notebook and what it records. All in all I think I will try to output log files of everything I do and at least put those up.
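One way to keep both the process and the final values is to log every set of parameters tried, in order. The sketch below uses Python as a stand-in (it is not Igor Pro code), and the parameter names are made up for illustration.

```python
import json
from datetime import datetime, timezone


class AnalysisLog:
    """Append-only record of each parameter set tried during an analysis."""

    def __init__(self):
        self.entries = []

    def record(self, parameters, note=""):
        """Store a copy of the parameters with a step number and UTC timestamp."""
        entry = {
            "step": len(self.entries) + 1,
            "time": datetime.now(timezone.utc).isoformat(),
            "parameters": dict(parameters),
            "note": note,
        }
        self.entries.append(entry)
        return entry

    def final_parameters(self):
        """Return the most recently recorded parameter set."""
        if not self.entries:
            raise ValueError("nothing has been recorded yet")
        return self.entries[-1]["parameters"]

    def to_json_lines(self):
        """Serialise the whole history, one JSON object per line, for uploading."""
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.entries)


log = AnalysisLog()
# Hypothetical parameter names and values, for illustration only.
log.record({"radius_nm": 2.0, "background": 0.10}, note="first guess")
log.record({"radius_nm": 2.4, "background": 0.08}, note="fit looks better")
print(log.to_json_lines())
```

The full history answers the question of how the values were arrived at, while `final_parameters()` gives the values that would be reported.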

In the short term I think we just need to swallow hard and follow our system to its logical conclusion. The data we are generating makes this a right pain to do manually but I don’t think we have actually broken the system per se. We desperately need two things to make this easier. The first is some sort of partly automated posting process, probably just a script, maybe even something I could figure out myself. But for the future we need to be able to run programs that will grab data themselves and then post back to the blog. Essentially we need a web service framework that is easy for users to integrate into their own analysis system. Workflow engines have a lot of potential here but I am not convinced they are sufficiently usable yet. I haven’t managed to get Taverna onto my laptop yet – but before anyone jumps on me I will admit I haven’t tried very hard. On the other hand that’s the point. I shouldn’t have to.
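A partly automated posting script might start as simply as the sketch below, which only drafts one post per data file. The file names and post fields are hypothetical, and actually submitting the drafts is left out because it depends on whatever interface the blog exposes.

```python
from pathlib import PurePath


def draft_post(filename, experiment):
    """Build the content of one blog post describing a single data file."""
    path = PurePath(filename)
    return {
        "title": f"{experiment}: {path.name}",
        "body": f"Raw data file {path.name} from {experiment}.",
        "attachment": str(path),
        "tags": ["data", experiment],
    }


def draft_posts(filenames, experiment):
    """Draft one post per file, in sorted order so repeated runs are reproducible."""
    return [draft_post(name, experiment) for name in sorted(filenames)]


# Hypothetical file names, for illustration only.
files = ["sans_run_002.dat", "sans_run_001.dat"]
posts = draft_posts(files, "SANS contrast variation")
for post in posts:
    print(post["title"])
```

Separating the drafting from the submission means the drafts can be checked by eye before anything goes up.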

If I have time I will get on to Jenny’s problem in another post. Here the issue is what format to save the data in and how much do we need to divide this process up?

Seeking advice and resources on Open Notebook Science

The following comment was posted to the ‘About’ page by Sharon Sonenblum from Georgia Tech. Rather than leave it there where people might not see it I thought I would bring it to the front for everyone’s attention.

‘I’m looking for some resources or direction for diving into open notebook science. I have been interested in the concept for quite some time and recently began following this blog and a few others. I am excited to see that ONS is real and growing, but I’m not sure the best places to start. I want to find out what other folks are doing, what software they are using and what has and has not worked. I also would love to chat with anyone doing research with human subjects to figure out how IRB restrictions play out in ONS.’

Hi Sharon, great to see people interested in ONS! I am sure others will offer comments and suggestions but I will put my tuppence in first. My main suggestion would be to dive in and see what works for you, within the limitations of what you can do. Depending on the kind of work you are doing and how you are already recording it there are a range of options. As I mentioned in yesterday’s post there are as many different approaches to ONS as there are people doing it. We are definitely at the stage of exploring what is possible, what works, and there is plenty of discussion and indeed disagreement over what the best approach is.

There are really two places you could start. The easiest, and possibly the safest way to dip your toes into the water, is to start up a blog that discusses your lab work in general. There are good examples of this kind of approach with Rosie Redfield’s lab being one of the main proponents (see also Michael Barton’s blog). This can be, but is not necessarily, Open Notebook Science as defined by Jean-Claude Bradley. From what you say there may be real issues with you making your primary data available. If it involves human subjects then I would imagine it will be very difficult, if not impossible, to make the raw data available due to ethical considerations. Certainly I would expect that any review board would require that any data that was released was anonymised and that subjects understood exactly what the release conditions would be. I am no expert in ethics and we don’t (as far as I know) have anyone in the ONS community who is dealing with either human or animal subjects. This is an area that I think is important and that we have yet to explore in detail; if we believe that some science (say chemistry) should be fully open but that some (e.g. small scale drug trials) cannot be, then can we draw clear boundaries? I don’t know the answer but clearly some care is required with this.

If you can get clearance to go fully to ONS then there are a range of options. I would say it depends a lot on what sort of data you are dealing with. Take a look at your existing lab book and see what it looks like. Is it an electronic document already? Could you simply put that online? Is it an index to a set of data files, spreadsheets, graphs, analysis? If so a Wiki may be the best approach and using a free hosted service, either Wikispaces as used by UsefulChem, or OpenWetWare, could be a good option. Here you can add data files and then add pages that describe, and index them, as well as pages for analysing and discussing the results. Is your lab book more of a journal? Then a Blog may be the best approach, although you need to be careful here about date stamps as many blog engines allow you to change the datestamp. We use an in house developed blog at Southampton that gets around some of these problems but this is definitely an alpha to beta stage product.

Finally, make sure you discuss it with the people around you. Many scientists are deeply uncomfortable with the whole idea of making the lab notebook available. Be sure that you understand and take into account any concerns. In some cases they may not be valid concerns but as with anything there are real risks with the open notebook approach. Take the opportunity to understand any concerns and be prepared to argue where you think they are unjustified, but in a constructive way. Hopefully you can find good discussion points on this blog, at UsefulChem, Open Reading Frames (see also Bill’s excellent three part series at 3 Quarks Daily), petermr’s blog, Jeremiah Faith’s blog, Michael Barton’s blog, What you’re doing is rather desperate, Public Ramblings, BBGM…who have I missed?

Good luck and keep us updated! The best thing about ONS is the conversations that can get started.

A big few weeks for open (notebook) science

So while I have been buried in the paper- and lab-work there has been quite a lot of interesting stuff going on. Pedro Beltrao has started an Open Notebook style project at Google Code which he describes in a post on Public Ramblings. This is interesting, because once again someone is using a different system as an Open Notebook. We have Wikis, Blogs, TeX based documents, and now, software version repositories being used. As Jean-Claude Bradley has said and we have discussed, we have a lot to learn from exploring different systems, both in terms of understanding the benefits and limitations of specific systems on the way to designing and implementing better ones, and from the perspective of what this tells us about how we do our science, and how this differs from discipline to discipline. Indeed, there already seems to be a place where this discussion has started in Pedro’s system. It is great to see this going forward and also great to see other members of the community, including Bill Hooker and Michael Barton, already getting in and getting their hands dirty. I only wish I could contribute a bit more on the science itself.

Also good is the publicity that Open Notebooks and Open Notebook Science are getting. An article in Chemistry World, the members’ journal of the Royal Society of Chemistry, features UsefulChem, and discussion from Peter Murray-Rust, Steve Bachrach and others. Our efforts at Southampton even get a mention! What is good about this is not so much the personal publicity but that the mainstream ‘industry’ journals are increasingly starting to pick up the story. Not so long ago there was the article in Wired; Chemistry World has also recently discussed the issues associated with openness in a reasonably balanced manner (see also Peter Suber and Peter Murray-Rust’s commentaries).

In addition there is good coverage on the web. Rosie Redfield’s lab pages got featured by David Ng on World’s Fair on Science Blogs which was also picked up at BoingBoing (thanks to Neil Saunders for bringing this to my attention). Momentum is building as Neil says. The issues are becoming mainstream and the benefits are starting to flow through in specific cases. This is how things start to change. The challenge is in maintaining this forward momentum as it builds.

Research network proposal – Update III

The text of the proposal is now in a near complete form. I need to add references and a few other things but it is mostly in reasonable shape. If you would like to have your name included as a founder member of the network please drop me a comment on this post, email, or if I have given you editing rights then feel free to add yourself. If you do so please send or post some sort of document that I can take a version of and incorporate as a letter of support.

If you would like editing rights either comment on the original post or drop me an email. In principle all the other commenters should also be able to give you editing rights if I am unavailable (e.g. asleep). I will take a snapshot of the proposal text around 6am GMT tomorrow morning and will need to edit it and add some pictures offline before incorporating it into the whole proposal. The full proposal will be submitted tomorrow and I will put a complete PDF up as soon as I can get to it. If I can gain permission to do so I will also put up referees’ comments and any other correspondence in the fullness of time.

Thanks to everyone for their help.

e-science for open science – an EPSRC research network proposal

The UK Engineering and Physical Sciences Research Council currently has a call out for proposals to fund ‘Network Activities’ in e-science. This seems like an opportunity to both publicise and support the ‘Open Science’ agenda so I am proposing to write a proposal to ask for ~£150-200k to fund workshops, meetings, and visits between different people and groups. The money could fund people to come to meetings (including from outside the UK and Europe) but could not be used to directly support research activities. The rationale for the proposal would be as follows.

  • ‘Open Science’ has the potential to radically increase the efficiency and effectiveness of research world wide.
  • The community is disparate and dispersed with many groups working on different approaches that do not currently interoperate – agreeing some interchange or tagging standards may enable significant progress
  • Many of those driving the agenda are early career scientists including graduate students and postdocs who do not have independent travel funds and whose PI may not have resources to support attending meetings where this agenda is being developed
  • There is significant interest from academics, some publishers, software and tool developers, and research funders in making more data freely available but limited consensus on how to take this forward and thus far an insufficient commitment of resources to make this possible in practice

The proposal would be to support 2-3 meetings over three years, including travel costs, and provide funds for exchange visits. What I would like from the community is an expression of interest, specifically the commitment to write a letter of support saying you would like to be involved. It would be great to get these from tenured academics, early career academics, graduate students and PDRAs, publishers (NPG? PLoS?), library and repository people (UKOLN, Simile, others?) and anyone else who is relevant.

The timeline is tight (due Tuesday next week) but if there is enough interest I will push through to get this done. I propose to write the grant in the open and online so will post a Google Doc or OpenWetWare page as soon as I have something to put up. Any help people can offer on the writing would be appreciated. In the meantime please drop comments below. I will be pointing to this page in the grant proposal.

An experiment in open notebook science – Sortase mediated protein-DNA ligation

In a recent post I extolled the possible virtues of Open Notebook Science in avoiding or ameliorating the risk of being scooped. I also made a virtue of the fact that being open encourages you to take a more open approach; that there is a virtuous circle or positive feedback. However, much of this is very theoretical. We don’t have good case studies to point at that show that Open Notebook Science generates positive outcomes in practice. To take a more cynical perspective, where is the evidence that I am willing to take risks with valuable data? My aim with this post is to do exactly that: put something out there that is (as far as I know) new and exciting, and kick off a process that may help us to generate a positive example.

I mentioned in the previous post that we have been scooped not once, but twice, on this project. I will come back to the second scooping later but my object here is to try and avoid getting scooped a third time. As I mentioned in the previous post we are using the S. aureus Sortase enzyme to attach a range of molecules to proteins. We have found that this provides a clean, easy, and most importantly general method for attaching things to proteins. Labelling of proteins, attaching proteins to solid supports, and generating various hybrid-protein molecules has a very wide range of applications and new and easy to use methods are desperately needed. We have recently published[1] the use of this to attach proteins to solid supports and others have described the attachment of small molecules[2], peptides[3], PNA[4], PEG[5] and a range of other things.

One type of protein-conjugate that is challenging to generate is one in which a protein is linked to a DNA molecule. Such conjugates have a wide range of potential applications particularly as analytical tools where the very strong and selective binding that can often be found in a protein is linked to the wide range of extremely sensitive techniques available for DNA detection and identification[6]. Such techniques have been limited because it is difficult to find a general and straightforward technique for making such conjugates.

We have used our Sortase mediated ligation to successfully attach oligonucleotides to proteins and I have put up the data we have that supports this in my lab book (see here for an overview of what we have and here for some more specific examples with conditions). I should note that some of this is not strictly open notebook science because this is data from a student which I have put up after the event.

We are confident that it is possible to get reasonable yields of these conjugates and that the method is robust and easy to apply. This is an exciting result with some potentially exciting applications. However, to publish we need to generate some data on applications of these conjugates. One obvious target here is to use a DNA array and differently coloured fluorescent proteins attached to different oligonucleotides to form an image on the array. The problem is that we are not well set up to do this in my lab and don’t have the expertise or resources to do this experiment efficiently. We could do it, but it seems to me that it would be quicker and more efficient for someone else with the expertise and experience to do this. In return they obviously get an authorship on the paper.

Other experiments we are interested in doing:

  • Analytical experiment using the binding of a protein-DNA conjugate that utilises the DNA part for detection.
  • Pull down of peptide-DNA conjugates onto an array after exposure of the peptides to a protease
  • Attachment of proteins to a full length PCR product containing the gene for the protein. Select one of the proteins and then re-amplify the desired gene. (I had a quick go at this but it didn’t work)

So what I am asking is this:

  • If any reader of this blog is interested in doing these (or any other) experiments to help us get the paper published then get in touch
  • If you feel so inclined then publicise this call wider on your own blog and let’s see whether using the blogosphere to make contacts can really aid the science

We will send the reagents to anyone who would like to do the experiments along with any further information required. In principle people ought to be able to figure out everything they need from the lab book but this will probably not be the case in practice. The idea here is to see whether this notion of a loose collaboration of groups with different resources and expertise that is driven by the science can work and whether it is a competitive way of doing science.

My criteria in accepting collaborators will be as follows:

  1. Willingness to adopt an Open Notebook Science approach for this experiment (ideally using our lab book system but not necessarily)
  2. Interest in and willingness to engage in the development of the published paper (including proposing and/or carrying out any new experiments that would be cool to include)
  3. Ability to actually carry out the experiment in reasonable time (ideally looking for a couple of months here)

So this is notionally a win-win situation for me. We will be getting on and doing our own thing as well but by working with other groups we may be able to get this paper out more efficiently and effectively. Maybe others will come up with clever experiments that would add to the value of the paper. The worst case scenario is that someone comes along and sees this, copies the results, and publishes ahead of us. The best case scenario is that someone else already working in a similar direction may come across this and propose working together on this.

In any case, the results promise to be interesting…

References:

[1] Chan et al, 2007, Covalent attachment of proteins to solid supports via Sortase-mediated ligation, PLoS ONE, e1164

[2] Popp et al, 2007, Sortagging: a versatile method for protein labelling, Nat Chem Biol, 3:707

[3] Mao et al, 2004, Sortase-mediated protein ligation: a new method for protein engineering, J Am Chem Soc, 126:2670

[4] Pritz et al, 2007, Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation, J Org Chem, 72:3909

[5] Parthasarathy et al, 2007, Sortase A as a novel molecular “stapler” for sequence specific protein conjugation, Bioconj Chem, 18:469

[6] Burbulis et al, 2005, Using protein-DNA chimeras to detect and count small numbers of molecules, Nature Methods, 2:31