Policy for Open Science – reflections on the workshop

Written on the train on the way from Barcelona to Grenoble. This life really is a lot less exotic than it sounds… 

The workshop that I’ve reported on over the past few days was both positive and inspiring. There is a real sense that the ideas of Open Access and Open Data are becoming mainstream. As several speakers commented, within 12-18 months it will be very unusual for any leading institution not to have a policy on Open Access to its published literature. In many ways, as far as Open Access to the published literature is concerned, the war has been won. There will remain battles to be fought over green and gold routes, the role of licenses, and the need to be able to text mine; successful business models remain to be made demonstrably sustainable; and there will be pain as the inevitable restructuring of the publishing industry continues. But be under no illusions: this restructuring has already begun, and it will continue in the direction of more openness as long as the poster children of the movement like PLoS and BMC continue to be successful.

Open Data remains further behind, both with respect to policy and awareness. Many people spoke over the two days about Open Access and then added, almost as an addendum ‘Oh and we need to think about data as well’. I believe the policies will start to grow and examples such as the BBSRC Data Sharing Policy give a view of the future. But there is still much advocacy work to be done here. John Wilbanks talked about the need to set achievable goals, lines in the sand which no-one can argue with. And the easiest of these is one that we have discussed many times. All data associated with a published paper, all analysis, and all processing procedures, should be made available. This is very difficult to argue with – nonetheless we know of examples where the raw data of important experiments is being thrown away. But if an experiment cannot be shown to have been done, cannot be replicated and checked, can it really be publishable? Nonetheless this is a very useful marker, and a meme that we can spread and promote.

In the final session there was a more critical analysis of the situation. A number of serious questions were raised but I think they divide into two categories. The first involves the rise of the ‘Digital Natives’ or the ‘Google Generation’. The characteristics of this new generation (a gross simplification in its own right) are often presented as a pure good. Better networked, more sharing, better equipped to think in the digital network. But there are some characteristics that ought to give pause. A casualness about attribution, a sense that if something is available then it is fine to just take it (it’s not stealing after all, just copying). There is perhaps a need to recover the roots of ‘Mertonian’ science, to, as I think James Boyle put it, publicise and embed the attitudes of the last generation of scientists, for whom science was a public good and a discipline bounded by strict rules of behaviour. Some might see this as harking back to an elitist past, but if we are constructing a narrative about what we want science to be then we can take the best parts of all of our history and use them to define and refine our vision. There is certainly a place for a return to the compulsory study of science history and philosophy.

The second major category of issues discussed in the last session revolved around the question of what do we actually do now. There is a need to move on many fronts, to gather evidence of success, to investigate how different open practices work – and to ask ourselves the hard questions. Which ones do work, and indeed which ones do not. Much of the meeting revolved around policy with many people in favour of, or at least not against, mandates of one sort or another. Mike Carroll objected to the term mandate – talking instead about contractual conditions. I would go further and say that until these mandates are demonstrated to be working in practice they are aspirations. When they are working in practice they will be norms, embedded in the practice of good science. The carrot may be more powerful than the stick but peer pressure is vastly more powerful than both.

So the key questions to me revolve around how we can convert aspirations into community norms. What is needed in terms of infrastructure, in terms of incentives, and in terms of funding to make this stuff happen? One thing is to focus on the infrastructure and take a very serious and critical look at what is required. It can be argued that much of the storage infrastructure is in place. I have written on my concerns about institutional repositories but the bottom line remains that we probably have a reasonable amount of disk space available. The network infrastructure is pretty good, so these are two things we don’t need to worry about. What we do need to worry about, and what wasn’t really discussed very much in the meeting, are the tools that will make it easy and natural to deposit data and papers.

The incentive structure remains broken – this is not a new thing – but if sufficiently high profile people start to say this should change, and act on those beliefs, and they are, then things will start to shift. It will be slow but bit by bit we can imagine getting there. Can we take shortcuts? Well, there are some options. I’ve raised in the past the idea of a prize for Open Science (or in fact two, one for an early career researcher and one for an established one). Imagine if we could make this a million dollar prize, or at least enough for someone to take a year off. High profile, significant money, and visible success for someone each year. Even without money this is still something that will help people – give them something to point to as recognition of their contribution. But money would get people’s attention.

I am sceptical about the value of ‘microcredit’ systems where a person’s diverse and perhaps diffuse contributions are aggregated together to come up with some sort of ‘contribution’ value, a number by which job candidates can be compared. Philosophically I think it’s a great idea, but in practice I can see this turning into multiple different calculations, each of which can be gamed. We already have citation counts, H-factors, publication number, integrated impact factor as ways of measuring and comparing one type of output. What will happen when there are ten or 50 different types of output being aggregated? Especially as no-one will agree on how to weight them. What I do believe is that those of us who mentor staff, or who make hiring decisions should encourage people to describe these contributions, to include them in their CVs. If we value them, then they will value them. We don’t need to compare the number of my blog posts to someone else’s – but we can ask which is the most influential – we can compare, if subjectively, the importance of a set of papers to a set of blog posts. But the bottom line is that we should actively value these contributions – let’s start asking the questions ‘Why don’t you write online? Why don’t you make your data available? Where are your protocols described? Where is your software, your workflows?’

Funding is key, and for me one of the main messages to come from the meeting was the need to think in terms of infrastructure, and in particular, to distinguish what is infrastructure and what is science or project driven. In one discussion over coffee I discussed the problem of how to fund development projects where the two are deeply intertwined and how this raises challenges for funders. We need new funding models to make this work. It was suggested in the final panel that as these tools become embedded in projects there will be less need to worry about them in infrastructure funding lines. I disagree. Coming from an infrastructure support organisation I think there is a desperate need for critical strategic oversight of the infrastructure that will support all science – both physical facilities, network and storage infrastructure, tools, and data. This could be done effectively using a federated model and need not be centralised but I think there is a need to support the assumption that the infrastructure is available and this should not be done on a project by project basis. We build central facilities for a reason – maybe the support and development of software tools doesn’t fit this model but I think it is worth considering.

This ‘infrastructure thinking’ goes wider than disk space and networks, wider than tools, and wider than the data itself. The concept of ‘law as infrastructure’ was briefly discussed. There was also a presentation looking at different legal models of a ‘commons’: the public domain, a contractually reconstructed commons, escrow systems etc. In retrospect I think there should have been more of this. We need to look critically at different models, what they are good for, how they work. ‘Open everything’ is a wonderful philosophical position but we need to be critical about where it will work, where it won’t, and where it needs contractual protection, or where such contractual protection is counterproductive. I spoke to John Wilbanks about our ideas on taking Open Source Drug Discovery into undergraduate classes and schools and he was critical of the model I was proposing, not from the standpoint of the aims or where we want to be, but because it wouldn’t be effective at drawing in pharmaceutical companies and protecting their investment. His point was, I think, that by closing off the right piece of the picture with contractual arrangements you bring in vastly more resources and give yourself greater ability to ensure positive outcomes. That sometimes to break the system you need to start by working within it by, in this case, making it possible to patent a drug. This may not be philosophically in tune with my thinking but it is pragmatic. There will be moments, especially when we deal with the interface with commerce, where we have to make these types of decisions. There may or may not be ‘right’ answers, and if there are they will change over time, but we need to know our options and know them well so as to make informed decisions on specific issues.

But finally, as is my usual wont, I come back to the infrastructure of tools. The software that will actually allow us to record and order this data that we are supposed to be sharing. Again there was relatively little on this in the meeting itself. Several speakers recognised the need to embed the collection of data and metadata within existing workflows but there was very little discussion of good examples of this. As we have discussed before this is much easier for big science than for ‘long tail’ or ‘small science’. I stand by my somewhat provocative contention that for the well described central experiments of big science this is essentially a solved problem – it just requires the will and resources to build the language to describe the data sets, their formats, and their inputs. But the problem is that even for big science, the majority of the workflow is not easily automated. There are humans involved, making decisions moment by moment, and these need to be captured. The debate over institutional repositories and self archiving of papers is instructive here. Most academics don’t deposit because they can’t be bothered. The idea of a negative click repository – where depositing is a natural part of the workflow – can circumvent this. And if well built it can make the conventional process of article submission easier. It is all a question of getting into the natural workflow of the scientist early enough that not only do you capture all the contextual information you want, but you can also offer assistance that makes them want to put that information in.

The same is true for capturing data. We must capture it at source. This is the point where it has the potential to add the greatest value to the scientist’s workflow by making their data and records more available, by making them more consistent, by allowing them to reformat and reanalyse data with ease, and ultimately by making it easy for them to share the full record. We can and we will argue about where best to order and describe the elements of this record. I believe that this point comes slightly later – after the experiment – but wherever it happens it will be made much easier by automatic capture systems that hold as much contextual information as possible. Metadata is context – almost all of it should be possible to catch automatically. Regardless of this we need to develop a diverse ecosystem of tools. It needs to be an open and standards based ecosystem and in my view needs to be built up of small parts, loosely coupled. We can build this – it will be tough, and it will be expensive but I think we know enough now to at least outline how it might work, and this is the agenda that I want to explore at SciFoo.

John Wilbanks had the last word, and it was a call to arms. He said ‘We are the architects of Open’. There are two messages in this. The first is we need to get on and build this thing called Open Science. The moment to grasp and guide the process is now. The second is that if you want to have a part in this process the time to join the debate is now. One thing that was very clear to me was that the attendees of the meeting were largely disconnected from the more technical community that reads this and related blogs. We need to get the communication flowing in both directions – there are things the blogosphere knows, that we are far ahead on, and we need to get the information across. There are things we don’t know much about, like the legal frameworks, the high level policy discussions that are going on. We need to understand that context. It strikes me though that if we can combine the strengths of all of these communities and their differing modes of communication then we will be a powerful force for taking forward the open agenda.

Network grant proposal unsuccessful

I received the rejection letter late last week but hadn’t got as far as posting about this yet. Given the referees’ comments this was not surprising. We were ranked 20 out of 21 proposals that were considered by the panel. This is not nearly so bad as it sounds. The story is that there were over a hundred proposals, so to actually get to the panel wasn’t a bad thing in its own right. The other positive thing to take from this is that the referees’ comments were very clear about what the problems were: too much discussion of the type of things we would like to do, and not enough about how we would get more people involved, or how we would disseminate information. Basically it wasn’t focussed well as a Network application, which is not surprising in light of the fact that I had never been involved in one before, so I didn’t really know what it was ‘supposed’ to look like.

We are allowed to resubmit the grant in six months’ time and I would be inclined to do so. The original proposal document as well as the final submitted version (there are significant differences – I needed to cut a lot to make it fit) are still available for viewing or editing and it ought to be possible to re-jig the proposal over the next six months in light of the referees’ comments.

p.s. Am using Zemanta which looks potentially like a great tool in principle for getting more consistency into the use of tags and linking the information up. Something I am very much in favour of. However it appears to have decided that this post is about Volkswagens. Go figure.

OPEN Network proposal – referees’ comments are in

So we have received the referees’ comments on the network proposal and after a bit of a delay I have received permission to make them public. You can find a pdf of the referees’ comments here. I have started to draft a reply which is published on Google Docs. I have given a number of people access but if you are feeling left out and would like to contribute just drop me a line.

These are broadly pretty critical comments and our chance of getting this funded is not looking at all good on the basis of these. For those who have not written grant proposals before or not dealt with these types of criticisms there is an object lesson here. Many of the criticisms relate to assumptions the referees have made about how a UK Network Proposal should be written and what it should do. It is always a good idea to identify precisely what the expectations are. In this case I simply didn’t have time to do this.

However there are some good aspects of this. Many of the critical comments made by the referees are contradicted by other referees (too many meetings, not enough meetings). A couple arise from misunderstandings or perhaps a lack of clarity in the proposal. The key thing is to answer the criticisms on how we expand the network – while at the same time explaining that until we have achieved this the network doesn’t really exist in its ideal form. Also we are asking for relatively little money, so once all the big networks get their slice there may be relatively few proposals left small enough to pick up the scraps, as it were.

The reply is due back on Monday (UK time) and I will gratefully receive any assistance in getting this response honed to a fine point. I would also point you in the direction of Shirley Wu’s draft of the proposal for a PSB session which is due at the end of next week. We know this collaborative process can work and we also know it has weaknesses and disconnects. If we can use the good part to convince funding agencies that we need to sort out the weaknesses I think that would be a great step forward.

Open Science Session at PSB 2009?

Shirley Wu from Stanford left a comment on my New Years Resolutions post suggesting the possibility of a session on Open Science at the PSB meeting in Hawaii in 2009, which I wanted to bring to the front for people’s attention.

[…] Since you mentioned organizing an international meeting on the subject and publicizing open science, I’m curious what your thoughts (and anyone else’s who reads this!) would be on participating in a session on Open Science at the Pacific Symposium on Biocomputing at PSB. They don’t traditionally cover non-primary research/methods tracks, but they do pride themselves on being at the cutting edge of biology and biocomputing, so I am hoping they will be amenable to the idea. If there was support from, shall we say, the founders of this movement, I think it would help a great deal towards making it happen. […]

She also has a post on her new blog One Big Lab where she fleshes out the idea in a bit more detail and which is probably the best place to continue the discussion.

Hi Shirley! Great to have more people out there blogging and commenting. I am not sure whether I really qualify as a ‘founder of the movement’. I know things are moving fast, but I don’t think having been around for nine months or so makes me that venerable!

This sounds broadly like a good idea to me. I was considering trying to organise a meeting in the UK towards October – November this year but the timelines are tight and really dependent on money coming through. I would be happy to push back to Jan 2009 in Hawaii if people felt this was a good idea; if the grant comes through we could use this as the first annual meeting. My only concern is that Hawaii probably increases average costs for people, as more people have to come further and book accommodation than if it is in either Western Europe or the East Coast US. The other issue is how and whether to focus such a session. I also don’t see a problem with having two meetings ~6 months apart. What do people think?

The OPEN Research Network Proposal – update and reflections

Despite all evidence to the contrary, I have not in fact fallen off the end of the world. I have just been a little run off my feet over the last week or so. A quick weekend trip to the south of France (see here for probably rather too much detail) and a lot of other things, not least some wrangling over allowed costs for the grant, have been keeping me busy.

The research network proposal was successfully submitted on Tuesday 27th November, some six days after I proposed here the possibility of applying for this grant. To echo what Mat Todd said, I haven’t ever been involved with a grant proposal that came together so fast, and while it still involved several days with very little sleep on my part it could not have been put together without a great deal of assistance from a large number of people. The final version of the proposal is here and I will try to put up a page on OpenWetWare for further discussion. The text of the case for support is also available at Nature Precedings. Precedings were uncomfortable about hosting the financial details of the proposal and I think this is interesting in its own right and will write on it later. Here, however, I want to reflect on the process of preparing the grant and what worked well and what didn’t.

Finding the community

The use of this Blog and the subsequent diffusion of the request for help through a number of other blogs was very effective and quite rapid. Diffusion was important and the proposal was featured on a wide range of blogs (1, 2, 3, 4, 5…others?). Given the very short time scale the number of people that became involved was really very high. People are able to move much faster than organisations so on the timescale that we were working it wasn’t possible to get organisations such as PLoS, Nature Publishing Group, BioMedCentral etc. formally involved by the time of the grant submission. I am still very keen to get the involvement of organisations like these and others and it isn’t too late to send a letter of support as I can update these at any time.

It is interesting to contrast this with the response I received to my earlier request for collaborators on the protein-DNA ligation project. In the case of the network proposal I was very rapidly swamped with support whereas for the actual science based project I haven’t had a response as yet. I think this is a good demonstration that while the Open and Connected approach can be effective, it is currently working best for development and networking projects associated with open and connected practices. As a research community we work very well on our common interests, where we have critical mass. However beyond this, in the areas of our ‘real research’, we are not yet seeing the potential benefits to anywhere near the same extent. I believe this is because we don’t yet have either critical mass or a sufficiently connected network of researchers. In my view a central aim of the Research Network should be to break out of the ghetto and start to enable and demonstrate the benefits we know and have seen in the context of a wider range of scientific disciplines.

Writing the grant

In the process of writing and editing the grant it became clear that contributors have very different ‘contribution styles’ and that different types of contribution had higher or lower chances of making it into the final document. The proposal was written in GoogleDocs based on a first draft that I put together rather rapidly. The structure changed significantly over the course of the six days. Some contributors preferred to email specific comments whereas some got right in and hacked away at the text. At times there were six or ten people simultaneously editing the document. I am particularly grateful to those who spent the last night before submission going through and finding typos (although there are still quite a few, I am embarrassed to admit). This made the final stages of ‘cleaning up’ much easier.

Those who directly edited the document saw a much higher chance of their changes making it into the final document. Email comments were also valuable and were included or taken account of in many cases but because they were less immediate there was a greater tendency for them to be passed over or simply lost in the rush. At all times I took the arbitrary decision that I would delete, adapt, or add text as I saw fit. While a consensus approach might have worked had more time been available, with the time constraints imposed it seemed to me that strong ‘editorial’ guidance was required to hit the final target.

Overall, this was a relatively pleasant way to write a proposal. The fact that many eyes went over the text was a great help and made me much more confident, even when I took a final decision to remove something, that a range of views had been explored, and that there was less chance of us missing important details. The full editing record is available in GoogleDocs if you have editing rights to the document. At the moment I don’t think I can expose the history. I considered doing the writing on OpenWetWare but that would have required people getting accounts, and the extra 24 hours involved there may have meant we didn’t make it. A wiki is nonetheless probably a better framework for this kind of writing.

Submitting the grant

The mechanics of the submission process meant that essentially no-one else had access to the financial details and there was little point discussing these. I wrote the justification of resources and Workplan on my own simply due to time constraints. The logistics meant that the text had to be closed off from the GoogleDoc at a specific time and then adapted to fit the available space. You can see what was done by comparing the final GoogleDoc version with the submitted version.

Would this work for a ‘real’ grant application?

As far as I am aware this is the first time a grant application has been written ‘in the open’ like this. However this is not a conventional research project. It is not clear at the moment whether the same benefits would be seen for a conventional project. Part of the reason people contributed was that they could be directly involved in the network. This would not be the case for a conventional project – would people who would see no personal benefit be prepared to contribute as much? Having said that, the benefits of having many eyes on the proposal were clear, and made it possible to turn around the submission much faster than would otherwise have been the case. Perhaps the question is not so much ‘would people contribute?’ as ‘what is the best way to encourage people to contribute?’

Thanks to everyone who helped and all those who offered support. It wouldn’t have happened without the contributions and support of a lot of people. You know, this Open Science thing actually works!

Proposal submitted…

Enough said. Thanks to everyone who helped. I will reflect on the process at a later stage and will put the complete proposal up as soon as I can. If anyone wants to send letters of support or get involved, don’t feel that you’ve missed the boat. Whether the money comes up or not we ought to be doing something along these lines and I can always include more material when we reply to the referees’ comments.

Cheers

Cameron

Research network proposal – Update III

The text of the proposal is now in a near complete form. I need to add references and a few other things but it is mostly in reasonable shape. If you would like to have your name included as a founder member of the network please drop me a comment on this post, email, or if I have given you editing rights then feel free to add yourself. If you do so please send or post some sort of document that I can take a version of and incorporate as a letter of support.

If you would like editing rights either comment on the original post or drop me an email. In principle all the other commenters should also be able to give you editing rights if I am unavailable (e.g. asleep). I will take a snapshot of the proposal text around 6am GMT tomorrow morning and will need to edit it and add some pictures offline before incorporating it into the whole proposal. The full proposal will be submitted tomorrow and I will put a complete PDF up as soon as I can get to it. If I can gain permission to do so I will also put up the referees’ comments and any other correspondence in the fullness of time.

Thanks to everyone for their help.

The research network proposal – update II

Thanks to all those who have sent letters of support, paragraphs of text, and made comments or modifications to the proposal. Just a quick update on where we are. The text of the proposal is up at GoogleDocs. I believe I have given anyone who has commented on the original post access to edit but if not give me a yell or just send me any comments by email or pop them in as comments here.

There are a lot of other forms to be filled in for the proposal, which is difficult to do in the open, but once it is done I will pdf the whole thing and put it up for people to see. The proposal has to be at the research council by 4pm GMT on Tuesday, which means in practice I need to hit the submit button early on Tuesday morning. So if you want to comment or send a letter of support then please do so by close of play on Monday, your time, to ensure that I get it in time.

Thanks to all those who have helped and we will see how we get on!

Oh and if anyone can come up with a catchier title? We thought maybe ETHOS (e-science to help open science) but that seemed a little lame…

Follow on to network proposal

Ok. In this morning’s post I proposed the idea of applying for some UK money to support meetings in the general area of open science. I’ve made a start with an outline on a GoogleDoc which can be viewed here. I have tried to set out some general headings and areas to be fleshed out and added a little text. This is early days but if anyone wishes to add anything then please feel free. I have given editing rights to all those people who have commented on the original post (as of around 9:30 pm GMT on Thursday 22 November). I have set the document so that those people with invitations can cascade them to others (I hope). I will continue to issue invitations to anyone who comments on the original post. No need to feel obliged to add anything – I’m not asking you to write the grant for me – but if you feel so inclined then the assistance will be very welcome.

What I will request from those who are interested is a short letter stating your current post/position/ambitions, your interest in ‘Open Science’ and why you would like to be involved in this network. Either email to me at C [dot] Neylon [at] rl.ac.uk or simply drop it in as a comment.

Thanks

Cameron

e-science for open science – an EPSRC research network proposal

The UK Engineering and Physical Sciences Research Council currently has a call out for proposals to fund ‘Network Activities’ in e-science. This seems like an opportunity to both publicise and support the ‘Open Science’ agenda so I am proposing to write a proposal to ask for ~£150-200k to fund workshops, meetings, and visits between different people and groups. The money could fund people to come to meetings (including from outside the UK and Europe) but could not be used to directly support research activities. The rationale for the proposal would be as follows.

  • ‘Open Science’ has the potential to radically increase the efficiency and effectiveness of research world wide.
  • The community is disparate and dispersed with many groups working on different approaches that do not currently interoperate – agreeing some interchange or tagging standards may enable significant progress
  • Many of those driving the agenda are early career scientists including graduate students and postdocs who do not have independent travel funds and whose PI may not have resources to support attending meetings where this agenda is being developed
  • There is significant interest from academics, some publishers, software and tool developers, and research funders in making more data freely available but limited consensus on how to take this forward and thus far an insufficient commitment of resources to make this possible in practice

The proposal would be to support 2-3 meetings over three years, including travel costs, and provide funds for exchange visits. What I would like from the community is an expression of interest, specifically the commitment to write a letter of support saying you would like to be involved. It would be great to get these from tenured academics, early career academics, graduate students and PDRAs, publishers (NPG? PLoS?), library and repository people (UKOLN, Simile, others?) and anyone else who is relevant.

The timeline is tight (due Tuesday next week) but if there is enough interest I will push through to get this done. I propose to write the grant in the open and online so will post a Google Doc or OpenWetWare page as soon as I have something to put up. Any help people can offer on the writing would be appreciated. In the meantime please drop comments below. I will be pointing to this page in the grant proposal.