Update on publishers and SOPA: Time for scholarly publishers to disavow the AAP

In my last post on scholarly publishers that support the SOPA bill before the US Congress I ended up making a series of edits. It was pointed out to me that the Macmillan listed as a supporter is not the Macmillan that is the parent group of Nature Publishing Group but a separate U.S. subsidiary of the same ultimate holding company, Holtzbrinck. As I dug further it became clear that while only a small number of scholarly publishers were explicitly and publicly supporting SOPA, many more are members of the Association of American Publishers, which is listed publicly as a supporter.

This is a little different to directly supporting the act. The AAP is a membership organisation that represents its members (including Nature Publishing Group, Oxford University Press, Wiley Blackwell and a number of other familiar names, see the full list at the bottom) to – amongst others – the U.S. government. Not all of its positions would necessarily be held by all its members. However, neither have any of those members come out and publicly stated that they disagree with the AAP position. In another domain the security company Kaspersky quit the Business Software Alliance over the BSA's support of SOPA, even after the BSA withdrew that support.

I was willing to give AAP members some benefit of the doubt, hoping that some of them might come out publicly against SOPA. But if that was the hope then the AAP have just stepped over the line. In a spectacularly disingenuous press release the AAP claims significant credit for a new bill just submitted to the U.S. Congress. This, in a repeat of some previous attempts, would block any efforts on the part of U.S. federal agencies to enact open access policies, even to the extent of blocking them from continuing to run the spectacularly successful PubMed Central. That this comes days before the deadline for a request for information on the development of appropriate and balanced policies that would support access to the published results of U.S. taxpayer-funded research is a calculated political act, an abrogation of any principled stance, and a clear signal of a lack of any interest in a productive discussion on how to move scholarly communications forward into a networked future.

I was willing to give AAP members some space. Not any more. The time has come to decide whether you want to be part of the future of research communication or whether you want to legislate to try and stop that future happening. You can be part of that future or you can be washed into the past. You can look forward or you can be part of a political movement working to rip off the taxpayers and charitable donors of the world. Remember that the profits alone of Elsevier and Springer (though I should be cutting Springer a little slack as they're not on the AAP list – the one on the list is a different Springer) could fund the publication of every paper in the world in PLoS ONE. Remember that the cost of putting a SAGE article on reserve for a decent-sized class, or of putting a Taylor and Francis monograph on reserve for a more modestly sized one, at one university is more than it would cost to publish them in most BioMedCentral journals and make them available to all.

Ultimately this legislation is irrelevant – the artificial level of current costs of publication and the myriad of additional charges that publishers make for this, that, and the other (Colour charges? Seriously?) will ultimately be destroyed. The current inefficiencies and inflated markups cannot be sustained. The best legislation can do is protect them for a little longer, at the cost of damaging the competitiveness of the U.S. as a major player in global research. With PLoS ONE rapidly becoming a significant proportion of the world’s literature on its own and Nature and Science soon to be facing serious competition at the top end from an OA journal backed by three of the most prestigious funders in the world, we are moving rapidly towards a world where publishing in a subscription journal will be foolhardy at best and suicidal for researchers in many fields. This act is ultimately a pathetic rearguard action and a sign of abject failure.

But for me it is also a sign that the rhetoric of being supportive of a gradual managed change to our existing systems, a plausible argument for such organisations to make, is dead for those signed up to the AAP. Publishers have a choice – lobby and legislate to preserve the inefficient, costly, and largely ineffective status quo – or play a positive part in developing the future.

I don't expect much; to be honest I expect deafening silence as most publishers continue to hope that most researchers will be too buried in their work to notice what is going on around them. But I will continue to hope that some members of that list, the organisations that really believe that their core mission is to support the most effective research communication – not that those are just a bunch of pretty words that get pulled out from time to time – will disavow the AAP position and commit to a positive and open discussion about how we can take the best from the current system and combine it with the best we can do with the technology available. A positive discussion about managed change that enables us to get where we want to go and helps to make sure that we reap the benefits when we get there.

This bill is self-defeating as legislation but as a political act it may be effective in the short term. It could hold back the tide for a while. But publishers that support it will ultimately get wiped out as the world moves on and they spend so much time pushing back the tide that they miss the opportunity to catch up. Publishers who move against the bill have a role to play in the future and are the ones with enough insight to see the way the world is moving. And those publishers who sit on the sidelines? They don't have the institutional capability to take the strategic decisions required to survive. Choose.

Update: An interesting parallel post from John Dupuis and a trenchant exposé (we expect nothing less) from Michael Eisen. Jon Eisen calls for people at the institutions and organisations with links to the AAP to get on the phone and ask for them to resign from the AAP. Lots of links appearing at this Google+ post from Peter Suber.

The List of AAP Members from http://www.publishers.org/members/psp/

The stupidity of SOPA in Scholarly Publishing

Edit and update – I've been told that the Macmillan supporting SOPA is Macmillan US and not the holding company of Nature Publishing Group. NPG are however explicitly listed as members of the Association of American Publishers, which is listed as a supporter. The AAP list includes the American Chemical Society and the American Institute of Physics, along with a lot of smaller society publishers. The Springer listed is apparently not the Springer that owns BioMedCentral.

It was Michael Kuhn who pointed out to me over the holiday break that both Elsevier and Macmillan (parent company of Nature Publishing Group) were listed as supporters of the Stop Online Piracy Act. If you don't know about SOPA and why it is one of the most politically and legislatively incompetent actions of recent years then start here and look around from there. This has now been more widely picked up: Heather Morrison points out that, as well as Macmillan and Elsevier, a range of other scholarly publishers are listed in support, and at BoingBoing Maggie Koerth-Baker suggests a boycott similar to those being targeted at other supporters.

Here I wanted to point out how utterly and stupendously stupid SOPA is in the academic communication space. Nature, every Elsevier journal, and every other academic communication medium are full of copyright violations. The couple of paragraphs of methods text or introduction that keep being reused, that chunk of supplementary information that has appeared in a number of such places, that figure that "everyone in the field uses" but no one has any idea who drew it, as well as those figures that the authors forgot they'd signed over the copyright to some other publisher – or didn't understand enough about copyright to realise that they had. And that's before we get to plagiarism issues. Or the fact that the legal position over the signing of copyright agreements by authors is fraught, to say the least.

Now of course reputable publishers have in place mechanisms by which authors sign off that they've done the right things, so the journals are OK, right? No. That's the whole point of SOPA (and its partner in the US Senate, PIPA). It gives copyright holders or interested parties the right to take down an entire site based on it being the medium by which copyright violations are transmitted. So if someone, purely as a thought experiment you understand, crowd-sourced the identification of copyright violations in papers published by supporters of SOPA, then they could legitimately take down journal websites like ScienceDirect and Nature.com. That's right: just find the plagiarised papers, raise them as a copyright violation, and you can have the journal website shut down.

This is, of course, just an example of why SOPA is entirely the wrong approach to dealing with online piracy. But with supposedly technically savvy organisations lined up to support it, they should be aware of what it might cost them. A fortune in responding to takedown requests, a fortune in checking over every piece of every paper? Is that figure "sufficiently different"? Enjoy. Or perhaps it is time for a re-think about copyright in scholarly works?

An Open Letter to David Willetts: A bold step towards opening British research

On the 8th December David Willetts, the Minister of State for Universities and Science, announced new UK government strategies to develop innovation and research to support growth. The whole document is available online and you can see more analysis at the links at the bottom of the post. A key aspect for Open Access advocates was the section that discussed a wholesale move by the UK to an author-pays system for freely accessible research literature, with SCOAP3 raised as a possible model. The report refers not to Open Access, but to freely accessible content. I think this is missing a massive opportunity for Britain to take a serious lead in defining the future direction of scholarly communication. That's the case I attempt to lay out in this open letter. This post should be read in the context of my usual disclaimer.

Minister of State for Universities and Science

Department of Business Innovation and Skills

Dear Mr Willetts,

I am writing in the first instance to congratulate you on your stance on developing routes to freely accessible research outputs. I cannot say I am a great fan of many current government positions, and I might have wished for greater protection of the UK science budget, but in times of resource constraint for research I believe your focus on ensuring the efficiency of access to and exploitation of research outputs in its widest sense is the right one.

The position you have articulated offers a real opportunity for the UK to take a lead in this area. But along with the opportunities there are risks, and those risks could entrench existing inefficiencies of our scholarly communication system. They could also reduce the value for money that the public purse, and it will be the public purse one way or another, gets for its investment. In our current circumstances this would be unfortunate. I would therefore ask you to consider the following as the implementation pathway for this policy is developed.

Firstly, the research community will be buying a service. This is a significant change from the current system where the community buys a product, the published journal. The purchasing exercise should be seen in this light and best practice in service procurement applied.

Secondly, the nature of this service must be made clear. The service being provided must allow for any and all downstream uses, including commercial use, text mining, indeed any use that might be developed at some point in the future. We are paying for this service and we must dictate its terms. Incumbent publishers will say in response that they need to retain commercial rights, or text mining rights, to ensure their viability, as indeed they have done in response to the Hargreaves Review.

This, not to put too fine a point on it, is hogwash. PLoS and BioMedCentral both operate financially viable operations in which no downstream rights beyond that of appropriate attribution are retained by the publishers, and where the author charges are lower in price than many of the notionally equivalent, but actually far more limited, offerings of more traditional publishers. High quality scholarly communication can be supported by reasonable author charges without any need for publishers to retain rights beyond those protected by their trademarks. An effective marketplace could therefore be expected to bring down the average costs of this form of scholarly communication.

The reason for supporting a system that demands that any downstream use of the communication be enabled is that we need innovation and development within the publishing system as well as innovation and development as a result of its content. Our scholarship is currently being held back by a morass of retained rights that prevent the development of research projects, of new technology startups, and potentially of new industries. The government consultation document of 14 December on the Hargreaves report explicitly notes that enabling downstream uses of content, and scholarly content in particular, can support new economic activity. It can also support new scholarly activity. The exploitation of our research outputs requires new approaches to indexing, mining, and parsing the literature. The shame of our current system is that much of this is possible today. The technology exists but is prevented from being exploited at scale by the logistical impossibility of clearing the required rights. These new approaches will require money and it is entirely appropriate, indeed desirable, that some of this work therefore occurs in the private sector. Experimentation will require both freedom to act and freedom to develop new business models. Our content, its accessibility, and its reusability must support this.

Finally I ask you to look beyond the traditional scholarly publishing industry to the range of experimentation that is occurring globally in academic spaces, non-profits, and commercial endeavours. The potential leaps in functionality, as well as the potential cost reductions, are enormous. We need to work to encourage this experimentation and develop a diverse and vibrant market which both provides the quality assurance and stability that we are used to and encourages technical experimentation and the improvement of business models. What we don't need is a five or ten year deal that cements in existing players, systems, and practices.

Your government's philosophy is based around the effectiveness of markets. The recent history of major government procurement exercises is not a glorious one. This is one we should work to get right. We should take our time to do so and ensure a deal that delivers on its promise. The vision of a Britain that is led by innovation and development, supported by a vibrant and globally leading research community, is, I believe, the right one. Please ensure that this innovation isn't cut off at the knees by agreeing terms that prevent our research communication tools being re-used to improve the effectiveness of that communication. And please ensure that the process of procuring these services is one that supports innovation and development in scholarly communications itself.

Yours truly,

Cameron Neylon

Open Access for the other 85%

One of the things you notice as a visitor from the UK in South Africa is how clean the toilets are. In restaurants, at the University, in public places. Sometimes a bit worn down but always clean. And then you start to notice how clear and clean the pavements are and your first response, well at least my first response, is that this is a sign of things going right. One element of the whole is working well. But of course one of the main reasons for this is that labour is cheap and plentiful.

And suddenly it seemed less positive. The temptation is to offer advice, experience of first world development programs, but it became clear very quickly how little of the context I was aware of. The problems on the ground are often not the obvious ones – the solutions rarely easily visible from the privileged perspective of the global north. The thing I learnt was to offer advice that was tactical, not strategic: routes towards the desired goal, not the direction of travel.

So Heather Morrison's post on the use of Creative Commons for open access, arriving as it did in the middle of my visit to Cape Town, troubled me. Those who read here will know I am a strong partisan of the OA = CC-BY view and indeed tend to the view that we should just place things in the public domain. So my first response was a rejection. But there is an argument in there that non-commercial and share-alike terms are appropriate for the developing world, because they can protect access to the results of text mining of relevant research. These arguments are always worth taking apart because they help to illuminate the practicalities of how we take scholarly communication and make it valuable to people. They help to pull apart the issues and raise important use cases, and the effective use of research to aid development is a key use case. Heather is also, as someone who has thought deeply about these issues, a person I only ever disagree with cautiously.

But here I do have to disagree. And I disagree on two levels. I gave a presentation at UCT where I talked in part about some of our open science work, and a question was asked: "have you thought about how accessible this is in rural Africa?" My lame answer was that we have thought about it and worried about it but not actually done anything. But the better answer for me was that it is far better for the people on the ground, who really know the infrastructure and the need, to decide what is required and do the appropriate format conversion, printing, and distribution than for us in the UK or the US to presume to know what is of most value. This, after all, is the principle of open access: allowing others to re-use as appropriate for their context precisely because we don't know what those uses may be.

And NC terms will break this in the developing world. Right where that research is needed, the communications infrastructure is patchy and in many cases non-existent. Getting that research to people, whether the raw communication or processed or text-mined material, can require paper, and trucks to transport it. It may require translation in format or in language or in form. All of these things cost money, and in choosing NC terms we would be condemning those who could use this material to relying on charity, or, what is in some cases worse, governments. A service industry that charges someone, somewhere, for production, translation, and transport is ruled out as a possible business model. Exploitation is a value-laden term, particularly in the context of Africa, but in its pure and ideal sense it is neutral. We want to see research exploited for good, but in choosing NC terms we are dictating the way that exploitation can occur. The risks of exploitation in the bad sense are also there. But I don't think licences are the way to deal with it. By using legal instruments we take those key choices out of the hands of those best placed to make them.

And this is the more insidious issue from my perspective. The thing I learned in a week in Cape Town was that the last thing scholars in the developing world need is for us to make decisions for them. What we need to do, as a community, is to ensure that people can make the widest possible range of choices. Don't get me wrong, using CC-BY has some risks in that regard, and ccZero perhaps some more, but if we act to preserve the principle of giving people the space to make their own decisions based on local knowledge and local needs, then that is the biggest contribution we can make.

It's not just licensing. One of the biggest holes in the entire fabric of the OA movement is the lack of a principled and rational stance on author payment charges. Just patting people on the head and saying "oh don't worry we won't make you pay" is not just patronising, it is damaging to our credibility, and damaging to our own progress. One of the other key lessons I learnt from a week listening to people in Cape Town is how far ahead of the traditional centres of scholarship they are on some issues. I spend a lot of time in the UK and the US trying to convince people that there is an issue; that we need to look at how our research matters to the wider community. In South Africa the needs are clear, and scholars want to make a difference. The question is how, and how to tell when it is working. They are damn good at this. We could learn a lot from these people about how to balance the pursuit of prestige and "research with impact" that seems to be such a struggle for us.

We need to think of the global scholarly community, not as made up of “us” and “those who are catching up”, but as made up of different groups with different priorities and different needs, and crucially with different experiences and value to bring to a global endeavour. The creation of a “special OA” for the developing world runs the risk of perpetuating a view in which 85% of the world will always be catching up. Yes we need to try and build systems that ensure access to all scholarship, primary and derived. How to do that is an important debate, and one I hope will be at the centre of the Berlin10 conference to be held in Stellenbosch in South Africa next year. To not use that meeting to address the issues and challenges of business models and safeguards that help preserve access and optimise impact for the developing world would be a terrible loss.

OA is about enabling people, enabling business, and enabling development. There is a global community of scholars who get that, who want to be part of the wider community, and have their own skills and expertise to bring. They also want to share and contribute to the resource needs of scholarly communication in an appropriate and equitable way. We need to enable them to do that and get out of their way. After all, we might learn something.

Good practice in research coding: What are the targets and how do we get there…?

The software code that is written to support and manage research sits at a critical intersection of our developing practice of shared, reproducible, and re-usable research in the 21st century. Code is amongst the easiest things to usefully share, being not only made up of easily transferable bits and bytes but also critically carrying its context with it in a way that digital data doesn't. Code at its best is highly reproducible: it comes with the tools to determine what is required to run it (makefiles, documentation of dependencies) and when run should (ideally) generate the same results from the same data. Where there is a risk that it might not, good code will provide tests of one sort or another that you can run to make sure that things are ok before proceeding. Testing, along with good documentation, is what ensures that code is re-usable, that others can take it and efficiently build on it to create new tools, and new research.
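To make this concrete, here is a minimal sketch of the kind of self-checking described above, in Python with pytest-style tests. The function and file names are illustrative, not drawn from any particular project.

```python
# analysis.py – a trivial piece of "research code"
def normalise(values):
    """Scale a sequence of numbers so that they sum to 1.0."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise values that sum to zero")
    return [v / total for v in values]


# test_analysis.py – checks a re-user can run before building on the code
import pytest

def test_normalise_sums_to_one():
    result = normalise([2.0, 3.0, 5.0])
    assert abs(sum(result) - 1.0) < 1e-9

def test_normalise_rejects_zero_total():
    with pytest.raises(ValueError):
        normalise([0.0, 0.0])
```

Running the tests is exactly the "make sure that things are ok before proceeding" step: if they pass on a new machine, the dependencies are in place and the behaviour is what the author intended.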

The outside perspective, as I have written before, is that software does all of this better than experimental research. In practice the truth is that there are frameworks that make it possible for software to do a very good job on these things, but that in reality doing a good job takes work; work that is generally not done. Most software for research is not shared, is not well documented, generates results that are not easily reproducible, and does not support re-use and repurposing through testing and documentation. Indeed much like most experimental research. So how do we realise the potential of software to act as an exemplar for the rest of our research practice?

Nick Barnes of the Climate Code Foundation developed the Science Code Manifesto, a statement of how things ought to be (I was very happy to contribute and be a founding signatory) and while for many this may not go far enough (it doesn't explicitly require open source licensing) it is intended as a practical set of steps that might be adopted by communities today. This has already garnered hundreds of endorsers and I'd encourage you to sign up if you want to show your support. The Science Code Manifesto builds on many years of work by Victoria Stodden in identifying the key issues and bringing them to wider awareness with both researchers and funders, as well as the work of John Cook, Jon Claerbout, and Patrick Vanderwalle at ReproducibleResearch.net.

If the manifesto and the others' work are actions that aim (broadly) to set out the principles and to understand where we need to go, then Open Research Computation is intended as a practical step embedded in today's practice. Researchers need the credit provided by conventional papers, so if we can link papers in a journal that garners significant prestige with high standards in the design and application of the software that is described, we can link the existing incentives to our desired practice. This is a high wire act. How far do we push those standards out in front of where most of the community is? We explicitly want ORC to be a high profile journal featuring high quality software, for acceptance to be a mark of quality that the community will respect. At the same time we can't ask for the impossible. If we set standards so high that no-one can meet them then we won't have any papers. And with no papers we can't start the process of changing practice. Equally, allow too much in and we won't create a journal with a buzz about it. That quality mark has to be respected as meaning something by the community.

I’ll be blunt. We haven’t had the number of submissions I’d hoped for. Lots of support, lots of enquiries, but relatively few of them turning into actual submissions. The submissions we do have I’m very happy with. When we launched the call for submissions I took a pretty hard line on the issue of testing. I said that, as a default, we’d expect 100% test coverage. In retrospect that sent a message that many people felt they couldn’t deliver on. Now what I meant by that was that when testing fell below that standard (as it would in almost all cases) there would need to be an explanation of what the strategy for testing was, how it was tackled, and how it could support people re-using the code. The language in the author submission guidelines has been softened a bit to try and make that clearer.

What I've been doing in practice is asking reviewers and editors to comment on how the testing framework provided can support others re-using the code. Are the tests provided adequate to help someone getting started on the process of taking the code, making sure they've got it working, and then, as they build on it, giving them confidence they haven't broken anything? For me this is the critical question: does the testing and documentation make the code re-usable by others, either directly in its current form, or as they build on it? Along the way we've been asking whether submissions provide documentation and testing consistent with best practice. But that always raises the question of what best practice is. Am I asking the right questions? And where should we ultimately set that bar?

Changing practice is tough, getting the balance right is hard. But the key question for me is how do we set that balance right? And how do we turn the aims of ORC, to act as a lever to change the way that research is done, into practice?

Reflections on research data management: RDM is on the up and up but data-driven policy development seems a long way off.

I wrote this post for the Digital Curation Centre blog following the Research Data Management Forum meeting run in Warwick a few weeks back. If you feel moved to comment I’d ask you to do it over there.

The Research Data Management movement is moving on apace. Tools are working and adoption is growing. Policy development is starting to back up the use of those tools and there are some big ambitious goals set out for the next few years. But has the RDM movement taken the vision of data intensive research to its heart? Does the collection, sharing, and analysis of data about research data management meet our own standards? And is policy development based on and assessed against that data? Can we be credible if it is not?

Watching the discussion on research data management over the past few years has been an exciting experience. Tools that have been technically possible for some years now show real promise, as the somewhat rough and ready products of initial development are used and tested.

Practice is gradually changing, if unevenly across different disciplines, but there is a growing awareness of data and that it might be considered important. And all of this is being driven increasingly by the development of policies on data availability, data management, and data archiving that stress the importance of data as a core output of public research.

The vision of the potential of a data rich research environment is what is driving this change. It is not important whether individual researchers, or even whole communities, get how fundamental a change the capacity to share and re-use data really is. The change is driven by two forces fundamentally external to the community.

The first is political, the top down view from government that publicly funded research needs to gain from the benefits they see in data rich commerce. A handful of people really understand how data works at these scales but these people have the ear of government.

The second force is one of competition. In the short term adopting new practices, developing new ways of doing research, is a risk. In the longer term, those who adopt more effective and efficient approaches will simply out-compete those who do not or cannot. This is already starting to happen in those disciplines already rich in shared data, and the signs are there that other disciplines are approaching a tipping point.

Data intensive research enables new types of questions to be asked, and it allows us to answer questions that were previously difficult or impossible to get reliable answers on. Questions about weak effects, small correlations, and complex interactions. The kind of questions that bedevil strategic decision-making and evidence based policy.

So naturally you’d expect that the policy development in this area, being driven by people excited by the vision of data intensive research, would have deeply embedded data gathering, model building, and analysis of how research data is being collected, made available, and re-used.

I don't mean opinion surveys, or dipstick tests, or case studies. These are important but they're not the way data intensive research works. They don't scale, they don't integrate, and they can't provide the insight into the weak effects in complex systems that is needed to support decision making about policy.

Data intensive research is about tracking everything, logging every interaction, going through download logs, finding every mention of a specific thing wherever on the web it might be.

It's about capturing large amounts of weakly structured data and figuring out how to structure it in a way that supports answering the question of interest. And it's about letting the data guide you to the answers it suggests, rather than looking within it for what we "know" should be in there.

What I don't see when I look at RDM policy development is the detailed analysis of download logs, the usage data, the click-throughs on websites. Where are the analyses of the IP ranges of users, the automated reporting systems, and, above all, when new policy directions are set, where is the guidance on data collection and assessment of performance against those policies?
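To give a flavour of what that kind of analysis looks like in practice, here is a minimal sketch in Python. The log format, file name, and URL scheme are assumptions for illustration, not a description of any real repository's logs.

```python
import re
from collections import Counter

# Count dataset downloads per /24 IP block from an Apache-style access log.
LOG_LINE = re.compile(r'(\d+\.\d+\.\d+)\.\d+ .* "GET (/datasets/\S+) HTTP')

downloads = Counter()  # (ip_block, dataset) -> number of hits
with open("access.log") as log:
    for line in log:
        match = LOG_LINE.match(line)
        if match:
            ip_block, dataset = match.groups()
            downloads[(ip_block + ".0/24", dataset)] += 1

for (block, dataset), hits in downloads.most_common(10):
    print(f"{hits:6d}  {dataset}  from {block}")
```

Nothing here is sophisticated; the point is that this data is already being generated by every repository, and aggregating it is a few lines of code away.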

Without this, the RDM community is arguably doing exactly the same things that we complain about in researcher communities. Not taking a data driven view of what we are doing.

I know this is hard. I know it involves changing systems, testing things in new ways, collecting data in ways we are not used to. Even imposing disciplinary approaches that are well outside the comfort zone of those involved.

I also know there are pockets of excellent practice and significant efforts to gather and integrate information. But they are pockets. And these are exactly the things that funders and RDM professionals and institutions are asking of researchers. They are the right things to be asking for, and we’re making real progress towards realizing the vision of what is possible with data intensive research.

But just imagine if we could support policy development with that same level of information. At a pragmatic and political level it makes a strong statement when we “eat our own dogfood”. And there is no better way to understand which systems and approaches are working and not working than by using them ourselves.

Designing for the phase change: Local communities and shared infrastructure

Michael Nielsen's talk at Science Online was a real eye-opener for many of us who have been advocating for change in research practice. He framed the whole challenge of change as an example of a well known problem, that of collective action. How do societies manage big changes when those changes often represent a disadvantage to many individuals, at least in the short term? We can all see the global advantages of change but individually acting on them doesn't make sense.

Michael placed this in the context of other changes, that of countries changing which side of the road they drive on, or the development of trade unions, that have been studied in some depth by political economists and similar academic disciplines. The message of these studies is that change usually occurs in two phases. First local communities adopt practice (or at least adopt a view that they want things changed in the case of which side of the road they drive on) and then these communities discover each other and “agglomerate”, or in the language of physical chemistry there are points of nucleation which grow to some critical level and then the whole system undergoes a phase change, crystallising into a new form.

These two phases are driven by different sets of motivations and incentives. At a small scale processes are community driven, people know each other, and those interactions can drive and support local actions, expectations, and peer pressure. At a large scale the incentives need to be different and global. Often top down policy changes (as in the side of the road) play a significant role here, but equally market effects and competition can also fall into place in a way that drives adoption of new tools or changes in behaviour. Think about the way new research techniques get adopted: first they are used by small communities, single labs, with perhaps a slow rate of spread to other groups. For a long time it's hard for the new approach to get traction, but suddenly at some point either enough people are using it that it's just the way things are done, or conversely those who are using it are moving ahead so fast that everyone else has to pile in just to keep up. It took nearly a decade for PCR, for instance, to gain widespread acceptance as a technique in molecular biology, but when it did it went from being something people were a little unsure of to being the only way to get things done very rapidly.

So what does this tell us about advocating for, or designing for, change? Michael's main point was that narrow scope is a feature, not a bug, when you are in that first phase. Working with small scale use cases, within communities, is the way to get started. Build for those communities and they will become your best advocates, but don't try to push the rate of growth, let it happen at the right rate (whatever that might be – and I don't really know how to tell, to be honest). But we also need to build in the grounding for the second phase.

The way these changes generally occur is through an accidental process of accretion and agglomeration. The phase change crystallises out around those pockets of new practice. But, to stretch the physical chemistry analogy, it doesn't necessarily crystallise in the form one would design for. We have an advantage though: if we design in advance to enable that crystallisation then we can prepare communities and tooling for when it happens, and we can design in the features that will get us closer to the optimum we are looking for.

What does this mean in practice? It means that when we develop tools and approaches it is more important for our community to have standards than it is for there to be an effort on any particular tool or approach. The language we use, that will be adopted by communities we are working with, should be consistent, so that when those communities meet they can communicate. The technical infrastructure we use should be shared, and we need interoperable standards to ensure that those connections can be made. Again, interchange and interoperability are more important than any single effort, any single project.

If we really believe in the value of change then we need to get these things together before we push them too hard into the diverse set of research communities where we want them to take root. We really need to get interoperability, standards, and language sorted out before the hammer of policy comes down and forces us into some sort of local minimum. In fact, it sounds rather like we have a collective action problem of our own. So what are we going to do about that?

Building the perfect data repository…or the one that might get used

While there has been a lot of talk about data repositories and data publication there remains a real lack of good tools that are truly attractive to research scientists and also provide a route to more general and effective data sharing. Peter Murray-Rust has recently discussed the deficiencies of the traditional institutional repository as a research data repository in some depth [1, 2, 3, 4].

Data publication approaches and tools are appearing including Dryad, Figshare, BuzzData and more traditional venues such as GigaScience from BioMedCentral but these are all formal mechanisms that involve significant additional work alongside an active decision to “publish the data”. The research repository of choice remains a haphazard file store and the data sharing mechanism of choice remains email. How do we bridge this gap?

One of the problems with many efforts in this space is how they are conceived and sold to the user. "Making it easy to put your data on the web" and "helping others to find your data" solve problems that most researchers don't think they have. Most researchers don't want to share at all, preferring to retain as much of an advantage through secrecy as possible. Those who do see a value in sharing are for the most part highly skeptical that the vast majority of research data can be used outside the lab in which it was generated. The small remainder who see a value in wider research data sharing are painfully aware of how much work it is to make that data useful.

A successful data repository system will start by solving a different problem, a problem that all researchers recognize they have, and will then nudge the users into doing the additional work of recording (or allowing the capture of) the metadata that could make that data useful to other researchers. Finally it will quietly encourage them to make the data accessible to other researchers. Both the nudge and the encouragement will arise by offering back to the user immediate benefits in the form of automated processing, derived data products, or other incentives.

But first the problem to solve. The problem that researchers recognize they have, in many cases prompted by funder data sharing plan requirements, is to properly store and back up their data. A further problem most PIs realize they have is getting access to the data of their group in a useful form. So the initial sales pitch for a data repository is going to be local and secure backup, plus sharing within the group. This has to be dead simple and ideally automated.

Such a repository will capture as much data as possible at source, as it is generated. Just grabbing the file and storing it with the minimal contextual data of who (which directory was it saved in), when (acquired from the initial file datestamp), and what (where has it come from), backing it up, and exposing it to the research group (or subsets of it) via a simple web service. It will almost certainly involve some form of Dropbox-style system which synchronises a user's own data across their own devices. Here is an immediate benefit: I don't have to get the pen drive out of my pocket if I'm confident my new data will be on my laptop when I get back to it.

It will allow for simple configuration on each instrument that sets up a target directory and registers a filetype so that the system can recognize what instrument, or computational process, a file came from (the what). The who can be more complex, but a combination of designated directories (where a user has their own directory of data on a specific instrument), login info, and, where required, low-irritation request or claiming systems can be built. The when is easy. The sell here is in two parts: directory synching across computers means less mucking around with USB keys, and the backup makes everyone feel better – the researcher in the lab, the PI, and the funder.
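As a rough sketch of what such a capture client might look like (the watchdog package is a real Python library for filesystem monitoring; the directory, instrument name, and metadata fields are illustrative assumptions, and the upload to the web service is stubbed out as a print):

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCH_DIR = Path("/instruments/uv-vis-1/data")  # assumed per-instrument directory
INSTRUMENT = "uv-vis-1"                          # the "what"

class CaptureHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory:
            return
        path = Path(event.src_path)
        record = {
            "file": str(path),
            "who": path.parent.name,  # designated user directory stands in for the user
            "when": datetime.now(timezone.utc).isoformat(),  # backed by the file datestamp
            "what": INSTRUMENT,
        }
        # A real client would POST this record, and the file itself, to the
        # group's repository web service; here we just show what gets captured.
        print(json.dumps(record))

observer = Observer()
observer.schedule(CaptureHandler(), str(WATCH_DIR), recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```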

So far so good, and there are in fact examples of systems like this that exist in one form or another or are being developed, including DataStage within the DataFlow project from David Shotton's group at Oxford, the Clarion project (PMR group at Cambridge), and Smart Research Frameworks (which I have an involvement with) led by Jeremy Frey at Southampton. I'm sure there are dozens of other systems or locally developed tools that do similar things, and these are a good starting position.

The question is how do you take systems like this and push them to the next level? How do you capture, or encourage the user to provide, enough metadata to actually make the stored data more widely useful? Particularly when they don't have any real interest in sharing or data publication? I think that there is significant potential in offering downstream processing of the data.

If This Then That (IFTTT) is a startup that has got quite a bit of attention over the past few weeks as it has come into public beta. The concept is very simple. For a defined set of services there are specific triggers (posting a tweet, favouriting a YouTube video) that can be used to set off another action at another service (send an email, bookmark the URL of the tweet or the video). What if we could offer data processing steps to the user? If the processing steps happen automatically but require a bit more metadata will that provide the incentive to get that data in?

This concept may sound a lot like the functionality provided by workflow engines but there is a difference. Workflow systems are generally very difficult for the general user to set up. This is mostly because they solve a general problem, that of putting any object into any suitable process. IFTTT offers something much simpler, a small set of common actions on common objects, that solves the 80/20 problem. Workflows are hard because they can do anything with any object. And that flexibility comes at a price because it is difficult to know whether that csv file is from a UV-Vis instrument, a small-angle x-ray experiment, or a simulated data set.

But locally, within a research group, there is a more limited set of data objects. With a local (or localized) repository it is possible to imagine plugins that do common single steps on common files. And because the configuration is local there is much less metadata required. But in turn that configuration provides metadata. If a particular filetype from a directory is configured for automated calculation of A280 for protein concentrations then we know that those data files are UV-Vis spectra. What is more, once we know that, we can offer an automated protein concentration calculator. This will only work if the system knows what protein you are measuring, an incentive to identify the sample when you do the measurement.
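The calculation itself is just the Beer-Lambert law, c = A / (ε·l). A plugin along those lines might look like the sketch below, where the CSV layout, the file name, and the use of a mass extinction coefficient are assumptions for illustration:

```python
import csv

def a280_concentration(spectrum_csv, extinction_mg_ml, pathlength_cm=1.0):
    """Estimate protein concentration (mg/mL) from a wavelength,absorbance CSV.

    Beer-Lambert: A = epsilon * c * l, so c = A / (epsilon * l), where epsilon
    here is the absorbance of a 1 mg/mL solution in a 1 cm cell.
    """
    with open(spectrum_csv) as f:
        absorbances = {float(wavelength): float(absorbance)
                       for wavelength, absorbance in csv.reader(f)}
    return absorbances[280.0] / (extinction_mg_ml * pathlength_cm)

# BSA, for example, absorbs roughly 0.66 at 280 nm for 1 mg/mL in a 1 cm cell
print(a280_concentration("uv_vis_run42.csv", extinction_mg_ml=0.66))
```

The extinction coefficient is exactly the piece of metadata the system can only get by knowing which protein is in the sample – which is the incentive to identify it.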

The architecture of such a system would be relatively straightforward. A web service provides the cloud based backup and configuration for captured data files and users. Clients that sit on group users’ personal computers as well as on instruments grab their configuration information from the central repository. They might simply monitor specified directories, or they might pop up with a defined set of questions to capture additional metadata. Users register the instruments that they want to “follow” and when a new data file is generated with their name on it, it is also synchronized back to their registered devices.

The web service provides a plugin architecture where appropriate plugins for the group can be added from some sort of online marketplace. Plugins that process data to generate additional metadata (e.g. by parsing a log file) can add that to the record of the data file. Those that generate new data will need to be configured as to where that data should go and who should have it synchronised. The plugins will also generate a record of what they did, providing an audit and provenance trail. Finally, plugins can provide notification back to the users, via email, the web service, or a desktop client, of queued processes that need more information to proceed. The user can mute these, but equally the encouragement is there to provide a little more info.
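In code terms the contract being sketched here is small. A hypothetical outline (none of these names come from an existing system) might be:

```python
from abc import ABC, abstractmethod

class NeedsMoreInfo(Exception):
    """Raised by a plugin to queue a question for the user, e.g. "Which protein is this?"."""
    def __init__(self, question):
        super().__init__(question)
        self.question = question

class RepositoryPlugin(ABC):
    filetype = None  # e.g. "uv-vis-csv"; set by the group's local configuration

    @abstractmethod
    def run(self, data_path, metadata):
        """Process one captured file and return any new or derived metadata."""

class ProvenanceLog:
    """Records every plugin run, providing the audit and provenance trail."""
    def __init__(self):
        self.entries = []

    def record(self, plugin, data_path, derived):
        self.entries.append({
            "plugin": type(plugin).__name__,
            "input": data_path,
            "derived": derived,
        })
```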

Finally, the question of sharing and publication. For individual data file sharing sites like Figshare, plugins might enable straightforward direct submission of files placed into a specific directory. For collections of data, such as those supported by Dryad, there will need to be a way to group files together, but again this could be as simple as creating a directory. Even if files are copied and pasted or removed from their "proper" directories, the system stands a reasonable chance of recognizing files it has already seen and inferring their provenance.

By making it easy to share, and easier to satisfy data sharing requests by pushing data to a public space (while still retaining some ability to see how it is being used and by whom), the substrate that is required to build better plugins, more functionality, and above all better discovery tools will be provided, and in turn those tools will start to develop. As the tools and functionality develop, the value gained by sharing will rise, creating a virtuous circle encouraging good data management practice, good metadata provision, and good sharing.

This path starts with things we can build today, some of which already exist. It becomes more speculative as it goes forward. There are issues with file synching and security. Things will get messy. The plugin architecture is nothing more than hand waving at the moment and success will require a whole ecosystem of repositories and tools for operating on them. But it is a way forward that looks plausible to me. One that solves the problems researchers have today and guides them towards a tomorrow where best practice is a bit closer to common practice.

Incentives: Definitely a case of rolling your own

Science Online London ran late last week and into the weekend and I was very pleased to be asked to run a panel, broadly speaking focused on evaluation and incentives. Now I had thought that the panel went pretty well but I’d be fibbing if I said I wasn’t a bit disappointed. Not disappointed with the panel members or what they said. Yes it was a bit more general than I had hoped and there were things that I wished we’d covered but the substance was good from my perspective. My disappointment was with the response from the audience, really on two slightly different points.

The first was the lack of response to what I thought were some of the most exciting things I’ve heard in a long time from major stakeholders. I’ll come back to that later. But a bigger disappointment was that people didn’t seem to connect the dots to their own needs and experiences.

Science Online, both in London and North Carolina forms, has for me always been a meeting where the conversation proceeds at a more sophisticated level than the usual. So I pitched the plan of the session at where I thought the level should be. Yes, we needed to talk about the challenges and surface the usual problems: non-traditional research outputs, and online outputs in particular, don't get the kind of credit that papers do; institutions struggle to give credit for work that doesn't fit in a pigeonhole; funders seem to reward only the conventional and traditional; and people outside the ivory tower struggle to get either recognition or funding. These are known challenges; the question is how to tackle them.

The step beyond this is the hard one. It is easy to say that incentives need to change. But incentives don’t drop from heaven. Incentives are created within communities and they become meaningful when they are linked to the interests of stakeholders with resources. So the discussion wasn’t really about impact, or funding, or to show that nothing can be done by amateurs. The discussion was about the needs of institutions and funders and how they can be served by what is being generated by the online community. It was also about the constraints they face in acting. But fundamentally you had major players on the stage saying “this is the kind of thing we need to get the ball rolling”.

Make no mistake, this is tough. Everyone is constrained and resources are tight but at the centre of the discussion were the key pointers to how to cut through the knot. The head of strategy at a major research university stated that universities want to play a more diverse role, want to create more diverse scholarly outputs, and want to engage with the wider community in new ways. That smart institutions will be looking to diversify. The head of evaluation at a major UK funder said that funders really want to know about non-traditional outputs and how they were having a wider impact. That these outputs are amongst the best things they can talk about to government. That they will be crucial to make the case to sustain science funding.

Those statements are amongst the most direct and exciting I have heard in some years of advocacy in this space. The opportunity is there, if you're willing to put the effort in to communicate and to shape what you are doing to match their needs. As Michael Nielsen said in his morning keynote, this is a collective action problem. That means finding what unites the needs of those doing the work with the needs of those with resources. It means compromise, and it means focusing on the achievable, but the point of the discussion was to identify what might be achievable.

So mostly I was disappointed that the excitement I felt wasn’t mirrored in the audience. The discussion about incentives has to move on. Saying that “institutions should do X” or “funders should do Y” gets us nowhere. Understanding what we can do together with funders and institutions and other communities to take the online agenda forward and understanding what the constraints are is where we need to go. The discussion showed that both institutions and funders know that they need what the community of online scientists can do. They don’t know how to go about it, and they don’t even know very much what we are doing, but they want to know. And when they do know they can advise and help and they can bring resources to bear. Maybe not all the resources you would like, and maybe not for all the things you would like, but resources nonetheless.

With a lot of things it is easy to get too immersed in the detail of these issues and to forget that people are looking in from the outside without the same context. I guess the fact that I pulled out as the main message what might have seemed to the audience to be just asides is indicative of that. But I really want to get that message out because I think it is critical if the community of online scientists wants to be the mainstream. And I think it should be.

The bottom line is that smart funders and smart institutions value what is going on online. They want to support it, they want to be seen to support it, but they’re not always sure how to go about it and how to judge its quality. But they want to know more. That’s where you come in and that’s why the session was relevant. Lars Fischer had it absolutely right: “I think the biggest and most consequential incentive for scientists is (informal) recognition by peers.” You know, we know, who is doing the good stuff and what is valuable. Take that conversation to the funders and the institutions, explain to them what’s good and why, and tell the story of what the value is. Put it in your CV, demand that promotion panels take account of it, whichever side of the table you are on. Show that you make an impact in language that they understand. They want to know. They may not always be able to act – funding is an issue – but they want to and they need your help. In many ways they need our help more than we need theirs. And if that isn’t an incentive then I don’t know what is.

Submission to the Royal Society Enquiry

The Royal Society is running a public consultation exercise on Science as a Public Enterprise. Submissions are requested to answer a set of questions. Here are my answers.

1. What ethical and legal principles should govern access to research results and data? How can ethics and law assist in simultaneously protecting and promoting both public and private interests?

There are broadly two principles that govern the ethics of access to research results and data. Firstly, there is the simple position that publicly funded research should by default be accessible to the public (with certain limited exceptions, see below). Secondly, claims based on research that impinge on public policy, health, safety, or the environment should be supported by public access to the underlying data. See more detail in the answer to Q2.

2 a) How should principles apply to publicly-funded research conducted in the public interest?

By default research outputs from publicly funded research should be made publicly accessible and re-usable in as timely a manner as possible. In an ideal world the default would be immediate release, however this is not a practically achievable goal in the near future. Cultural barriers and community inertia prevent the exploitation of technological tools that demonstrably have the potential to enable research to move faster and more effectively. Research communication mechanisms are currently shackled to the requirements of the research community to monitor career progression and are not optimised for effective communication.

In the near term it is practical to move towards an expectation that research outputs that support published research should be accessible and re-usable. Reasonable exceptions to this include data that is personally identifiable, that may place cultural or environmental heritage at risk, that places researchers at risk, or that might affect the integrity of ongoing data collection. The key point is that while there are reasonable exceptions to the principle of public access to public research outputs that these are exceptions and not the general rule.

What is not reasonable is to withhold, or limit the re-use of, data, materials, or other research outputs from public research for the purpose of personal advancement, including the “squeezing out of a few more papers”. If these outputs can be more effectively exploited elsewhere then it is a more efficient use of public resources to let that happen, furthering our public research agenda. We as a community have placed our own career advancement ahead of the public interest in achieving outcomes from public research for far too long.

What is also politically naive is to believe, or even to create the perception, that it is acceptable to withhold data on the basis that “the public won’t understand” or “it might be misused”. The web has radically changed the economics of information transfer, but it has perhaps more importantly changed public perceptions of access to data. The wider community is rightly suspicious of any situation where public information is withheld. This applies as much to publicly funded research as it does to government data.

2 b) How should principles apply to privately-funded research involving data collected about or from individuals and/or organisations (e.g. clinical trials)?

Increasingly, public advocacy groups are becoming involved in a range of research activities: patient advocacy groups supporting clinical trials, environmental advocacy groups supporting data collection, and a wider public getting involved in, for instance, citizen science projects.

In the case where individuals or organisations are contributing to research they have a right for that contribution to be recognised and a right to participate on their own terms (or to choose not to participate where those terms are unacceptable).

Organised groups (particularly patient groups) are of growing importance to a range of research. Researchers should expect to negotiate with such groups over the ultimate publication of data. Such groups should have the ability to demand greater public release and to waive rights to privacy. Equally, individual contributors have a default right to privacy where personally identifiable information is involved.

Privacy trumps the expectation of data release, and what counts as personally identifiable information is a vexed question that we are still working through as a society. Researchers will need to explore these issues with participants and work to ensure that the data generated can be anonymised in a way that still enables the released data to effectively support the claims made from it. This is a challenging area which requires significant further technical, policy, and ethics work.

2 c) How should principles apply to research that is entirely privately-funded but with possible public implications?

It is clear that publicly funded research is a public good. By contrast, privately funded research is properly a private good, and the decision to release or not release research outputs lies with the funder.

It is worth noting that much of the privately funded research in UK universities is significantly subsidised through the provision of public infrastructure and this should be taken into consideration when defining publicly and privately funded research. Here I consider research that is 100% privately funded.

Where claims are made on the basis of privately funded research (e.g. about environmental impact or the efficacy of health treatments), those claims SHOULD be fully supported by provision of the underlying evidence and data if they are to be credible. Where such claims are intended to influence public policy, the evidence and data MUST be made available. That is, evidence-based public policy must be supported by publication of the full evidence, regardless of the source of that evidence. Claims made to influence public policy that are not supported by provision of evidence must be discounted for the purposes of making public policy.

2 d) How should principles apply to research or communication of data that involves the promotion of the public interest but which might have implications for the privacy interests of citizens?

See above: the right to privacy trumps any requirement to release raw data. Nonetheless, research should be structured, and appropriate consent obtained, to ensure that claims made on the basis of the research can be supported by an adequate, publicly accessible evidence base.

3. What activities are currently under way that could improve the sharing and communication of scientific information?

A wide variety of technical initiatives are underway to enable the wider collection, capture, archiving, and distribution of research outputs, including narrative, data, materials, and other elements of the research process. It is technically possible today to publish the entire research record immediately, if we so choose. Such an extreme approach is resource intensive, challenging, and probably not ultimately a sensible use of resources. However, it is clear that more complete and rapid sharing has the potential to increase the effectiveness and efficiency of research.

The challenges in exploiting these opportunities are fundamentally cultural. The research community is focussed almost entirely on assessment through the extremely narrow lens of publication of extended narratives in high-profile peer-reviewed journals. This cultural bias must be at least partially reversed before we can realise the opportunities that technology affords us. That involves advocacy work, policy development, addressing the incentives for researchers and, above all, the slow and arduous process of returning the research culture to one that takes responsibility both for the return on public investment – economic, health, social, educational, and research returns – and for the effective communication of research outputs.

4. How do/should new media, including the blogosphere, change how scientists conduct and communicate their research?

New media (not really new any more, and increasingly part of the mainstream) democratise access to communications and increase the pace of communication. This is not entirely a good thing, and en masse the quality of the discourse is not always high. High quality depends on the good will, expertise, and experience of those taking part. There is a vast quantity of high-quality, rapid-response discourse around research on the web today, even if it occurs in many places. The most effective means of determining whether a recent high-profile communication stands up to criticism is to turn to discussion on blogs and news sites, not to wait months for a possible technical criticism to appear in a journal. In many ways this is nothing new; it is a return to the traditional approaches of communication seen at the birth of the Royal Society itself, of direct and immediate communication between researchers by the most efficient means possible: letters in the 17th century and the web today.

Alongside the potential for more effective communication among researchers, there is an enormous potential for more effective engagement with the wider community, not merely through “news and views” pieces but through active conversation and, indeed, active contributions from outside the academy. A group of computer consultants are working to contribute their expertise in software development to improving legacy climate science software. This is a real contribution to the research effort. Equally, the right question at the right time may come from an unexpected source but lead to new insights. We need to be open to this.

At the same time there is a technical deficiency in the current web: the management of the sheer quantity of potential connections that can be made. Our most valuable resource in research is expert attention. That attention may come from inside or outside the academy, but it is a resource that needs to be directed efficiently to where it can have the most impact. This will require the development of mechanisms that assist in choosing which potential contacts and pieces of information to follow up. Such mechanisms are currently in their infancy, and their development is in any case a necessity for dealing with the explosion of traditional information sources.

5. What additional challenges are there in making data usable by scientists in the same field, scientists in other fields, ‘citizen scientists’ and the general public?

Effective sharing of data, and indeed of most research outputs, remains a significant challenge. The problem is twofold: first, ensuring sufficient contextual information that an expert can understand the potential uses of the research output; and second, placing that contextual information in a narrative that is understandable to the widest possible range of users. These are both significant challenges that are being tackled by a large number of skilled people. Progress is being made, but a great deal of work remains in developing the tools, techniques, and processes that will enable the cost-effective sharing of research outputs.

A key point, however, is that in a world where publication is extremely cheap, simply releasing whatever outputs exist in their current form can still have a positive effect. Firstly, where the cost of release is effectively zero, even a small chance of the data being discovered and re-used will still lead to positive outcomes in aggregate. Secondly, the presence of this underexploited resource of released, but insufficiently marked-up and contextualised, data will drive the development of real systems that will make it more useful.
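To make the aggregate argument explicit, here is a minimal expected-value sketch; the symbols $c$, $p$, and $B$ are illustrative and not part of the original submission. Let $c$ be the marginal cost of releasing a dataset, $p$ the (small) probability that it is discovered and re-used, and $B$ the benefit realised when re-use occurs. Then

\[ \mathbb{E}[\text{net value per dataset}] = pB - c, \qquad \mathbb{E}[\text{aggregate over } N \text{ datasets}] = N(pB - c). \]

As $c$ approaches zero, any non-zero $p$ with positive $B$ yields a positive expectation, so the aggregate benefit grows with the number of released datasets even though most individual releases are never re-used.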

6 a) What might be the benefits of more widespread sharing of data for the productivity and efficiency of scientific research?

Fundamentally, more efficient, more effective, and more publicly engaging research. More widespread data sharing enables less repetition and needless rediscovery of negative results and, ideally, more effective replication and critique of positive results. As noted above, another important outcome is that even suboptimal sharing will help to drive the development of tools that optimise the effective release of data.

6 b) What might be the benefits of more widespread sharing of data for new sorts of science?

The widespread sharing of data has historically always led to entirely new forms of science. The modern science of crystallography is based largely on the availability of crystal structures; bioinformatics would simply not exist without GenBank, the PDB, and other biological databases; and the astronomy of today would be unrecognisable to someone whose career ended prior to the availability of the Sloan Digital Sky Survey. Citizen science projects of the type of Galaxy Zoo, Foldit, and many others are inconceivable without the data to support them. Extrapolating from this evidence provides an exciting view of the possibilities, and indeed one which it would be negligent not to exploit.

6 c) What might be the benefits of more widespread sharing of data for public policy?

Policy making that is supported by more effective evidence is something that appeals to most scientists. Of course, public policy making is never that simple. Nonetheless, it is hard to see how a more effective and comprehensive evidence base could fail to support better evidence-based policy making. Indeed, it is to be hoped that a wide evidence base, and the contradictions it will necessarily contain, could lead to a more sophisticated understanding of the scope and critique of evidence sources.

6 d) What might be the benefits of more widespread sharing of data for other social benefits?

The potential for wider public involvement in science is a major potential benefit. As in the answer to 6 c) above, a deeper understanding throughout society of how to treat and parse evidence and data can only be positive.

6 e) What might be the benefits of more widespread sharing of data for innovation and economic growth?

Studies of the release of government data have consistently shown that it leads to a nett economic benefit. This is true even when such data has traditionally been charged for: the national economy benefits to a much greater extent than any potential loss of revenue. While this is not necessarily a sufficient incentive for private investors to release data, in the case of public investment the object is to maximise the national return on investment. Release in a fully open form is therefore the rational economic approach.

The costs to SMEs of the lack of access to publicly funded research outputs are well established. Improved access will remove barriers that currently stifle innovation and economic growth.

6 f) What might be the benefits of more widespread sharing of data for public trust in the processes of science?

There is both a negative and a positive side to this question. On the positive side, greater transparency, more potential for direct involvement, and a greater understanding of the process by which research proceeds will lead to greater public confidence. On the negative side, doing nothing is simply not an option. Recent events have shown not so much that the public has lost confidence in science and scientists, but that there is deep shock at the lack of transparency and the lack of availability of data.

If the research community does not wish to be perceived in the same way as MPs and other recent targets of public derision, then we need to move rapidly to improve the transparency and accessibility of the outputs of public research.

7. How should concerns about privacy, security and intellectual property be balanced against the proposed benefits of openness?

There is little evidence that the protection of intellectual property supports a nett increase in the return on public investment in research. While there may be cases where it is locally optimal to pursue IP protection to exploit research outputs and maximise returns, this is not generally the case. The presumption that everything should be patented is both draining resources and stifling British research. There should always be an avenue for taking this route to exploitation, but the default should be open communication of research outputs, with the need for IP protection justified on a case-by-case basis. It should be unacceptable for the pursuit of IP protection to damage the communication and downstream exploitation of research.

Privacy issues and concerns around the personal security of researchers have been discussed above. National security issues will in many cases fall under a justifiable exception to the presumption of openness although it is clear that this needs care and probably oversight to retain public confidence.

8. What should be expected and/or required of scientists (in companies, universities or elsewhere), research funders, regulators, scientific publishers, research institutions, international organisations and other bodies?

British research could benefit from a statement of values, something that has the cultural significance of the Haldane principle (although perhaps better understood) or the Hippocratic oath. A shared cultural statement that captures a commitment to efficiently discharging the public trust invested in us, to open processes as a default, and to specific approaches where appropriate would act as a strong centre around which policy and tools could be developed. Leadership is crucial here in setting values and embedding these within our culture. Organisations such as the Royal Society have an important role to play.

Researchers and the research community need to take these responsibilities on in a serious and considered manner. Funders and regulators need to provide a policy framework and, where appropriate, community sanctions for transgressions of important principles. Research institutions are for the most part tied into current incentive systems that are tightly coupled to funding arrangements, and so have limited freedom of movement. Nonetheless, serious consideration is required of the return on investment of technology transfer arrangements, and of how non-traditional outputs, including data, contribute to the work of the institution and its standing. In the current economic climate successful institutions will diversify in their approach. Those that do not are unlikely to survive in their current form.

Other comments

This is not the first time that the research community has faced this issue. Indeed, it is not even the first time the Royal Society has played a central role. Several hundred years ago it was a challenge to persuade researchers to share information at all. Results were hidden. Sharing was partial, confined to tight circles, and usually limited in scope. The precursors of the Royal Society played a key role in persuading the community that effective sharing of their research outputs would improve research. Many of the same concerns were raised: concerns about the misuse of those outputs, about others stealing ideas, and about personal prestige and the embarrassment potential of getting things wrong.

The development of journals, and of a values system that demanded that results be made public, took time and leadership, and with the technology of the day the best possible system was developed over an extended period. With a new technology now available we face the same issues and challenges. It is to be hoped that we tackle those challenges and opportunities with the same sense of purpose.
