Parsing the Willetts Speech on Access to UK Research Outputs

Yesterday David Willetts, the UK Science and Universities Minister, gave a speech to the Publishers Association that has got wide coverage. However, it is worth pulling apart both the speech and the accompanying opinion piece from the Guardian because there are some interesting elements in there, and also some things have got a little confused.

The first really key point is that there is nothing new here. This is basically a re-announcement of the previous position from the December Innovation Strategy on moving towards a freely accessible literature and a more public announcement of the Gateway to Research project previously mentioned in the RCUK response to the Innovation Statement.

The Gateway to Research project is a joint venture of the Department of Business Innovation and Skills and Research Councils UK to provide a one stop shop for information on UK research funding as well as pointers to outputs. It will essentially draw information directly from sources that already exist (the Research Outputs System and eVal) as well as some new ones, with the intention of helping the UK public and enterprise find research and researchers that are of interest to them, and see how they are funded.

The new announcement was that Jimmy Wales of Wikipedia fame will be advising on the GTR portal. This is a good thing and he is well placed to provide both technical and social expertise on the provision of public facing information portals as well as providing a more radical perspective than might come out of BIS itself. While this might in part be cynically viewed as another example of bringing in celebrities to advise on policy this is a celebrity with relevant expertise and real credibility based on making similar systems work.

The rest of the information that we can gather relates to government efforts in moving towards making the UK research literature accessible. Wales also gets a look in here, and will be “advising us on [..] common standards to ensure information is presented in a readily reusable form”. My reading of this is that the Minister understands the importance of interoperability and my hope is that this will mean that government is getting good advice on appropriate licensing approaches to support this.

However, many have read this section of the speech as saying that GTR will act as some form of national repository for research articles. I do not believe this is the intention, and reading between the lines the comment that it will “provide direct links to actual research outputs such as data sets and publications” [my emphasis] is the key. The point of GTR is to make UK research more easily discoverable. Access is a somewhat orthogonal issue. This is better read as an expression of Willetts’ and the wider government’s agenda on transparency of public spending than as a mechanism for providing access.

What else can we tell from the speech? Well the term “open access” is used several times, something that was absent from the innovation statement, but still the emphasis is on achieving “public access” in the near term with “open access” cast as the future goal as I read it. It’s not clear to me whether this is a well informed distinction. There is a somewhat muddled commentary on Green vs Gold OA but not that much more muddled than what often comes from our own community. There are also some clear statements on the challenges for all involved.

As an aside I found it interesting that Willetts gave a parenthetical endorsement of usage metrics for the research literature when speaking of his own experience.

“As well as reading some of the articles set by my tutors, I also remember browsing through the pages of the leading journals to see which articles were well-thumbed. It helped me to spot the key ones I ought to be familiar with – a primitive version of crowd-sourcing. The web should make that kind of search behaviour far easier.”

This is the most sophisticated appreciation of the potential for the combination of measurement and usage data in discovery that I have seen from any politician. It needs to be set against his endorsement of rather cruder filters earlier in the speech but it nonetheless gives me a sense that there is a level of understanding within government that is greater than we often fear.

Much of the rest of the speech is hedging. Options are discussed but not selected and certainly not promoted. The key message: wait for the Finch Report which will be the major guide for the route the government will take and the mechanisms that will be put in place to support it.

But there are some clearer statements. There is a strong sense that Hargreaves’ recommendations on enabling text mining should be implemented. And the logic for this is well laid out. The speech and the policy agenda are embedded in a framework of enabling innovation – making it clear what kinds of evidence and argument we will need to marshal in order to persuade. There is also a strong emphasis on data as well as an appreciation that there is much to do in this space.

But the clearest statement made here is on the end goals. No-one can be left in any doubt of Willetts’ ultimate target: full access to the outputs of research, ideally at the time of publication, in a way that enables them to be fully exploited, manipulated and modified for any purpose by any party. Indeed the vision is strongly congruent with the Berlin, Bethesda, and Budapest declarations on Open Access. There is still much to be argued about the route and its length, but in the UK at least, the destination appears to be in little doubt.

Submission to the Royal Society Enquiry

The Royal Society is running a public consultation exercise on Science as a Public Enterprise. Submissions are requested to answer a set of questions. Here are my answers.

1. What ethical and legal principles should govern access to research results and data? How can ethics and law assist in simultaneously protecting and promoting both public and private interests?

There are broadly two principles that govern the ethics of access to research results and data. Firstly there is the simple position that publicly funded research should by default be accessible to the public (with certain limited exceptions, see below). Secondly claims that impinge on public policy, health, safety, or the environment, that are based on research should be supported by public access to the data. See more detail in answer to Q2.

2 a) How should principles apply to publicly-funded research conducted in the public interest?

By default research outputs from publicly funded research should be made publicly accessible and re-usable in as timely a manner as possible. In an ideal world the default would be immediate release; however, this is not a practically achievable goal in the near future. Cultural barriers and community inertia prevent the exploitation of technological tools that demonstrably have the potential to enable research to move faster and more effectively. Research communication mechanisms are currently shackled to the requirements of the research community to monitor career progression and not optimised for effective communication.

In the near term it is practical to move towards an expectation that research outputs that support published research should be accessible and re-usable. Reasonable exceptions to this include data that is personally identifiable, that may place cultural or environmental heritage at risk, that places researchers at risk, or that might affect the integrity of ongoing data collection. The key point is that while there are reasonable exceptions to the principle of public access to public research outputs, these are exceptions and not the general rule.

What is not reasonable is to withhold or limit the re-use of data, materials, or other research outputs from public research for the purpose of personal advancement, including the “squeezing out of a few more papers”. If these outputs can be more effectively exploited elsewhere then this is a more efficient use of public resources to further our public research agenda. The community has placed the importance of our own career advancement ahead of the public interest in achieving outcomes from public research for far too long.

What is also politically naive is to believe or even to create the perception that it is acceptable to withhold data on the basis that “the public won’t understand” or “it might be misused”. The web has radically changed the economics of information transfer but it has perhaps more importantly changed the public perception on access to data. The wider community is rightly suspicious of any situation where public information is withheld. This applies equally to publicly funded research as it does to government data.

2 b) How should principles apply to privately-funded research involving data collected about or from individuals and/or organisations (e.g. clinical trials)?

Increasingly public advocacy groups are becoming involved in contributing to a range of research activities including patient advocacy groups supporting clinical trials, environmental advocacy groups supporting data collection, as well as a wider public involvement in, for instance, citizen science projects.

In the case where individuals or organisations are contributing to research they have a right for that contribution to be recognised and a right to participate on their own terms (or to choose not to participate where those terms are unacceptable).

Organised groups (particularly patient groups) are of growing importance to a range of research. Researchers should expect to negotiate with such groups as to the ultimate publication of data. Such groups should have the ability to demand greater public release and to waive rights to privacy. Equally contributors have a right to expect a default right to privacy where personally identifiable information is involved.

Privacy trumps the expectation of data release and the question of what is personally identifiable information is a vexed question which as a society we are working through. Researchers will need to explore these issues with participants and to work to ensure that data generated can be anonymised in a way that enables the released data to effectively support the claims made from it. This is a challenging area which requires significant further technical, policy, and ethics work.

2 c) How should principles apply to research that is entirely privately-funded but with possible public implications?

It is clear that publicly funded research is a public good. By contrast privately funded research is properly a private good and the decision to release or not release research outputs lies with the funder.

It is worth noting that much of the privately funded research in UK universities is significantly subsidised through the provision of public infrastructure and this should be taken into consideration when defining publicly and privately funded research. Here I consider research that is 100% privately funded.

Where claims are made on the basis of privately funded research (e.g. of environmental impact or the efficacy of health treatments) then such claims SHOULD be fully supported by provision of the underlying evidence and data if they are to be credible. Where such claims are intended to influence public policy such evidence and data MUST be made available. That is, evidence based public policy must be supported by the publication of the full evidence regardless of the source of that evidence. Claims made to influence public policy that are not supported by provision of evidence must be discounted for the purposes of making public policy.

2 d) How should principles apply to research or communication of data that involves the promotion of the public interest but which might have implications from the privacy interests of citizens?

See above: the right to privacy trumps any requirement to release raw data. Nonetheless research should be structured and appropriate consent obtained to ensure that claims made on the basis of the research can be supported by an adequate, publicly accessible, evidence base.

3. What activities are currently under way that could improve the sharing and communication of scientific information?

A wide variety of technical initiatives are underway to enable the wider collection, capture, archival and distribution of research outputs including narrative, data, materials, and other elements of the research process. It is technically possible for us today to immediately publish the entire research record if we so choose. Such an extreme approach is resource intensive, challenging, and probably not ultimately a sensible use of resources. However it is clear that more complete and rapid sharing has the potential to increase the effectiveness and efficiency of research.

The challenges in exploiting these opportunities are fundamentally cultural. The research community is focussed almost entirely on assessment through the extremely narrow lens of publication of extended narratives in high profile peer reviewed journals. This cultural bias must be at least partially reversed before we can realise the opportunities that technology affords us. This involves advocacy work, policy development, the addressing of incentives for researchers and above all the slow and arduous process of returning the research culture to one which takes responsibility for the return on the public investment, including economic, health, social, education, and research returns and one that takes responsibility for effective communication of research outputs.

4. How do/should new media, including the blogosphere, change how scientists conduct and communicate their research?

New media (not really new any more and increasingly part of the mainstream) democratise access to communications and increase the pace of communication. This is not entirely a good thing and en masse the quality of the discourse is not always high. High quality depends on the good will, expertise, and experience of those taking part. There is a vast quantity of high quality, rapid response discourse that occurs around research on the web today even if it occurs in many places. The most effective means of determining whether a recent high profile communication stands up to criticism is to turn to discussion on blogs and news sites, not to wait months for a possible technical criticism to appear in a journal. In many ways this is nothing new; it is a return to the traditional approaches of communication seen at the birth of the Royal Society itself, of direct and immediate communication between researchers by the most efficient means possible: letters in the 17C and the web today.

Alongside the potential for more effective communication of researchers with each other there is also an enormous potential for more effective engagement with the wider community, not merely through “news and views” pieces but through active conversation, and indeed active contributions from outside the academy. A group of computer consultants are working to contribute their expertise in software development to improving legacy climate science software. This is a real contribution to the research effort. Equally the right question at the right time may come from an unexpected source but lead to new insights. We need to be open to this.

At the same time there is a technical deficiency in the current web and that is the management of the sheer quantity of potential connections that can be made. Our most valuable resource in research is expert attention. This attention may come from inside or outside the academy but it is a resource that needs to be efficiently directed to where it can have the most impact. This will include the necessary development of mechanisms that assist in choosing which potential contacts and information to follow up. These are currently in their infancy. Their development is in any case a necessity to deal with the explosion of traditional information sources.

5. What additional challenges are there in making data usable by scientists in the same field, scientists in other fields, ‘citizen scientists’ and the general public?

Effective sharing of data and indeed most research outputs remains a significant challenge. The problem is two-fold: first, ensuring sufficient contextual information that an expert can understand the potential uses of the research output; second, placing that contextual information in a narrative that is understandable to the widest possible range of users. These are both significant challenges that are being tackled by a large number of skilled people. Progress is being made but a great deal of work remains in developing the tools, techniques, and processes that will enable the cost effective sharing of research outputs.

A key point however is that in a world where publication is extremely cheap then simply releasing whatever outputs exist in their current form can still have a positive effect. Firstly where the cost of release is effectively zero even if there is only a small chance of those data being discovered and re-used this will still lead to positive outcomes in aggregate. Secondly the presence of this underexploited resource of released, but insufficiently marked up and contextualised, data will drive the development of real systems that will make them more useful.
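The aggregate argument can be made concrete with a rough sketch. The symbols here are purely illustrative assumptions and not drawn from any study: suppose N datasets are released, each at a cost c, each with a small probability p of being discovered and re-used, and each re-use yielding a benefit b.

```latex
\mathbb{E}[\text{net benefit}] \;=\; N\,(p\,b - c),
\qquad \text{which is positive whenever } p\,b > c .
```

As c approaches zero the condition holds for any non-zero p and b, however small the chance of re-use, which is the sense in which near-free release pays off in aggregate.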

6 a) What might be the benefits of more widespread sharing of data for the productivity and efficiency of scientific research?

Fundamentally more efficient, more effective, and more publicly engaging research. Less repetition and needless rediscovery of negative results and ideally more effective replication and critiquing of positive results are enabled by more widespread data sharing. As noted above another important outcome is that even suboptimal sharing will help to drive the development of tools that will help to optimise the effective release of data.

6 b) What might be the benefits of more widespread sharing of data for new sorts of science?

The widespread sharing of data has historically always led to entirely new forms of science. The modern science of crystallography is based largely on the availability of crystal structures, bioinformatics would simply not exist without GenBank, the PDB, and other biological databases, and the astronomy of today would be unrecognizable to someone whose career ended prior to the availability of the Sloan Digital Sky Survey. Citizen science projects of the type of Galaxy Zoo, Foldit and many others are inconceivable without the data to support them. Extrapolating from this evidence provides an exciting view of the possibilities; indeed, one which it would be negligent not to exploit.

6 c) What might be the benefits of more widespread sharing of data for public policy?

Policy making that is supported by more effective evidence is something that appeals to most scientists. Of course public policy making is never that simple. Nonetheless it is hard to see how a more effective and comprehensive evidence base could fail to support better evidence based policy making. Indeed it is to be hoped that a wide evidence base, and the contradictions it will necessarily contain, could lead to a more sophisticated understanding of the scope and critique of evidence sources.

6 d) What might be the benefits of more widespread sharing of data for other social benefits?

The potential for wider public involvement in science is a major potential benefit. As in c) above a deeper understanding of how to treat and parse evidence and data throughout society can only be positive.

6 e) What might be the benefits of more widespread sharing of data for innovation and economic growth?

Every study of the release of government data has shown that it leads to a nett economic benefit. This is true even when such data has traditionally been charged for. The national economy benefits to a much greater extent than any potential loss of revenue. While this is not necessarily sufficient incentive for private investors to release data, in the case of public investment the object is to maximise national ROI. Therefore release in a fully open form is the rational economic approach.

The cost to SMEs of a lack of access to publicly funded research outputs is well established. Improved access will remove the barriers that currently stifle innovation and economic growth.

6 f) What might be the benefits of more widespread sharing of data for public trust in the processes of science?

There is both a negative and a positive side to this question. On the positive greater transparency, more potential for direct involvement, and a greater understanding of the process by which research proceeds will lead to greater public confidence. On the negative, doing nothing is simply not an option. Recent events have shown not so much that the public has lost confidence in science and scientists but that there is deep shock at the lack of transparency and the lack of availability of data.

If the research community does not wish to be perceived in the same way as MPs and other recent targets of public derision then we need to move rapidly to improve the degree of transparency and accessibility of the outputs of public research.

7. How should concerns about privacy, security and intellectual property be balanced against the proposed benefits of openness?

There is little evidence that the protection of IP supports a nett increase in the return on the public investment in research. While there may be cases where it is locally optimal to pursue IP protection to exploit research outputs and maximise ROI this is not generally the case. The presumption that everything should be patented is both draining resources and stifling British research. There should always be an avenue for taking this route to exploitation but there should be a presumption of open communication of research outputs and the need for IP protection should be justified on a case by case basis. It should be unacceptable for the pursuit of IP protection to damage the communication and downstream exploitation of research.

Privacy issues and concerns around the personal security of researchers have been discussed above. National security issues will in many cases fall under a justifiable exception to the presumption of openness although it is clear that this needs care and probably oversight to retain public confidence.

8. What should be expected and/or required of scientists (in companies, universities or elsewhere), research funders, regulators, scientific publishers, research institutions, international organisations and other bodies?

British research could benefit from a statement of values, something that has the cultural significance of the Haldane principle (although perhaps better understood) or the Hippocratic oath. A shared cultural statement that captures a commitment to efficiently discharging the public trust invested in us, to open processes as a default, and to specific approaches where appropriate would act as a strong centre around which policy and tools could be developed. Leadership is crucial here in setting values and embedding these within our culture. Organisations such as the Royal Society have an important role to play.

Researchers and the research community need to take these responsibilities on ourselves in a serious and considered manner. Funders and regulators need to provide a policy framework, and where appropriate community sanctions for transgression of important principles. Research institutions are for the most part tied into current incentive systems that are tightly coupled to funding arrangements and have limited freedom of movement. Nonetheless a serious consideration of the ROI of technology transfer arrangements and of how non-traditional outputs, including data, contribute to the work of the institution and its standing is required. In the current economic climate successful institutions will diversify in their approach. Those that do not are unlikely to survive in their current form.

Other comments

This is not the first time that the research community has faced this issue. Indeed it is not even the first time the Royal Society has played a central role. Several hundred years ago it was a challenge to persuade researchers to share information at all. Results were hidden. Sharing was partial, only within tight circles, and usually limited in scope. The precursors of the Royal Society played a key role in persuading the community that effective sharing of their research outputs would improve research. Many of the same concerns were raised; concerns about the misuse of those outputs, concerns about others stealing ideas, concerns about personal prestige and the embarrassment potential of getting things wrong.

The development of journals and the development of a values system that demanded that results be made public took time, it took leadership, and with the technology of the day the best possible system was developed over an extended period. With a new technology now available we face the same issues and challenges. It is to be hoped that we tackle those challenges and opportunities with the same sense of purpose.
