Update on publishers and SOPA: Time for scholarly publishers to disavow the AAP


In my last post on scholarly publishers that support the US Congress SOPA bill I ended up making a series of edits. It was pointed out to me that the Macmillan listed as a supporter is not the Macmillan that is the parent group of Nature Publishing Group but a separate U.S. subsidiary of the same ultimate holding company, Holtzbrinck. As I dug further it became clear that while only a small number of scholarly publishers were explicitly and publicly supporting SOPA, many of them are members of the Association of American Publishers, which is listed publicly as a supporter.

This is a little different to directly supporting the act. The AAP is a membership organisation that represents its members (including Nature Publishing Group, Oxford University Press, Wiley Blackwell and a number of other familiar names, see the full list at the bottom) to – amongst others – the U.S. government. Not all of its positions would necessarily be held by all its members. However, neither have any of those members come out and publicly stated that they disagree with the AAP position. In another domain Kaspersky software quit the Business Software Alliance over the BSA’s support of SOPA, even after the BSA withdrew its support.

I was willing to give AAP members some benefit of the doubt, hoping that some of them might come out publicly against SOPA. But if that was the hope then the AAP have just stepped over the line. In a spectacularly disingenuous press release the AAP claims significant credit for a new act just submitted to the U.S. Congress. This, in a repeat of some previous efforts, would block any efforts on the part of U.S. federal agencies to enact open access policies, even to the extent of blocking them from continuing to run the hugely successful PubMed Central. That this comes days before the deadline for a request for information on the development of appropriate and balanced policies that would support access to the published results of U.S. taxpayer-funded research is a calculated political act, an abrogation of any principled stance, and a clear signal of a lack of any interest in a productive discussion on how to move scholarly communications forward into a networked future.

I was willing to give AAP members some space. Not any more. The time has come to decide whether you want to be part of the future of research communication or whether you want to legislate to try and stop that future happening. You can be part of that future or you can be washed into the past. You can look forward or you can be part of a political movement working to rip off the taxpayers and charitable donors of the world. Remember that the profits alone of Elsevier and Springer (though I should be cutting Springer a little slack as they’re not on the AAP list – the one on the list is a different Springer) could fund the publication of every paper in the world in PLoS ONE. Remember that the cost of putting a SAGE article on reserve for a decent-sized class, or of putting a Taylor and Francis monograph on reserve for a more modestly sized one, at one university is more than it would cost to publish them in most BioMed Central journals and make them available to all.

Ultimately this legislation is irrelevant – the artificially high current costs of publication and the myriad of additional charges that publishers make for this, that, and the other (Colour charges? Seriously?) will ultimately be destroyed. The current inefficiencies and inflated markups cannot be sustained. The best legislation can do is protect them for a little longer, at the cost of damaging the competitiveness of the U.S. as a major player in global research. With PLoS ONE rapidly becoming a significant proportion of the world’s literature on its own and Nature and Science soon to be facing serious competition at the top end from an OA journal backed by three of the most prestigious funders in the world, we are moving rapidly towards a world where publishing in a subscription journal will be foolhardy at best and suicidal for researchers in many fields. This act is ultimately a pathetic rearguard action and a sign of abject failure.

But for me it is also a sign that the rhetoric of being supportive of a gradual managed change to our existing systems, a plausible argument for such organisations to make, is dead for those signed up to the AAP. Publishers have a choice – lobby and legislate to preserve the inefficient, costly, and largely ineffective status quo – or play a positive part in developing the future.

I don’t expect much; to be honest I expect deafening silence as most publishers continue to hope that most researchers will be too buried in their work to notice what is going on around them. But I will continue to hope that some members of that list – the organisations that really believe that their core mission is to support the most effective research communication, not that those are just a bunch of pretty words that get pulled out from time to time – will disavow the AAP position and commit to a positive and open discussion about how we can take the best from the current system and combine it with the best of what the available technology makes possible. A positive discussion about managed change that enables us to get where we want to go and helps to make sure that we reap the benefits when we get there.

This bill is self-defeating as legislation but as a political act it may be effective in the short term. It could hold back the tide for a while. But publishers that support it will ultimately get wiped out as the world moves on and they spend so much time pushing back the tide that they miss the opportunity to catch up. Publishers who move against the bill have a role to play in the future and are the ones with enough insight to see the way the world is moving. And those publishers who sit on the sidelines? They don’t have the institutional capability to take the strategic decisions required to survive. Choose.

Update: An interesting parallel post from John Dupuis and a trenchant exposé (we expect nothing less) from Michael Eisen. Jon Eisen calls for people at the institutions and organisations with links to AAP to get on the phone and ask for them to resign from AAP. Lots of links appearing at this Google+ post from Peter Suber.

The List of AAP Members from http://www.publishers.org/members/psp/

An Open Letter to David Willetts: A bold step towards opening British research


On the 8th December David Willetts, the Minister of State for Universities and Science, announced new UK government strategies to develop innovation and research to support growth. The whole document is available online and you can see more analysis at the links at the bottom of the post. A key aspect for Open Access advocates was the section that discussed a wholesale move by the UK to an author-pays system for freely accessible research literature, with SCOAP3 raised as a possible model. The report refers not to Open Access, but to freely accessible content. I think this is missing a massive opportunity for Britain to take a serious lead in defining the future direction of scholarly communication. That’s the case I attempt to lay out in this open letter. This post should be read in the context of my usual disclaimer.

Minister of State for Universities and Science

Department of Business Innovation and Skills

Dear Mr Willetts,

I am writing in the first instance to congratulate you on your stance on developing routes to freely accessible research outputs. I cannot say I am a great fan of many current government positions, and I might have wished for greater protection of the UK science budget, but in times of resource constraint for research I believe your focus on ensuring the efficiency of access to and exploitation of research outputs in its widest sense is the right one.

The position you have articulated offers a real opportunity for the UK to take a lead in this area. But along with the opportunities there are risks, and those risks could entrench existing inefficiencies of our scholarly communication system. They could also reduce the value for money that the public purse, and it will be the public purse one way or another, gets for its investment. In our current circumstances this would be unfortunate. I would therefore ask you to consider the following as the implementation pathway for this policy is developed.

Firstly, the research community will be buying a service. This is a significant change from the current system where the community buys a product, the published journal. The purchasing exercise should be seen in this light and best practice in service procurement applied.

Secondly, the nature of this service must be made clear. The service that is being provided must provide for any and all downstream uses, including commercial use, text mining, indeed any use that might be developed at some point in the future. We are paying for this service and we must dictate its terms. Incumbent publishers will say in response that they need to retain commercial rights, or text mining rights, to ensure their viability, as indeed they have done in response to the Hargreaves Review.

This, not to put too fine a point on it, is hogwash. PLoS and BioMed Central both operate financially viable operations in which no downstream rights beyond that of appropriate attribution are retained by the publishers and where the author charges are lower in price than many of the notionally equivalent, but actually far more limited, offerings of more traditional publishers. High quality scholarly communication can be supported by reasonable author charges without any need for publishers to retain rights beyond those protected by their trademarks. An effective market place could therefore be expected to bring the average costs of this form of scholarly communications down.

The reason for supporting a system that demands that any downstream use of the communication be enabled is that we need innovation and development within the publishing system as well as innovation and development as a result of its content. Our scholarship is currently being held back by a morass of retained rights that prevent the development of research projects, of new technology startups, and potentially of new industries. The government consultation document of 14 December on the Hargreaves report explicitly notes that enabling downstream uses of content, and scholarly content in particular, can support new economic activity. It can also support new scholarly activity. The exploitation of our research outputs requires new approaches to indexing, mining, and parsing the literature. The shame of our current system is that much of this is possible today. The technology exists but is prevented from being exploited at scale by the logistical impossibility of clearing the required rights. These new approaches will require money and it is entirely appropriate, indeed desirable, that some of this work therefore occurs in the private sector. Experimentation will require both freedom to act as well as freedom to develop new business models. Our content, its accessibility, and its reusability must support this.

Finally I ask you to look beyond the traditional scholarly publishing industry to the range of experimentation that is occurring globally in academic spaces, non-profits, and commercial endeavours. The potential leaps in functionality as well as the potential cost reductions are enormous. We need to work to encourage this experimentation and develop a diverse and vibrant market which both provides the quality assurance and stability that we are used to while encouraging technical experimentation and the improvement of business models. What we don’t need is a five or ten year deal that cements in existing players, systems, and practices.

Your government’s philosophy is based around the effectiveness of markets. The recent history of major government procurement exercises is not a glorious one. This is one we should work to get right. We should take our time to do so and ensure a deal that delivers on its promise. The vision of a Britain that is led by innovation and development, supported by a vibrant and globally leading research community, is, I believe, the right one. Please ensure that this innovation isn’t cut off at the knees by agreeing terms that prevent our research communication tools being re-used to improve the effectiveness of that communication. And please ensure that the process of procuring these services is one that supports innovation and development in scholarly communications itself.

Yours truly,

Cameron Neylon


Designing for the phase change: Local communities and shared infrastructure


Michael Nielsen’s talk at Science Online was a real eye opener for many of us who have been advocating for change in research practice. He framed the whole challenge of change as an example of a well known problem, that of collective action. How do societies manage big changes when those changes often represent a disadvantage to many individuals, at least in the short term? We can all see the global advantages of change but individually acting on them doesn’t make sense.

Michael placed this in the context of other changes, that of countries changing which side of the road they drive on, or the development of trade unions, that have been studied in some depth by political economists and similar academic disciplines. The message of these studies is that change usually occurs in two phases. First local communities adopt practice (or at least adopt a view that they want things changed in the case of which side of the road they drive on) and then these communities discover each other and “agglomerate”, or in the language of physical chemistry there are points of nucleation which grow to some critical level and then the whole system undergoes a phase change, crystallising into a new form.

These two phases are driven by different sets of motivations and incentives. At a small scale processes are community driven, people know each other, and those interactions can drive and support local actions, expectations, and peer pressure. At a large scale the incentives need to be different and global. Often top down policy changes (as in the side of the road) play a significant role here, but equally market effects and competition can also fall into place in a way that drives adoption of new tools or changes in behaviour. Think about the way new research techniques get adopted: first they are used by small communities, single labs, with perhaps a slow rate of spread to other groups. For a long time it’s hard for the new approach to get traction, but suddenly at some point either enough people are using it that it’s just the way things are done, or conversely those who are using it are moving ahead so fast that everyone else has to pile in just to keep up. It took nearly a decade for PCR, for instance, to gain widespread acceptance as a technique in molecular biology, but when it did it went from being something people were a little unsure of to being the only way to get things done very rapidly.
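This two-phase, tipping-point dynamic can be sketched with a toy threshold model of the kind political economists use to study collective action. The function and all its parameters below are illustrative, not taken from the talk: each agent adopts a new practice once the overall adoption fraction exceeds its personal threshold, and a small seed community adopts unconditionally.

```python
import random

def simulate_adoption(n=10_000, seed_fraction=0.02, mean_threshold=0.3,
                      sd=0.1, rng=None):
    """Toy Granovetter-style threshold model of collective action.

    Each agent has a personal threshold drawn from a (clipped) normal
    distribution and adopts once the overall adoption fraction reaches
    that threshold. A small group of seeds adopts unconditionally.
    Returns the final adoption fraction when no one else will move.
    """
    rng = rng or random.Random(42)
    thresholds = [min(max(rng.gauss(mean_threshold, sd), 0.01), 0.99)
                  for _ in range(n)]
    adopted = [i < int(n * seed_fraction) for i in range(n)]  # forced seeds
    while True:
        fraction = sum(adopted) / n
        newly = [i for i, t in enumerate(thresholds)
                 if not adopted[i] and fraction >= t]
        if not newly:
            return fraction
        for i in newly:
            adopted[i] = True

# A 2% seed stalls near its starting size, while a 10% seed tips the
# whole population: the phase change needs a critical nucleus.
print(simulate_adoption(seed_fraction=0.02))  # stays small
print(simulate_adoption(seed_fraction=0.10))  # cascades towards 1.0
```

The interesting design question is the threshold distribution: with enough low-threshold agents clustered together (a local community), a modest nucleus is self-sustaining; without them, the same global advocacy effort simply stalls.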

So what does this tell us about advocating for, or designing for, change? Michael’s main point was that narrow scope is a feature, not a bug, when you are in that first phase. Working with small scale use cases, within communities, is the way to get started. Build for those communities and they will become your best advocates, but don’t try to push the rate of growth; let it happen at the right rate (whatever that might be – and I don’t really know how to tell, to be honest). But we also need to build in the grounding for the second phase.

The way these changes generally occur is through an accidental process of accretion and agglomeration. The phase change crystallises out around those pockets of new practice. But, to stretch the physical chemistry analogy, the system doesn’t necessarily crystallise in the form one would design for. We have an advantage, though: if we design in advance to enable that crystallisation then we can prepare communities and prepare tooling for when it happens, and we can design in the features that will get us closer to the optimum we are looking for.

What does this mean in practice? It means that when we develop tools and approaches it is more important for our community to have standards than it is for there to be an effort on any particular tool or approach. The language we use, that will be adopted by communities we are working with, should be consistent, so that when those communities meet they can communicate. The technical infrastructure we use should be shared, and we need interoperable standards to ensure that those connections can be made. Again, interchange and interoperability are more important than any single effort, any single project.

If we really believe in the value of change then we need to get these things together before we push them too hard into the diverse set of research communities where we want them to take root. We really need to get interoperability, standards, and language sorted out before the hammer of policy comes down and forces us into some sort of local minimum. In fact, it sounds rather like we have a collective action problem of our own. So what are we going to do about that?


Submission to the Royal Society Enquiry


The Royal Society is running a public consultation exercise on Science as a Public Enterprise. Submissions are requested to answer a set of questions. Here are my answers.

1. What ethical and legal principles should govern access to research results and data? How can ethics and law assist in simultaneously protecting and promoting both public and private interests?

There are broadly two principles that govern the ethics of access to research results and data. Firstly there is the simple position that publicly funded research should by default be accessible to the public (with certain limited exceptions, see below). Secondly claims that impinge on public policy, health, safety, or the environment, that are based on research should be supported by public access to the data. See more detail in answer to Q2.

2 a) How should principles apply to publicly-funded research conducted in the public interest?

By default research outputs from publicly funded research should be made publicly accessible and re-usable in as timely a manner as possible. In an ideal world the default would be immediate release, however this is not a practically accessible goal in the near future. Cultural barriers and community inertia prevent the exploitation of technological tools that demonstrably have the potential to enable research to move faster and more effectively. Research communication mechanisms are currently shackled to the requirements of the research community to monitor career progression and not optimised for effective communication.

In the near term it is practical to move towards an expectation that research outputs that support published research should be accessible and re-usable. Reasonable exceptions to this include data that is personally identifiable, that may place cultural or environmental heritage at risk, that places researchers at risk, or that might affect the integrity of ongoing data collection. The key point is that while there are reasonable exceptions to the principle of public access to public research outputs that these are exceptions and not the general rule.

What is not reasonable is to withhold or limit the re-use of data, materials, or other research outputs from public research for the purpose of personal advancement, including the “squeezing out of a few more papers”. If these outputs can be more effectively exploited elsewhere then this is a more efficient use of public resources to further our public research agenda. The community has placed the importance of our own career advancement ahead of the public interest in achieving outcomes from public research for far too long.

What is also politically naive is to believe or even to create the perception that it is acceptable to withhold data on the basis that “the public won’t understand” or “it might be misused”. The web has radically changed the economics of information transfer but it has perhaps more importantly changed the public perception on access to data. The wider community is rightly suspicious of any situation where public information is withheld. This applies equally to publicly funded research as it does to government data.

2 b) How should principles apply to privately-funded research involving data collected about or from individuals and/or organisations (e.g. clinical trials)?

Increasingly public advocacy groups are becoming involved in contributing to a range of research activities including patient advocacy groups supporting clinical trials, environmental advocacy groups supporting data collection, as well as a wider public involvement in, for instance, citizen science projects.

In the case where individuals or organisations are contributing to research they have a right for that contribution to be recognised and a right to participate on their own terms (or to choose not to participate where those terms are unacceptable).

Organised groups (particularly patient groups) are of growing importance to a range of research. Researchers should expect to negotiate with such groups as to the ultimate publication of data. Such groups should have the ability to demand greater public release and to waive rights to privacy. Equally contributors have a right to expect a default right to privacy where personally identifiable information is involved.

Privacy trumps the expectation of data release, and what constitutes personally identifiable information is a vexed question which as a society we are still working through. Researchers will need to explore these issues with participants and to work to ensure that data generated can be anonymised in a way that enables the released data to effectively support the claims made from it. This is a challenging area which requires significant further technical, policy, and ethics work.

2 c) How should principles apply to research that is entirely privately-funded but with possible public implications?

It is clear that publicly funded research is a public good. By contrast privately funded research is properly a private good and the decision to release or not release research outputs lies with the funder.

It is worth noting that much of the privately funded research in UK universities is significantly subsidised through the provision of public infrastructure and this should be taken into consideration when defining publicly and privately funded research. Here I consider research that is 100% privately funded.

Where claims are made on the basis of privately funded research (e.g. of environmental impact or the efficacy of health treatments) then such claims SHOULD be fully supported by provision of the underlying evidence and data if they are to be credible. Where such claims are intended to influence public policy such evidence and data MUST be made available. That is, evidence based public policy must be supported by the publication of the full evidence regardless of the source of that evidence. Claims made to influence public policy that are not supported by provision of evidence must be discounted for the purposes of making public policy.

2 d) How should principles apply to research or communication of data that involves the promotion of the public interest but which might have implications from the privacy interests of citizens?

See above: the right to privacy trumps any requirement to release raw data. Nonetheless research should be structured and appropriate consent obtained to ensure that claims made on the basis of the research can be supported by an adequate, publicly accessible, evidence base.

3. What activities are currently under way that could improve the sharing and communication of scientific information?

A wide variety of technical initiatives are underway to enable the wider collection, capture, archival and distribution of research outputs including narrative, data, materials, and other elements of the research process. It is technically possible for us today to immediately publish the entire research record if we so choose. Such an extreme approach is resource intensive, challenging, and probably not ultimately a sensible use of resources. However it is clear that more complete and rapid sharing has the potential to increase the effectiveness and efficiency of research.

The challenges in exploiting these opportunities are fundamentally cultural. The research community is focussed almost entirely on assessment through the extremely narrow lens of publication of extended narratives in high profile peer reviewed journals. This cultural bias must be at least partially reversed before we can realise the opportunities that technology affords us. This involves advocacy work, policy development, the addressing of incentives for researchers and above all the slow and arduous process of returning the research culture to one which takes responsibility for the return on the public investment, including economic, health, social, education, and research returns and one that takes responsibility for effective communication of research outputs.

4. How do/should new media, including the blogosphere, change how scientists conduct and communicate their research?

New media (not really new any more and increasingly part of the mainstream) democratise access to communications and increase the pace of communication. This is not entirely a good thing and en masse the quality of the discourse is not always high. High quality depends on the good will, expertise, and experience of those taking part. There is a vast quantity of high quality, rapid response discourse that occurs around research on the web today even if it occurs in many places. The most effective means of determining whether a recent high profile communication stands up to criticism is to turn to discussion on blogs and news sites, not to wait months for a possible technical criticism to appear in a journal. In many ways this is nothing new; it is a return to the traditional approaches of communication seen at the birth of the Royal Society itself, of direct and immediate communication between researchers by the most efficient means possible: letters in the 17th century and the web today.

Alongside the potential for more effective communication of researchers with each other there is also an enormous potential for more effective engagement with the wider community, not merely through “news and views” pieces but through active conversation, and indeed active contributions from outside the academy. A group of computer consultants are working to contribute their expertise in software development to improving legacy climate science software. This is a real contribution to the research effort. Equally the right question at the right time may come from an unexpected source but lead to new insights. We need to be open to this.

At the same time there is a technical deficiency in the current web and that is the management of the sheer quantity of potential connections that can be made. Our most valuable resource in research is expert attention. This attention may come from inside or outside the academy but it is a resource that needs to be efficiently directed to where it can have the most impact. This will include the necessary development of mechanisms that assist in choosing which potential contacts and information to follow up. These are currently in their infancy. Their development is in any case a necessity to deal with the explosion of traditional information sources.

5. What additional challenges are there in making data usable by scientists in the same field, scientists in other fields, ‘citizen scientists’ and the general public?

Effective sharing of data and indeed most research outputs remains a significant challenge. The problem is two-fold: first, ensuring sufficient contextual information that an expert can understand the potential uses of the research output; secondly, placing that contextual information in a narrative that is understandable to the widest possible range of users. These are both significant challenges that are being tackled by a large number of skilled people. Progress is being made but a great deal of work remains in developing the tools, techniques, and processes that will enable the cost effective sharing of research outputs.

A key point however is that in a world where publication is extremely cheap then simply releasing whatever outputs exist in their current form can still have a positive effect. Firstly where the cost of release is effectively zero even if there is only a small chance of those data being discovered and re-used this will still lead to positive outcomes in aggregate. Secondly the presence of this underexploited resource of released, but insufficiently marked up and contextualised, data will drive the development of real systems that will make them more useful.

6 a) What might be the benefits of more widespread sharing of data for the productivity and efficiency of scientific research?

Fundamentally more efficient, more effective, and more publicly engaging research. Less repetition and needless rediscovery of negative results and ideally more effective replication and critiquing of positive results are enabled by more widespread data sharing. As noted above another important outcome is that even suboptimal sharing will help to drive the development of tools that will help to optimise the effective release of data.

6 b) What might be the benefits of more widespread sharing of data for new sorts of science?

The widespread sharing of data has historically always led to entirely new forms of science. The modern science of crystallography is based largely on the availability of crystal structures, bioinformatics would simply not exist without GenBank, the PDB, and other biological databases, and the astronomy of today would be unrecognisable to someone whose career ended prior to the availability of the Sloan Digital Sky Survey. Citizen science projects of the type of Galaxy Zoo, Foldit and many others are inconceivable without the data to support them. Extrapolating from this evidence provides an exciting view of the possibilities. Indeed one which it would be negligent not to exploit.

6 c) What might be the benefits of more widespread sharing of data for public policy?

Policy making that is supported by more effective evidence is something that appeals to most scientists. Of course public policy making is never that simple. Nonetheless it is hard to see how a more effective and comprehensive evidence base could fail to support better evidence based policy making. Indeed it is to be hoped that a wide evidence base, and the contradictions it will necessarily contain, could lead to a more sophisticated understanding of the scope and critique of evidence sources.

6 d) What might be the benefits of more widespread sharing of data for other social benefits?

The potential for wider public involvement in science is a major potential benefit. As in c) above, a deeper understanding of how to treat and parse evidence and data throughout society can only be positive.

6 e) What might be the benefits of more widespread sharing of data for innovation and economic growth?

Every study of the release of government data has shown that it leads to a net economic benefit. This is true even when such data has traditionally been charged for. The national economy benefits to a much greater extent than any potential loss of revenue. While this is not necessarily sufficient incentive for private investors to release data, in the case of public investment the object is to maximise national ROI. Therefore release in a fully open form is the rational economic approach.

The cost to SMEs of the lack of access to publicly funded research outputs is well established. Improved access will remove barriers that currently stifle innovation and economic growth.

6 f) What might be the benefits of more widespread sharing of data for public trust in the processes of science?

There is both a negative and a positive side to this question. On the positive side, greater transparency, more potential for direct involvement, and a greater understanding of the process by which research proceeds will lead to greater public confidence. On the negative side, doing nothing is simply not an option. Recent events have shown not so much that the public has lost confidence in science and scientists, but that there is deep shock at the lack of transparency and the lack of availability of data.

If the research community does not wish to be perceived in the same way as MPs and other recent targets of public derision then we need to move rapidly to improve the degree of transparency and accessibility of the outputs of public research.

7. How should concerns about privacy, security and intellectual property be balanced against the proposed benefits of openness?

There is little evidence that the protection of IP supports a net increase in the return on the public investment in research. While there may be cases where it is locally optimal to pursue IP protection to exploit research outputs and maximise ROI, this is not generally the case. The presumption that everything should be patented is both draining resources and stifling British research. There should always be an avenue for taking this route to exploitation, but there should be a presumption of open communication of research outputs, and the need for IP protection should be justified on a case-by-case basis. It should be unacceptable for the pursuit of IP protection to damage the communication and downstream exploitation of research.

Privacy issues and concerns around the personal security of researchers have been discussed above. National security issues will in many cases fall under a justifiable exception to the presumption of openness although it is clear that this needs care and probably oversight to retain public confidence.

8. What should be expected and/or required of scientists (in companies, universities or elsewhere), research funders, regulators, scientific publishers, research institutions, international organisations and other bodies?

British research could benefit from a statement of values, something that has the cultural significance of the Haldane principle (although perhaps better understood) or the Hippocratic oath. A shared cultural statement that captures a commitment to efficiently discharging the public trust invested in us, to open processes as a default, and to specific approaches where appropriate would act as a strong centre around which policy and tools could be developed. Leadership is crucial here in setting values and embedding these within our culture. Organisations such as the Royal Society have an important role to play.

Researchers and the research community need to take on these responsibilities in a serious and considered manner. Funders and regulators need to provide a policy framework and, where appropriate, community sanctions for transgression of important principles. Research institutions are for the most part tied into current incentive systems that are tightly coupled to funding arrangements and have limited freedom of movement. Nonetheless a serious consideration of the ROI of technology transfer arrangements, and of how non-traditional outputs, including data, contribute to the work of the institution and its standing, is required. In the current economic climate successful institutions will diversify in their approach. Those that do not are unlikely to survive in their current form.

Other comments

This is not the first time that the research community has faced this issue. Indeed it is not even the first time the Royal Society has played a central role. Several hundred years ago it was a challenge to persuade researchers to share information at all. Results were hidden. Sharing was partial, only within tight circles, and usually limited in scope. The precursors of the Royal Society played a key role in persuading the community that effective sharing of their research outputs would improve research. Many of the same concerns were raised: concerns about the misuse of those outputs, concerns about others stealing ideas, concerns about personal prestige and the embarrassment potential of getting things wrong.

The development of journals, and of a values system that demanded that results be made public, took time and leadership; with the technology of the day, the best possible system was developed over an extended period. With a new technology now available we face the same issues and challenges. It is to be hoped that we tackle those challenges and opportunities with the same sense of purpose.


(S)low impact research and the importance of open in maximising re-use


This is an edited version of the text that I spoke from at the Altmetrics Workshop in Koblenz in June. There is also an audio recording of the talk I gave available as well as the submitted abstract for the workshop.

I developed an interest in research evaluation as an advocate of open research process. It is clear that researchers are not going to change themselves, so someone is going to have to change them, and it is funders who wield the biggest stick. The only question, I thought, was how to persuade them to use it.

Of course it’s not that simple. It turns out that funders are highly constrained as well. They can lead from the front but not too far out in front if they want to retain the confidence of their community. And the actual decision making processes remain dominated by senior researchers. Successful senior researchers with little interest in rocking the boat too much.

The thing you realize as you dig deeper into this is that the key lies in finding motivations that work across the interests of different stakeholders. The challenge lies in finding the shared objectives: what it is that unites researchers and funders, as well as government and the wider community. So what can we find that is shared?

I’d like to suggest that one answer to that is Impact. The research community as a whole has a stake in convincing government that research funding is well invested. Government also has a stake in understanding how to maximize the return on its investment. Researchers do want to make a difference, even if that difference is a long way off. You need a scattergun approach to get the big results, which means supporting a diverse range of research in the knowledge that some of it will go nowhere but some of it will pay off.

Impact has a bad name but if we step aside from the gut reactions and look at what we actually want out of research then we start to see a need to raise some challenging questions. What is research for?  What is its role in our society really? What outcomes would we like to see from it, and over what timeframes? What would we want to evaluate those outcomes against? Economic impact yes, as well as social, health, policy, and environmental impact. This is called the ‘triple bottom line’ in Australia. But alongside these there is also research impact.

All these have something in common. Re-use. What we mean by impact is re-use. Re-use in industry, re-use in public health and education, re-use in policy development and enactment, and re-use in research.

And this frame brings some interesting possibilities. We can measure some types of re-use. Citation, retweets, re-use of data or materials, or methods or software. We can think about gathering evidence of other types of re-use, and of improving the systems that acknowledge re-use. If we can expand the culture of citation and linking to new objects and new forms of re-use, particularly for objects on the web, where there is some good low hanging fruit, then we can gather a much stronger and more comprehensive evidence base to support all sorts of decision making.

There are also problems and challenges. The same ones that any social metrics bring. Concentration and community effects, the Matthew effect of the rich getting richer. We need to understand these feedback effects much better and I am very glad there are significant projects addressing this.

But there is also something more compelling for me in this view. It lets us reframe the debate around basic research. The argument goes that we need basic research to support future breakthroughs. We know neither what we will need nor where it will come from. But we know that it’s very hard to predict – that’s why we support curiosity-driven research as an important part of the portfolio of projects. Yet the dissemination of this investment in the future is amongst the weakest in our research portfolio. At best a few papers are released, then hidden in journals that most of the world has no access to, and in many cases without the data or other products being either indexed or made available. And this lack of effective dissemination is often because the work is perceived as low, or perhaps better, slow impact.

We may not be able to demonstrate or to measure significant re-use of the outputs of this research for many years. But what we can do is focus on optimizing the capacity, the potential, for future exploitation. Where we can’t demonstrate re-use and impact we should demand that researchers demonstrate that they have optimized their outputs to enable future re-use and impact.

And this brings me full circle. My belief is that the way to ensure the best opportunities for downstream re-use, over all timeframes, is that the research outputs are open, in the Budapest Declaration sense. But we don’t have to take my word for it, we can gather evidence. Making everything naively open will not always be the best answer, but we need to understand where that is and how best to deal with it. We need to gather evidence of re-use over time to understand how to optimize our outputs to maximize their impact.

But if we choose to value re-use, to value the downstream impact that our research has, or could have, then we can make this debate not about politics or ideology but about how best to take the public investment in research and invest it for the outcomes that we need as a society.


Wears the passion? Yes it does rather…


Quite some months ago an article in Cancer Biology &amp; Therapy by Scott Kern of Johns Hopkins kicked up an almighty online stink. The article, entitled “Where’s the passion?”, bemoaned the lack of hard-core dedication amongst the younger researchers the author saw around him, starting with:

It is Sunday afternoon on a sunny, spring day.

I’m walking the halls—all of them—in a modern $59 million building dedicated to cancer research. A half hour ago, I completed a stroll through another, identical building. You see, I’m doing a survey. And the two buildings are largely empty.

The point being that if they really cared, those young researchers would be there day in, day out, working their hearts out to get to the key finding. At one level this is risible: expecting everyone to work 24×7 is not a good or efficient way to get results. Furthermore you have to wonder why these younger researchers have “lost their passion”. Why doesn’t the environment create that naturally? What messages are the tenured staff sending through their actions? But I’d be dishonest if I didn’t admit to a twinge of sympathy as well. Anyone who’s run a group has had that thought at the back of their mind: “if only they’d work harder/smarter/longer we’d be that much further ahead…”.

But all of that has been covered by others. What jumped out of the piece for me at the time were some other passages, ones that really got me angry.

When the mothers of the Mothers March collected dimes, they KNEW that teams, at that minute, were performing difficult, even dangerous, research in the supported labs. Modern cancer advocates walk for a cure down the city streets on Saturday mornings across the land. They can comfortably know that, uh…let’s see here…, some of their donations might receive similar passion. Anyway, the effort should be up to full force by 10 a.m. or so the following Monday.

[…]

During the survey period, off-site laypersons offer comments on my observations. “Don’t the people with families have a right to a career in cancer research also?” I choose not to answer. How would I? Do the patients have a duty to provide this “right”, perhaps by entering suspended animation?

Now these are all worthy statements. We’d all like to see faster development of cures and I’ve no doubt that the people out there pounding the streets are driven to do all they can to see those cures advance. But is the real problem here whether the postdocs are in on a Sunday afternoon, or are there other things we could do to advance this? Maybe there are other parts of the research enterprise that could be made more efficient… like, I don’t know, making the results of research widely available and ensuring that others are in the best position possible to build on them?

It would be easy to pick on Kern’s record on publishing open access papers. Has he made all the efforts that would enable patients and doctors to make the best decisions they can on the basis of his research? His lab generates cell lines that can support further research. Are those freely available for others to use and build on? But to pick on Kern personally is to completely miss the point.

No, the problem is that this is systemic. Researchers across the board seem to have no interest whatsoever in looking closely at how we might deliver outcomes faster. No-one is prepared to think about how the system could be improved so as to deliver more, because everyone is too focussed on climbing the greasy pole: writing the next big paper and landing the next big grant. What is worse is that it is precisely in those areas where there is most public effort to raise money, where there is a desperate need, that attitudes towards making research outputs available are at their worst.

What made me absolutely incandescent about this piece was a small piece of data that some of us have known about for a while but which has only just been published. Heather Piwowar, who has done a lot of work on how and where people share, took a close look at the sharing of microarray data: what kinds of things are correlated with data sharing? The paper bears close reading (full disclosure: I was the academic editor for PLoS ONE on this paper) but one thing has stood out for me as shocking since the first time I heard Heather discuss it: microarray data linked to studies of cancer is systematically less shared.

This is not an isolated case. Across the board there are serious questions to be asked about why it seems so difficult to get the data from studies that relate to cancer. I don’t want to speculate on the reasons because, whatever they are, they are unacceptable. I know I’ve recommended this video of Josh Sommer speaking many times before, but watch it again. Then read Heather’s paper. And then decide what you think we need to do about it. Because this cannot go on.


How to waste public money in one easy step…


Peter Murray-Rust has sparked off another round in the discussion of the value that publishers bring to the scholarly communication game, and told a particular story of woe and pain inflicted by the incumbent publishers. On the day he posted that, I had my own experience of just how inefficient and ineffective our communication systems are, wasting the better part of a day trying to find some information. I thought it might be fun to encourage people to post their own stories of problems and frustrations with access to the literature and the downstream issues that creates, so here is mine.

I am by no means a skilled organic chemist but I’ve done a bit of synthesis in my time and I certainly know enough to be able to read synthetic chemistry papers and decide whether a particular synthesis is accessible. So on this particular day I was interested in deciding whether it was easy or difficult to make deuterated mono-olein. This molecule can be made by connecting glycerol to oleic acid. Glycerol is cheap and I should have in my hands some deuterated oleic acid in the next month or so. The chemistry for connecting acids to alcohols is straightforward, I’ve even done it myself, but this is a slightly special case. Firstly the standard methods tend to be wasteful of the acid, which in my case is the expensive bit. The second issue is that glycerol has three alcohol groups. I only want to modify one, leaving the other two unchanged, so it is important to find a method that gives me mostly what I want and only a little of what I don’t.

So the question for me is: is there a high-yielding reaction that will give me mostly what I want while wasting as little as possible of the oleic acid? And if there is a good technique, is it accessible given the equipment I have in the lab? Simple question; quick trip to Google Scholar to find reams of likely looking papers, not one of which I had full-text access to. The abstracts are nearly useless in this case because I need to know details of yields and methodology, so I had several hundred papers and no means of figuring out which might be worth an inter-library loan. I spent hours trying to parse the abstracts to figure out which were the most promising, and in the end I broke… I asked someone to email me a couple of PDFs because I knew they had access. Bear in mind that what I wanted to do was spend a quick 30 minutes or so deciding whether this was worth pursuing in detail. What it took was about three hours, which at the full economic cost of my time comes to about £250. That’s about £200 of UK taxpayers’ money down the toilet because, on the site of the UK’s premier physical and biological research facilities, I don’t have access to those papers. Yes, I could have asked someone else to look, but that would have taken up their time.

But you know what’s really infuriating. I shouldn’t even have been looking at the papers at all when I’m doing my initial search. What I should have been able to do was ask the question:

Show me all syntheses of mono-olein ranked first by purity of the product and secondly by the yield with respect to oleic acid.

There should be a database where I can get this information. In fact there is. But we can’t afford access to the ACS’s information services here. These are incredibly expensive because it used to be necessary for this information to be culled from papers by hand. But today that’s not necessary: it could be done cheaply and rapidly. In fact I’ve seen it done cheaply and rapidly by tools developed in Peter’s group that achieve roughly 95% accuracy and 80% recall over synthetic organic chemistry. Those are hit rates that would have solved my problem easily and effectively.
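The query itself is trivial once the data exists in structured form; the barrier is access, not technology. As a rough sketch of the kind of ranking being asked for (the records and field names here are entirely invented for illustration, since the real data sits behind a paywall):

```python
# Hypothetical text-mined synthesis records. In reality these would come
# from mining the methods sections of papers; the numbers are invented.
syntheses = [
    {"doi": "10.0000/example.1", "product_purity": 92.0, "yield_vs_oleic_acid": 75.0},
    {"doi": "10.0000/example.2", "product_purity": 98.5, "yield_vs_oleic_acid": 60.0},
    {"doi": "10.0000/example.3", "product_purity": 98.5, "yield_vs_oleic_acid": 81.0},
]

def rank_syntheses(records):
    """Rank first by product purity, then by yield with respect to the
    limiting (expensive) reagent, both in descending order."""
    return sorted(
        records,
        key=lambda r: (-r["product_purity"], -r["yield_vs_oleic_acid"]),
    )

for r in rank_syntheses(syntheses):
    print(r["doi"], r["product_purity"], r["yield_vs_oleic_acid"])
```

Thirty seconds of work once the structured data exists; the hard (and currently obstructed) part is extracting those numbers from the literature in the first place.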

Unfortunately, despite the fact that those tools exist, that they could be deployed easily and cheaply, and that they could save researchers vast amounts of time, research is being held back by a lack of access to the literature, and, where there is access, by contracts that prevent us collating, aggregating, and analysing our own work. The public pays for the research to be done, the public pays for researchers to be able to read it, and in most cases the public has to pay again should they want to read it themselves. But what is most infuriating is the way the public pays yet again when I and a million other scientists waste our time, the public’s time, because the tools that exist and work cannot be deployed.

How many researchers in the UK, or worldwide, are losing hours or even days every week because of these inefficiencies? How many new tools or techniques are never developed because they can’t legally be deployed? And how many hundreds of millions of dollars of public money does that add up to?


A new sustainability model: Major funders to support OA journal


“The Howard Hughes Medical Institute, the Max Planck Society and the Wellcome Trust announced today that they are to support a new, top-tier, open access journal for biomedical and life sciences research. The three organisations aim to establish a new journal that will attract and define the very best research publications from across these fields. All research published in the journal will make highly significant contributions that will extend the boundaries of scientific knowledge.” [Press Release]

It has been clear for some time that the slowness of the adoption of open access publication models by researchers is in large part down to the terror we have of stepping out of line and publishing in the ‘wrong’ journals. More radical approaches to publication will clearly lag even further behind while this inherent conservatism is dominant. Publishers like PLoS and BMC have tackled this head on by aiming to create prestigious journals, but the top of the pile has remained the traditional clutch of Nature, Science, and Cell.

The incumbent publishers have meanwhile been able to sit back due to a lack of apparent demand from researchers. As the demand from funders has increased they have held back, complaining that the business models to support open access publication are not clear. I’ve always found the ‘business model’ argument slightly specious. Sustainability is important, but scholarly publishing has never really had a viable business model; it has had a subsidy from funders. Part of the problem has been the multiple layers and channels that subsidy has passed through, but essentially funders, through indirect funding of academic libraries, have been footing the bill.

Some funders, and the Wellcome Trust has led on this, have demanded that their researchers make their outputs accessible while requiring that publishers comply with their requirements on access and re-use rights. But progress has been slow, particularly in opening up what is perceived as the top of the market. Despite major inroads made by PLoS Biology and PLoS Medicine, those journals perceived as the most prestigious have remained resolutely closed.

Government funders are mostly constrained in their freedom to act, but the Wellcome Trust, HHMI, and the Max Planck Society have the independence to take the logical step. They are already paying for publication; why not actively support the formation of a new journal, properly open access, and at the same time lend it the prestige that their names can bring?

This will send a very strong message, both to researchers and publishers, about what these funders value, and where they see value for money. It is difficult to imagine this will not lead to a seismic shift in the publishing landscape, at least from a political and financial perspective. I don’t believe this journal will be as technically radical as I would like, but it is unlikely it could be while achieving the aims that it has. I do hope the platform it is built on enables innovation both in terms of what is published and the process by which it is selected.

But in a sense that doesn’t matter. This venture can remain incredibly conservative and still have a huge impact on taking the research communication space forward. What it means is that three of the world’s key funders have made an unequivocal statement that they want to see open access, full open access on publication without restrictions on commercial use, or text-mining, or re-use in any form, across the whole of the publication spectrum. And if they don’t get it from the incumbent publishers they’re prepared to make it happen themselves.

Full Disclosure: I was present at a meeting at Janelia Farm in 2010 where the proposal to form a journal was discussed by members of the Wellcome, HHMI, and MPG communities.


Evidence to the European Commission Hearing on Access to Scientific Information


On Monday 30 May I gave evidence at a European Commission hearing on Access to Scientific Information. This is the text that I spoke from. Just to re-inforce my usual disclaimer I was not speaking on behalf of my employer but as an independent researcher.

We live in a world where there is more information available at the tips of our fingers than even existed 10 or 20 years ago. Much of what we use to evaluate research today was built in a world where the underlying data was difficult and expensive to collect. Companies were built, massive data sets collected and curated, and our whole edifice of reputation building and assessment grew up based on what was available. As the systems became more sophisticated, new measures were incorporated, but the fundamental basis of our systems wasn’t questioned. Somewhere along the line we forgot that we had never actually been measuring what mattered, just what we could.

Today we can track, measure, and aggregate much more, and much more detailed information. It’s not just that we can ask how much a dataset is being downloaded but that we can ask who is downloading it, academics or school children, and more, we can ask who was the person who wrote the blog post or posted it to Facebook that led to that spike in downloads.

This is technically feasible today. And make no mistake it will happen. And this provides enormous potential benefits. But in my view it should also give us pause. It gives us a real opportunity to ask why it is that we are measuring these things. The richness of the answers available to us means we should spend some time working out what the right questions are.

There are many reasons for evaluating research and researchers. I want to touch on just three. The first is researchers evaluating themselves against their peers. While this is informed by data it will always be highly subjective and will vary discipline by discipline. It is worthy of study but not, I think, something that is subject to policy interventions.

The second area is in attempting to make objective decisions about the distribution of research resources. This is clearly a contentious issue. Formulaic approaches can be made more transparent and less open to legal attack, but are relatively easy to game. A deeper challenge is that by their nature all metrics are backwards looking: they can only report on things that have happened. Indicators are generally lagging (true of most of the measures in wide current use) but what we need are leading indicators. It is likely that human opinion will continue to beat naive metrics in this area for some time.

Finally there is the question of using evidence to design the optimal architecture for the whole research enterprise. Evidence based policy making in research policy has historically been sadly lacking. We have an opportunity to change that through building a strong, transparent, and useful evidence base but only if we simultaneously work to understand the social context of that evidence. How does collecting information change researcher behavior? How are these measures gamed? What outcomes are important? How does all of this differ cross national and disciplinary boundaries, or amongst age groups?

It is my belief, shared with many that will speak today, that open approaches will lead to faster, more efficient, and more cost effective research. Other groups and organizations have concerns around business models, quality assurance, and sustainability of these newer approaches. We don’t need to argue about this in a vacuum. We can collect evidence, debate what the most important measures are, and come to an informed and nuanced conclusion based on real data and real understanding.

To do this we need to take action in a number of areas:

1. We need data on evaluation and we need to be able to share it.

Research organizations must be encouraged to maintain records of the downstream usage of their published artifacts. Where there is a mandate for data availability this should include mandated public access to data on usage.

The commission and national funders should clearly articulate that the provision of usage data is a key service for publishers of articles, data, and software to provide, and that where a direct payment is made for publication, provision of such data should be included. Such data must be technically and legally reusable.

The commission and national funders should support work towards standardizing vocabularies and formats for this data, as well as critiquing its quality and usefulness. This work will necessarily be diverse, with disciplinary, national, and object-type differences, but there is value in coordinating actions. At a recent workshop where funders, service providers, developers and researchers convened, we made significant progress towards agreeing routes to standardization of the vocabularies used to describe research outputs.

2. We need to integrate our systems of recognition and attribution into the way the web works through identifying research objects and linking them together in standard ways.

The effectiveness of the web lies in its framework of addressable items connected by links. Researchers have a strong culture of making links and recognizing contributions through attribution and citation of scholarly articles and books, but this has only recently been surfaced in a way that consumer web tools can view and use. And practice is patchy and inconsistent for new forms of scholarly output such as data, software and online writing.

The commission should support efforts to open up scholarly bibliography to the mechanics of the web through policy and technical actions. The recent Hargreaves report explicitly notes limitations on text mining and information retrieval as an area where the EU should act to modernize copyright law.

The commission should act to support efforts to develop and gain wide community support for unique identifiers for research outputs, and for researchers. Again these efforts are diverse and it will be community adoption that determines their usefulness, but coordination and communication actions will be useful here. Where there is critical mass, as may be the case for ORCID and DataCite, this crucial cultural infrastructure should merit direct support.

Similarly, the commission should support actions to develop standardized expressions of links, through developing citation and linking standards for scholarly material. Again the work of DataCite, CODATA, Dryad and other initiatives, as well as technical standards development, is crucial here.

3. Finally we must closely study the context in which our data collection and indicator assessment develops. Social systems cannot be measured without perturbing them and we can do no good with data or evidence if we do not understand and respect both the systems being measured and the effects of implementing any policy decision.

We need to understand the measures we might develop, what forms of evaluation they are useful for, and how change can be effected where appropriate. This will require significant work, as well as an appreciation of the close coupling of the whole system.

We have a generational opportunity to make our research infrastructure better through effective evaluation, evidence-based policy making, and architecture development. But we will squander this opportunity if we either take a utopian view of what might be technically feasible, or fail to act for fear of a dystopian future. The way to approach this is through a careful, timely, transparent and thoughtful approach to understanding ourselves and the system we work within.

The commission should act to ensure that current nascent efforts work efficiently towards delivering the technical, cultural, and legal infrastructure that will support an informed debate through a combination of communication, coordination, and policy actions.


Hoist by my own petard: How to reduce your impact with restrictive licences


I was greatly honoured to be asked to speak at the symposium held on Monday to recognize Peter Murray-Rust's contribution to scholarly communication. The lineup was spectacular, the talks insightful and probing, and the discussion serious; it was no longer trapped in the naive yes/no arguments about openness and machine readability, but moved on into detail, edge cases, problems, and issues.

For my own talk I wanted to do something different from what I've been doing recently. Following the example of Deepak Singh, John Wilbanks, and others, I've developed what seems to be a pretty effective format for an advocacy talk: lots of slides, big images, and few words, going by at a fast rate. Recently I did 118 slides in 20 minutes. The talk for Peter's symposium required something different, so I eschewed slides and simply spoke for 40 minutes, wanting to explore the issues deeply rather than skate over the surface in the way the rapid-fire approach tends to do.

The talk was, I think, reasonably well received and provoked some interesting (and heated) discussion. I've put the draft text I was working from up on an Etherpad. However, due to my own stupidity, the talk was neither livestreamed nor recorded. In a discussion leading up to the talk I was asked whether I wanted to put up a pretty picture as a backdrop, and I thought it would be good to show the licensing slide I use in all of my talks, the one that says livestreaming, tweeting, and so on are fine and encourages people to do them. The trouble is that I navigated to the Slideshare deck containing that slide and hit full screen without thinking. What the audience therefore saw was the first slide, which looks like this.

A restrictive talk licence prohibiting live streaming, tweeting, etc.

I simply didn't notice, as I was looking the other way. The response was both instructive and interesting. As soon as the people running the livestream and recording (amazingly effective given the resources they had) saw the slide, they shut everything down. In a sense this is really positive: it shows that, by default, people respect the requests of the speaker.

Across the audience people didn't tweet, and in a couple of cases they even deleted photographs they had taken. Again, respect for the request people thought I was making was solid. Even in an audience full of radicals and open geeks, no-one questioned it. I'm slightly gobsmacked, in fact, that no-one shouted at me to ask what the hell I thought I was doing. Some thought I was being ironic, which I have to say would have been too clever by half. But again it shows that if you ask, people will for the most part respect the request.

Given that the talk was about research impact, and how open approaches will enable it, it is rather ironic that by inadvertently using the wrong slide I probably significantly reduced the impact of the talk. There is no video to upload and no opportunity for others to see the talk. Several people whose opinions I value were watching online and didn't get to see it, and the tweetstream that I had hoped would be full of discussion, disagreement, and alternative perspectives was basically dead. I effectively made my own point, reducing what might have kicked off a wider discussion to a dead talk that exists only in a static document and in the memories of the limited number of people who were in the room.

The message is pretty clear. If you want to reduce the effectiveness and impact of your work, if you want to limit the people you can reach, use restrictive terms. If you want your work to reach people and to maximise its chance of making a difference, make it clear and easy for people to understand that they are encouraged to copy, share, and cite it. Be open. Make a difference.
