Submission to the European Commission Expert Group on Altmetrics

As part of the broader Open Science agenda of the European Commission, an expert group on “altmetrics” has been formed. The group has a remit to consider how indicators of research performance can be used effectively to advance the strategic goals of the Commission, and the risks and opportunities that new forms of data pose to the research enterprise. This is my personal submission.

Next Generation Altmetrics

Submission by Cameron Neylon, Professor of Research Communications, Curtin University

1. Introduction

The European Commission has an ambitious program for Open Science as part of three aspirations, Open Innovation, Open Science, and Open to the World. Key to defining the role of evaluation, and within that the role of metrics, in achieving all these aspirations is a clear understanding of the opportunities and limitations that our new, data-rich, environment creates. I therefore welcome the Commission’s call for evidence and formation of the expert group.

My expertise in this area is based on a long-term interest in the intersection between research evaluation and policy implementation, specifically the role that new indicators can play in helping to drive (or hinder) cultural change. I was an author of the Altmetrics Manifesto[1] as well as the first major paper on Article Level Metrics[2]. I have more recently (in my previous role as Advocacy Director at PLOS) been closely engaged in technology and policy development, and wrote the PLOS submission to the HEFCE Metrics enquiry[3]. Since leaving PLOS I have been developing a research program looking at how technology and incentives combine to affect the culture of research communities. In this context recent work has included the preprint (currently under review) Excellence R Us[4], which has gained significant attention, and two reports for Jisc[5,6] that address related issues of evaluation and culture.

2. Next generation Metrics for open science

A. How do metrics, altmetrics & ‘responsible metrics’ fit within the broader EC vision & agenda for open science?

Delivering on the Commission’s agenda across the research policy platform requires a substantial culture change across a range of stakeholders. The cultures of research communities, and the practices that they support, are diverse and often contradictory. It is important to separate three questions: how evaluation is supported by indicators, how evaluation contributes to the overall incentives that individuals and organisations experience, and what effect changes in incentives have on culture. Thoughtful evaluation, including the application of new and improved indicators, can contribute to, but will not on its own drive, change.

B. What are the key policy opportunities and tensions in this area? What leadership role can the EU play in wider international debates?

There are two opportunities in the current environment. First, the Commission can take a progressive leadership position on research evaluation. As the HEFCE Metrics enquiry and many others have concluded, much research evaluation has the tail wagging the dog: available indicators drive targets, and therefore behaviour. It is necessary to reframe evaluation around what public research investment is for and how different stakeholder goals can be held in tension and prioritised. The Commission can take a leadership role here. The second opportunity is to use new indicators to articulate the values that underpin the Commission’s policy agenda. Using indicators that provide proxies for the qualities that align with the Open Science agenda can send a strong signal to research communities, researchers and RPOs that these aspects (collaboration, open access, data accessibility, evidence of re-use) are important to the Commission.

3. Altmetrics: The emerging state of the art

A. How can we best categorise the current landscape for metrics and altmetrics? How is that landscape changing? How robust are various leading altmetrics, and how does their robustness compare to more ‘traditional’ bibliometrics?

The landscape of available indicators is diverse and growing, both in the range of indicators available and the quality of data underpinning them. That said, this increase is from a low base. The current quality and completeness of data underlying indicators, both new and traditional, does not meet basic standards of transparency, completeness or equity. These indicators are neither robust, stable nor reliable. Auditing and critical analysis are largely impossible because the data is generally proprietary. On top of this, the analysis of this data to generate indicators is in most cases highly naïve and undertheorized. This can be seen in a literature providing conflicting results on even basic questions of how different indicators correlate with each other. Bibliometrics, while more established, suffer from many of the same problems. There is greater methodological rigour within the bibliometrics research community, but much of the use of this data is by users without that experience and expertise.

B. What new problems and pitfalls might arise from their usage?

The primary risk in the use of all such poorly applied indicators and metrics is that individuals and organizations refocus their efforts on performing against metrics instead of delivering on the qualities of research that the policy agenda envisions. Lack of disciplinary and output-type coverage is a serious issue for representation, particularly across the arts and humanities as noted in the HEFCE Metrics report.

C. What are some key conclusions and unanswered questions from the fast-growing literature in this area?

With some outstanding exceptions, the literature on new indicators is methodologically weak and under-theorized. In particular, there is virtually no work looking at the evolution of indicator signals over time. There is a fundamental failure to understand these indicators as signals of underlying processes. As a result there is a tendency to seek indicators that match particular qualities (e.g. “influence”) rather than to understand how a particular process (e.g. effective communication to industry) leads to specific signals. Core to this failure are the lack of a framework for defining how differing indicators can contribute to answering a strategic evaluative question, and a tendency to create facile mathematical constructs from available data and define them as a notionally desired quality.

4. Data infrastructure and standards

I refer the expert group to the conclusions of the HEFCE Metrics report, the PLOS submission to that enquiry[3], and my report to Jisc[6], particularly on the issues of access to open citations data. Robust, trusted and responsible metrics require an open and transparent data infrastructure, with rigorous and critical data quality processes, alongside open processes subjected to full scholarly critical analysis.

The Commission has the capacity and resources to lead infrastructure development, in data and technology as well as in social infrastructures such as standards. My broad recommendation is that the Commission treat administrative and process data with the same expectations of openness, quality assurance, re-usability and critical analysis as the research data that it funds. The principles of Open Access, transparency, and accountability all apply. As with research data, privacy and other issues arise, and I commend the Commission’s position that data should be “as open as possible, as closed as necessary”.

5. Cultures of counting: metrics, ethics and research

A. How are new metrics changing research cultures (in both positive and negative ways)? What are the implications of different metrics and indicators for equality and diversity?

The question of diversity has been covered in the PLOS submission to the HEFCE Enquiry[3]. Indicators and robust analysis can be used to test for issues of diversity, but they can also create them. These issues are also covered in detail in our recent preprint[4]. Culture has been changing towards a more rigid, homogeneous and performative stance. This is dangerous and counter to the policy goals of the Commission. It will only be addressed by developing a strong culture of critical evaluation supported by indicators.

B. What new dynamics of gaming and strategic response are being incentivized?

Gaming is a complex issue. On one side there is “cheating”, on the other an adjustment of practice towards policy goals (e.g. wider engagement with users of research through social media). New indicators are arguably more robust to trivial gaming than traditional single data-source metrics. Nonetheless we need to develop institutional design approaches that promote “strategic responses” in the desired direction, not facile performance against quantitative targets.

6. Next generation metrics: The way forward

A. Can we identify emerging best practices in this area? What recommendations might we as a group make, and to which actors in EU research systems?

There are structural reasons why it is difficult to identify specific examples of best practice. I take as the ideal the thoughtful use of data and derived indicators to support strategic decision making against clearly defined goals and values. The transparency and audit requirements of large-scale evaluations make this difficult; smaller-scale evaluation that is not subject to external pressures is most likely to follow this path. Amongst the large-scale efforts that come closest is the UK REF, where the question of what “excellence” is to be assessed is addressed with some specificity, and where, in Impact Narratives, data was used to support a narrative claim against defined evaluation criteria.

Overall we need to develop a strong culture of evaluation.

  • The Commission can support this directly through actions that provide public and open data sources for administrative and activity data and through adopting critical evaluative processes internally. The Commission can also act to lead and encourage adoption of similar practice across European Funding Organisations, including through work with Science Europe.
  • Institutions and funders can support the development of stronger critical evaluation processes (including the evaluation of those processes themselves) by implementing emerging best practice as it is identified and by supporting the development of expertise, including new research, within their communities.
  • Scholarly Societies can play a strong role in articulating the distinctive nature of their communities’ work and what classes of indicators may or may not be appropriate in its assessment. They are also valuable potential drivers of the narratives that can support culture change.
  • Researchers can play a greater role by being supported to consider evaluation as part of the design of research programs. Developing a critical capacity for determining how to assess a program (as opposed to developing the skills required to defend it at all costs) would be valuable.
  • Publics can be engaged to define some of the aspects of what matters to them in the conduct and outcomes of research and how performance against those measures might be demonstrated and critically assessed to their satisfaction.

References

  1. Priem et al. (2010), altmetrics: a manifesto, http://altmetrics.org
  2. Wu and Neylon (2009), Article Level Metrics and the Evolution of Scientific Impact, PLOS Biology, http://dx.doi.org/10.1371/journal.pbio.1000242
  3. PLOS (2013), PLOS Submission to the HEFCE RFI on Metrics in Research Assessment, http://dx.doi.org/10.6084/m9.figshare.1089555
  4. Moore et al. (2016), Excellence R Us: University Research and the Fetishisation of Excellence, https://dx.doi.org/10.6084/m9.figshare.3413821.v1
  5. Neylon, Cameron (2016), Jisc Briefing Document on Data Citations, http://repository.jisc.ac.uk/id/eprint/6399
  6. Neylon, Cameron (2016), Open Citations and Responsible Metrics, http://repository.jisc.ac.uk/id/eprint/6377

Squaring Circles: The economics and governance of scholarly infrastructures

This is a version of the paper I’ve had accepted for SciDataCon in a session on the sustainability of Research Data Infrastructures. It was also the basis for the session that I helped lead with Simon Coles at the Jisc-CNI meeting in mid-July in Oxford. The original version was quite short and skips over some of the background material and context. I’m hoping to work it up into a full paper at some point soon so any comments are welcome.

Summary

Infrastructures for data, such as repositories, curation systems, aggregators, indexes and standards, are public goods. This means that finding sustainable economic models to support them is a challenge, due to free-riding: someone who does not contribute to the support of the infrastructure nonetheless gains its benefits. The work of Mancur Olson (1965) suggests there are only three ways to address this for large groups: compulsion (often some form of taxation) to support the infrastructure; the provision of non-collective (club) goods to those who contribute; or mechanisms that change the effective number of participants in the negotiation.

Subscription and membership models, such as those used for online subscription journals and for some data infrastructures, have been our traditional model and are an example of the second approach. These models are breaking down as the technology of the web and the agenda for transparency and open access lead to unbundling, the separation of the different services being provided. This tends to mean that commercial suppliers focus on club and private good provision and neglect public good provision. Addressing this will require the development of support models more like taxation. However, systems of taxation require a shared – and ideally globally shared – sense of the principles of governance and resource distribution.

In this paper I argue that focusing on sustainability models before seeking a set of agreed governance principles is the wrong approach. Rather, we need to understand how to navigate from club-like to public-like goods. We need to define the communities that contribute and identify club-like benefits for those contributors. We need interoperable principles of governance and resourcing to provide public-like goods, and we should draw on the political economics of taxation to develop them.

The provisioning challenge for public scholarly goods

The fundamental political and economic challenge for groups is the provision of “public goods” or “general utility”. These are goods that are non-rivalrous – they can be infinitely shared – and non-excludable – it is difficult or impossible to stop someone using them. Classical economics tells us there is a provisioning problem for such goods: the rational individual actor will never contribute because, whether they do or not, they can still benefit.
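
To make the free-rider logic concrete, here is a stylized sketch (the notation is mine, not Olson’s). Suppose an individual can pay a cost c to increase the value of the collective good by ΔV, and that the benefit is shared equally across a group of size n, so the individual captures only ΔV/n. Contributing is rational only when

$$\frac{\Delta V}{n} > c$$

which fails as n grows large, even when provision is socially worthwhile (ΔV > c). This is why small groups, or groups with a dominant beneficiary whose individual share approaches the whole benefit, can provision collective goods while large “latent” groups cannot.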

Infrastructures, such as repositories for data, articles and code, are very close to the ideal of public goods. Mancur Olson in The Logic of Collective Action (1965) discusses how group size has a profound influence on the provision of public goods, in particular noting that provision is only possible for small groups, or where the public good is a byproduct of the provision of non-public goods that are provided to contributors.

Indeed Olson’s description of the groups that can and cannot provide public goods maps closely onto scholarly infrastructures. Small communities frequently develop local infrastructures out of existing resources (and the contributions to these are usually biased strongly towards the larger players in that community, as Olson predicts). Large-scale infrastructures are also provided by collaboration within small communities, but in this case communities of funders (such as those that fund Europe PubMed Central) or of national governments (as is the case for physical infrastructures, which are generally formed as inter-governmental organisations like CERN). The transition from small to large is challenging, and “medium” sized infrastructures struggle to survive, moving from grant to grant and in many cases shifting to a subscription model.

In the case of digital infrastructures a public good (such as an online article or dataset) can be converted to a club good (made excludable) by placing an authentication barrier around it to restrict access to subscribers (as is the case for online subscription journals and databases). Buchanan, and those who further developed his 1965 paper on the economics of clubs, have probed how club goods and club size relate (Buchanan, 1965). A core finding is that sustainable clubs have an equilibrium size that depends on congestion in access to the good (the extent to which it is purely non-rivalrous) and the value it provides. With digital resources congestion is low, and the club can therefore grow large. This creates a challenge. Digital resources are not natively excludable; a technical barrier has to be put in place. As the group size rises, the likelihood of “leakage” (sharing, or piracy if you prefer) increases. Thus resources are expended on strengthening excludability, which leads to both economic and political costs, as seen in the Open Access debate.
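
A minimal version of the club calculus (a sketch in my own notation, not Buchanan’s original formulation) makes the point about digital goods. Let a club of n members share a fixed provision cost C, and let each member’s benefit b(n) decline as congestion rises. Per-member utility and the equilibrium condition are

$$U(n) = b(n) - \frac{C}{n}, \qquad \frac{C}{n^{*2}} = -b'(n^*)$$

so the equilibrium size n* balances the marginal gain from cost-sharing against the marginal congestion loss. For digital resources b'(n) is close to zero, so n* can grow very large, and the binding cost shifts from congestion to the technical and political work of policing excludability.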

Solutions to the provisioning problem

If our political goal is to provide large-scale access, making the goods created more public-like through the provision of shared infrastructures, then we need to develop a political economics of this “public-making”. Olson provides three routes to creating sustainable infrastructures that provide public goods:

  • Compulsion: The good is provided through a mechanism that requires contributions from the whole community. Closed union shops, where all workers in a given company are required to be members of a union, are an example that Olson discusses in detail. Taxation is another. In the scholarly infrastructure space, overhead and indirect costs taken by institutions are an example, as is the top-slicing of funder budgets to provide infrastructures and services.
  • By-product: The public good is provided through a mechanism that additionally provides club-like and/or private goods to contributors. Olson discusses insurance schemes only available to members of mutual benefit societies that also lobby on behalf of their members. In the research enterprise publishers have to join Crossref to be able to assign Crossref DOIs to outputs (a club good). As a by-product the whole community has access to an interoperable metadata set with defined schemas and access points (a public good).
  • Effective oligopoly: There are far too many funders globally for them all to (easily) agree a mechanism for contributing to any shared scheme. However, because a relatively small set of funders fund a substantial proportion of biomedical research, they are able to agree mechanisms to fund data infrastructures such as Europe PubMed Central. Other funders may contribute, but many will free-ride.

The difficult truth is that there is no mechanism that will directly lead to a large community supporting the provision of a large-scale public good infrastructure. Any successful sustainability model will depend on some mixture of these three approaches for resourcing. There are interesting models for solving some of these collective action problems such as crowd-funding models where a project only proceeds if sufficient contributions are made, but these amount in effect to new ways to implement compulsion and often also depend on a by-product strategy (the contributor benefits).

If our challenge in delivering on the openness and transparency agenda is one of supporting the conversion of successful medium-scale, club-like provision of infrastructure into open systems providing public goods, then we need to solve the political and economic problems of transitioning from the club state to a model that successfully provides a mix of these approaches.

The politics of compulsion: The need for shared principles

Of the three approaches to sustainability, it is generally the second that infrastructures are expected to pursue as they grow. Membership models can work in those cases where club goods are being created that attract members. Training experiences or access to valued meetings are possible examples. In the wider world this parallels the “Patreon” model, where members get exclusive access to some materials, access to a person (or more generally to expertise), or a say in setting priorities. Much of this mirrors the roles that Scholarly Societies play, or at least could play.

In the scholarly infrastructures space the compulsion/taxation and oligopoly approaches are very similar in practice, as top-slicing funder resources amounts to a tax on overall research funds. Some membership models also approach the level of compulsion. While this is rare in scholarly communities, it is common in professional communities such as medicine, law, and some areas of engineering. Schemes offering professional certification (including the validation of degree programs) blur this boundary as well.

The word “compulsion” is pejorative, but there are many activities within the work of researchers that are compulsory. Gaining a doctorate, publishing at some level, and having access to the literature in some form are all effectively compulsory. These forms of compulsion (or call them “social expectations” if you prefer) are considered acceptable because they fit within a known and understood system of rules. Systems of taxation are acceptable, according to Adam Smith (1776), where there is proportionality, predictability, convenience and efficiency. Today we would also add representation in governance and sustainability. This requires us to build institutions, in the sense that Elinor Ostrom describes them: “institutions are the prescriptions that humans use to organize all forms of repetitive and structured interactions” (Ostrom, 2005). Much of political economics is bound up in trying to justify post-hoc the provision of institutions like governments, courts and the law by inventing things like “the social contract”. Our advantage as a community, or communities, is that we could explicitly develop agreed principles of operation as a way of reducing the costs of creating institutions.

A common set of principles for foundational infrastructures

Building institutions is hard. It takes resources. To reduce costs it makes sense to build templates; sets of agreed principles under which such institutions and systems should operate. If our communities can sign up to a set of principles up front, then building institutions and infrastructures that reflect those principles should become a lot easier.

To address this, a draft set of principles was developed to provoke a conversation about the governance and management of these infrastructures (Bilder, Lin, Neylon, 2015). Our principles rest on three pillars: transparency and community governance; financial sustainability, efficiency and commitment to community needs; and mechanisms to protect integrity and manage and mitigate the risk of failure. They draw on the observation of successes in providing foundational infrastructures and seek to generalise these. The focus is on building trustworthy institutions.

Figure 1: Principles for Open Scholarly Infrastructures

It is interesting to note that these also map quite closely onto Adam Smith’s four principles for sound taxation (Smith, 1776). The commitment to representation is more modern, and the concept of enabling a community to fork a project through commitments to Open Source and Open Data – while modern in its approach – is an expression of the principle of efficiency: in effect, a mechanism for the community to restrain costs.

The principles (or a future refinement or replacement for them) can serve in two ways. First, they could be used to set out the minimum requirements for governance and sustainability before funders are willing to provide direct funding (the oligopoly or tax mechanisms). Second, they provide a template for a developing club, either one built in a community sufficiently small to bootstrap its funding or one that has found a byproduct model, to demonstrate its ability to make the transition from club to infrastructure.

Predictions and practical consequences

The preceding is an abstract argument. What are its practical consequences for actually sustaining infrastructures? First we can make a prediction to be tested:
All sustainable scholarly infrastructures providing collective (public-like) goods to the research community will be funded on one of the three identified models (taxation, byproduct, oligopoly) or some combination of them.

Second, we can look at stable, long-standing infrastructures (Crossref, Protein Data Bank, NCBI, arXiv, SSRN) and note that in most cases governance arrangements are an accident of history and were not explicitly planned. Crises of financial sustainability (or challenges of expansion) for these organisations are often coupled to, or lead to, a crisis in governance, and in some cases a breakdown of community trust. Changes to governance are therefore often made in response to a specific crisis.

Where there is governance planning, it frequently adopts a “best practice” model that looks for successful examples to draw from. It is rarely based on “worst-case scenario” planning. We suggest that this is a problem. We can learn as much from failures of sustainability, and their relationship to governance arrangements, as from successes.

Above all, the key is to learn from our experiences, as well as from the theory of economics and governance, to identify the patterns and templates that produce resilient organisations and infrastructures: ones that, by being trustworthy, earn the trust of their communities through both the good times and the bad.

References

  • Bilder G, Lin J, Neylon C 2015 Principles for Open Scholarly Infrastructure-v1, Available at http://dx.doi.org/10.6084/m9.figshare.1314859 [Last Accessed 30 May 2016]
  • Buchanan, JM 1965 An Economic Theory of Clubs, Economica, 32(125):1-14
  • Olson, M 1965 The Logic of Collective Action: Public Goods and the Theory of Groups (Revised ed.). Harvard University Press, Cambridge, Massachusetts
  • Ostrom, E 2005 Understanding Institutional Diversity (Revised ed.). Princeton University Press, Princeton
  • Smith, A 1776 An Inquiry into the Nature and Causes of the Wealth of Nations. W. Strahan and T. Cadell, London

A letter to my MP, the Honourable Ben Howlett, member for Bath

The Honourable Ben Howlett MP
Member for Bath
House of Commons, United Kingdom

Dear Ben,

I need to come clean at the beginning of this letter. I did not vote for you in the general election. I am unlikely to vote for you or any member of your party in the future. We come from different political, and I would imagine, cultural backgrounds. Nonetheless we were on the same side of the debate for the referendum. This, as I will return to, offers me some hope for repairing the damage that has been done.

I write without the ability to offer any answers, but with the hope that some of my perspectives might be useful in helping you reach the difficult decisions you will have to make on how to vote in parliament over the next few months. I am an immigrant of Australian birth and a UK citizen. I am a researcher, though one who holds a post in a foreign university, a small business owner, and a resident of your constituency. Because a substantial proportion of my income comes from overseas, the referendum and its outcome have made me substantially better off in the short term, just as they make the UK a substantially less attractive place to do my work in the longer term. This is an irony that leaves a bad taste in my mouth.

As you consider your position on key votes over the next few months I would ask you to consider the following three issues as central to the argument of how we proceed together:

First, the platform the referendum was fought on was a farce. We are rapidly heading to a situation where no-one will get what they thought they were voting for. Those of us who voted to remain are told it is impossible because the referendum fell the other way by a 2% margin. Those who voted to leave so as to repatriate funds paid to the EU were the first to be told they were lied to. Those who voted to leave due to concerns over immigration were the next to be disappointed. As we go into negotiations and it becomes clear that we will retain the vast majority of EU regulations and requirements so as to retain access to the single market, those who voted for less regulation will be the last group to discover that no element of the Leave platform will actually be delivered in practice. Referenda are blunt instruments at the best of times.

The result is also a demographic tragedy. It has already been suggested that by the time negotiations are concluded a re-run referendum could deliver a majority to remain in the EU, even if no-one changed their minds[1, 2, 3], simply due to the divergence in the proportions of young and old voters voting Leave and Remain. The combination of the lack of a clear plan of action, these demographics, the inevitable delay for negotiation, and the divide the referendum has created in the community means that there is no mandate, democratic or otherwise, for a specific course of action or negotiation.

I ask you to consider how a consensus can be created for a course of action that has a demonstrable democratic mandate as parliament moves towards a decision on whether to make an Article 50 notification to the European Council.

Secondly, I ask you to consider the form of leadership we need as you consider your choice in electing a new leader of the Conservative Party. I will not offer a direct opinion as my own political bias clouds my view of many of your colleagues. I have been reminded recently of Nolan’s seven Principles of Public Life: selflessness, integrity, objectivity, accountability, openness, honesty, and leadership[4]. That these seem terribly old-fashioned, even naive, should be a concern. I am more than happy to work with those I disagree with, the best decisions come from arguing an issue out from multiple perspectives. But these qualities, qualities that underpin constructive discussion and resolution of difficult issues, have been notably missing from public life in recent weeks. We desperately need more than political gamesmanship to resolve these issues. We need leadership focussed on building consensus.

I ask you to consider, in your choice of vote for leader of the parliamentary Conservative Party, the qualities of leadership that will deliver consensus for the whole country, or failing that, the courage to call a general election to deliver a democratic mandate for action.

And this brings me to my third point. People voted Leave in this referendum for many different reasons. Many of those reasons are deeply rooted in anger and disenfranchisement as a consequence of changes in the world. That areas receiving some of the highest levels of EU funding voted most strongly to leave, that the slogan of “take back control” resonated at so many different levels[5], and that level of education so strongly correlated with voting intention[3], means we have a problem. Not a problem that can be dealt with by telling people they are wrong, or by explaining that the rising tide of the free market floats all boats, nor a problem that can be dealt with by simple leftist redistribution. I can easily imagine being sick of being told that you’re wrong, failing to see this rising tide, and resenting being seen as a charity case (or indeed as the political punching bag in a right-versus-left fight). We have a fractured society, in which our economic and political systems have failed to distribute the gains that immigration and globalization have created. The likely centrist fudge that will be made to resolve our current issues will completely fail to address the underlying issues that led to the referendum result. And we ignore those issues at our peril.

The country is split: North-South, over 50-under 50, city-rural, Labour voting-Tory voting, Remain-Leave. Those hard dichotomies are the result of a two-party political system that is no longer fit for purpose, a polarised and politicised mass media, and an increasing divide between rich and poor and between those with opportunities and those without; they are reinforced by the filter bubbles of social media (and the perhaps much greater bubble caused by the exclusion of the 25% of residents who’ve never used the internet). It is those hard splits, the binary decision making, more than any specific treaty or referendum, that will kill us. The false choice of this or that: in or out, Labour or Tory, rather than finding new ways to reach consensus, to listen to and understand the concerns of those who are disenfranchised. I don’t doubt that you and I would have different ideas about how to resolve those issues, but that is a strength, not a weakness. Chances are we’re both wrong, but ideally in different and complementary ways.

And this is where I return to the point of hope. We may come from different political traditions, but we were on the same side of the referendum, arguing against many of our “natural” political allies. What this shows more than anything else is that there isn’t really a “natural” political landscape. The Punch and Judy, zero-sum game, in which there has to be a winning and a losing side, is something we need to work towards consigning to the past. I don’t have any real answers on how, just a sense that we need to listen more to the grievances that people have and work, somehow, towards a collective, demonstrable consensus on what happens next.

Because if someone “wins” the argument over if and how we leave the EU then we will all have lost.

Yours truly,

Cameron Neylon

  1. https://twitter.com/gemmaklowe/status/747172162491023365/photo/1
  2. https://twitter.com/WillBrambley/status/747450027518398464
  3. http://lordashcroftpolls.com/2016/06/how-the-united-kingdom-voted-and-why
  4. https://www.gov.uk/government/publications/the-7-principles-of-public-life
  5. http://www.perc.org.uk/project_posts/thoughts-on-the-sociology-of-brexit/

NotAllBrexiters

I’ve been conflicted about posting this. I had planned to write something along these lines for several weeks but the murder of Jo Cox threw that sideways. The way I write tends to involve pushing words around in my head for a week or so, and then writing it all out. Thus most of this existed in some form prior to her murder but it was not written down. I don’t want to claim any prescience but those events, and the subsequent debate over their meaning, so reinforced the fears that led to this that I’ve decided to go ahead and post it.

I am an economic immigrant. And a British Citizen. I am culturally European, with primarily British ancestry, born in Australia. I never know how to answer the census question: am I non-white British as an Australian? I am unquestionably a member of this “metropolitan elite”. And I get irritated by the way Londoners forget the rest of the country exists. I am a mess of contradictions. Like most humans I guess.

And I’m frightened.

“Oh. But you’ll be OK”. Someone actually said this to me. I’m apparently not the problem, some unspecified “other” people are the problem. Other immigrants, with different accents, skin colours, or countries of origin. People I know. People I work with. I wonder which of them are on the wrong side of this invisible dividing line. This line I shouldn’t worry about because “oh, but you’re a good chap”.

We are far beyond the point where argument or “facts” will change anyone’s mind in this referendum. We are down to the base level of what this was always about: identity politics. Arguments about regulation or democracy are not really about the facts on the ground, but a fight over whether “our” (desperately flawed) systems of elections, governance and regulation are better than “their” (desperately flawed) systems of negotiation, consensus building and, yes, democratic checks and balances.

It’s about how globalisation undermines some forms of identity, particularly those rooted in place, class, and traditional roles, and how it reinforces others, particularly those rooted in mobility, internationalism and some strands of social liberalism. And it’s about fear. Fear of the outsider. Fear of being the outsider. Fear of being on the wrong side of that line.

There are plenty of good solid arguments for Britain to leave the European Union. I know a number of people who genuinely find those arguments decisive. But the centre of the campaign to leave has been driven by the stoking of fear: fear of Turkey, fear of Syrians, Farage and his posters, and above all fear of a loss of control over a particular form of British, or at least English, identity.

And in turn those of us with a different identity – an identity often rooted in being an immigrant, or a traveller, or having students, colleagues or friends from many places – are also voting out of fear. A fear of where that line falls. Will we be tossed out? Some will. Will we be made to feel unwelcome? Many of us already do. And fear begets fear begets distrust begets anger begets violence.

I’m not writing this to tell anyone what to think or how to vote, or to accuse anyone of racism by association. I’m writing this to explain how I feel. And how I suspect many other immigrants, and children of immigrants, and friends and family of immigrants feel. The point of the title of this post is precisely that: not that all those voting Leave are driven by fear, but that virtually every person I know who is an immigrant, or is close to immigrants, is fearful. Not all Brexiters. Yes, pretty much, all immigrants.

Don’t ask me to be happy that I’m on the right side of that line. Because lines have a habit of moving.

Canaries in the Elsevier Mine: What to watch for at SSRN

Yellow canary (Photo credit: Wikipedia)

Just to declare various conflicts of interest: I regard Gregg Gordon, CEO of SSRN, as a friend and have always been impressed by what he has achieved at SSRN. From the perspective of what is best for the services SSRN can offer to researchers, selling to Elsevier was a smart opportunity and probably the best of the options available given the need for investment. My concerns are at the ecosystem level. What does this mean for the system as a whole?

The first two paragraphs of this post have been edited following its initial posting. The original version can be found at the bottom of the post.

The response to the sale of SSRN to Elsevier is following a pattern that will be very familiar to those who watched the sale of Mendeley a few years back. As I noted in my UKSG talk, we face real communications problems in scholarly communication because different groups are concerned about very different things, and largely fail to understand those differences. This plays out in commentary that argues that “the other side” has “failed to understand” the “real issues” at stake, while mostly missing the issues that are at the centre of concerns for that “other side”.

To be clear, I’m obviously doing exactly the same thing in this post; it’s just that I’m pointing the finger at everyone, and at a slightly higher level of abstraction. What I hope to do in this post is use one particular strand of the arguments being raised to illustrate how different stakeholders focus on quite different issues when they ask the question “is this purchase good or bad”. The real questions to ask are “what is it we care about” and “what difference does the purchase make to those aspects”. What we can do is look at this particular strand of the conversation and what it tells us about the risks, both for SSRN as a trusted repository for a range of research communities, and for the ecosystem as a whole.

The strand I’m interested in both illustrates the gaping divide between the narratives on the two sides and shows how quickly we forget history. But it’s useful because it also points us to some criteria for judgement. That strand goes like this: “Elsevier did a great job of supporting and looking after Mendeley [ed. as someone noted I originally had Elsevier here, arguably a Freudian slip…], so you can expect SSRN to be the same”. That’s a paraphrase rather than a direct quote, but it captures the essence of a narrative that focuses on the idea that all the fears that arose when Elsevier purchased Mendeley turned out to be unfounded.

Except of course it depends which fears you mean. And that’s where the narrative gets interesting.

What were the fears for Mendeley’s future?

First, the credit side. Elsevier invested massively in Mendeley, including a desperately needed top-to-bottom re-engineering of the code base. It’s not widely known just how shaky the Mendeley infrastructure was at the point of purchase, and without a massive capital investment from somewhere it’s questionable whether it could have stayed afloat. Elsevier have also maintained Mendeley as a free service for users – many assumed that it would convert to a fully paid model. The free-for-users offering sits alongside an institutional business offering, but that was already the primary focus of the team before the purchase. Indeed, again for full disclosure, I, amongst I am sure many others, strongly recommended that Mendeley shift its business focus from individual memberships to institutions long before the purchase.

Yes, there was a fear that Elsevier would just crush Mendeley, that it would go the way of many previous purchases, but at the very least that would never have been the intention, and many of those fears seem to have been unfounded.

Except that they weren’t the fears that at least some of us had. It’s actually telling that the narrative has legs, because it shows how much of recent history has been forgotten. What has been largely forgotten is that Mendeley was once positioned as a major and very public force in driving wider access. The description of Mendeley in the Elsevier Connect blog post about the SSRN purchase illustrates this:

As a start-up, Mendeley was focused on building its reference manager workflow tool, gaining a loyal following of users thankful that it made their lives easier. Mendeley continues to evolve from offering purely software and content to being a sophisticated social, informational and analytical tool.

SSRN—the leading social science and humanities repository and online community—joins Elsevier – Gregg Gordon

It’s not that this isn’t true. But it excises from history one of the significant strands of Mendeley’s early development. Compare and contrast that with this from Ricardo Vidal on the official Mendeley blog in 2010 (emphasis added):

Keeping with the Open Access week spirit, we’re taking this opportunity to show you how to publicly share your own research on Mendeley. Making it openly available for others to easily access means they are more likely to cite you in their own publications, and also allows your colleagues to build upon your work faster.

When you sign up for a Mendeley user account, a researcher profile is created for you. On this page, along with your name, academic status, and short bio, you will also see a section titled “Publications”. This section is where you can display work you’ve published or perhaps even work that’s not yet published.

Or this from William Gunn in 2012 (again emphasis added):

Requiring the published results of taxpayer-funded research to be posted on the Internet in human and machine readable form would provide access to patients and caregivers, students and their teachers, researchers, entrepreneurs, and other taxpayers who paid for the research. Expanding access would speed the research process and increase the return on our investment in scientific research.

We have a brief, critical window of opportunity to underscore our community’s strong commitment to expanding the NIH Public Access Policy across all U.S. Federal Science Agencies. The administration is currently considering which policy actions are priorities that they will act on before the 2012 Presidential Election season swings into high gear later this summer. We need to ensure that Public Access is one of those priorities.

From 2008, when I met Victor Henning for the first time, until 2013, Mendeley was a major, and very public, advocate for Public Access and Open Access. It was a big part of the sales pitch for the service and a big part of the drive to increase the user base, particularly amongst younger researchers. Mendeley provided a means for researchers to share their articles on a public profile, and it did this in the context of a service that was usable and gaining traction. Alongside this it made a very public commitment to open data, being one of the first scholarly communications services to have a functional API providing data on bookmarks. It was the future of those activities that many of us feared for in 2013 when the purchase went through. So how have they fared?

Has Mendeley thrived as an “open access” organisation?

Public sharing of articles is long gone. Long enough that many people have forgotten it ever existed. I can’t actually pin down exactly when this happened, but it was around the time of the purchase. A chunk of a Scholarly Kitchen post on the sale is very sceptical of how this could be managed inside Elsevier. Now any public links on researcher profiles are to the publisher versions of the articles. Note that shift: an available version of a manuscript was removed from public view and replaced with a link to the publisher website (not augmented, not “offering the better version”, replaced). Certainly Mendeley no longer provides any means of publicly sharing documents.

Has Mendeley retained an individual voice as an advocate of Public and Open Access? To be fair, there is at least some evidence of independence. This post from William Gunn on the STM/Elsevier policies on article sharing at least raises some things that Mendeley was not happy with. But more usual are posts that align with the overall Elsevier position. The post on the UK Access to Research Program, for instance, is a straightforward articulation of the Publishers Association position (and completely at odds with those earlier positions above). A search for “open access” gives the post on “Myths of Open Access“, a rather muted re-hash of a much older BioMedCentral post. The next several hits are posts from before the purchase, followed by the Access to Research post. There is nothing wrong with this, priorities change, but Mendeley is no longer the place to go for full-throated Open Access advocacy.

What about the API? As was noted of both the Mendeley and SSRN purchases, it’s the data that really matters. The real diagnostic of whether Elsevier was committed to the same vision as the founders would have been strong support for the API and Open Data. Now, the API never went away, but nor was it exactly the priority for development resources. For those who used that data, reliability went steadily downhill from around 2013 onwards. What had once been the jewel in the crown of open usage data in scholarly communications was for periods basically unusable. The priority was working with internal Elsevier data, building the vertical integration from Scopus through to Mendeley to support future products. Outside users (and therefore competitors) were a distant second. Things have improved significantly recently, a sign perhaps of investment priorities changing, but this is also coupled with some other troubling developments that indicate a desire to maintain control over information.

Within the STM/Elsevier Article Sharing Policy is an implication of a requirement for information tracking: “Publishers and libraries should be able to measure the amount and type of sharing, using standards such as COUNTER, to better understand the habits of their readers and quantify the value of the services they provide”. Read that next to the post on the LibLicense List on growing restrictions on the re-use of COUNTER data imposed by publishers. Then ask why there is a need for a private mechanism for pushing usage data, one that is intended to support the transport of “personally identifiable data about usage”. Business models where “you are the product” benefit from scale and enclosed data. Innovative and small-scale market entrants benefit from an ecosystem of open data; closed data benefits large-scale and vertically integrated companies, of which there is really only one example in the scholarly communications space (for the future, watch SpringerNature and the inevitable re-integration of DigitalScience after the IPO, but they’re not there yet).

If the argument for SSRN continuing successfully within Elsevier depends on whether Mendeley has succeeded within Elsevier, then that argument depends entirely on what you mean by “succeeded”. Mendeley has changed: opinion will divide rather sharply between those who think that change is overwhelmingly good and those who think it is overwhelmingly bad. About now, one group of readers (if they made it this far) will be exasperated by the “woolly” and “political” nature of the comments above. They will point to Mendeley “growing up” and “becoming serious”, properly integrated into a real business. Another group will be exasperated by the failure to criticise any and all commercial involvement, and the lack of commitment to the purity of the cause. If you fall into one of those camps then it tells me more about you than it does about either Mendeley or SSRN. Far more interesting is the question of whether each element of that change is good or bad, for whom, and at what level. Either way, the argument that “Mendeley thrived within Elsevier, therefore SSRN will” is certainly not a slam dunk.

What does this tell us about SSRN?

So what of the canaries? What does this tell us about what we should be watching for at SSRN? Certainly the purchase makes sense in the context of establishing stronger and broader data flows that can be converted to product for Elsevier. But what would be a sign that the service is drifting away from its original purpose? It’s the same three things: the elements that I have argued were lost after Mendeley was purchased.

  1. Advocacy: SSRN has always occupied a quite different space in its disciplinary fields to that of Mendeley, and has never had a strong policy/advocacy stance. Nonetheless, look for shifts in policy or narrative that align with the STM Article Sharing Policy or other policy initiatives driven from within Elsevier. Particularly in the light of recent developments with the Cancer Moonshot in the US, look for efforts to use SSRN to show that “these disciplines are different to science/medicine”.
  2. Data: SSRN doesn’t have an API, and access to data on usage is moderately restrictive already. One way for SSRN and Elsevier to prove me wrong would be to make a covenant on releasing data, or better still to build a truly open API. In the meantime, watch for changes in the terms of use on the pages that provide the rankings. When the site is updated with the major refresh that is almost certainly coming, check for the telltale signs of obfuscation that make it hard to scrape. These would be the signs of a gradual effort to lock down the data.
  3. Redirection: This is the big one. SSRN is a working paper repository. That is its central function. In that way it is different to Mendeley where you could always argue that the public access element was a secondary function. Watch very closely for when (not if) links to publisher versions of articles appear. Watch how those are presented and whether there is a move towards removing versions that might (perhaps) relate closely to a publisher version. Ask the question: is the fundamental purpose of this repository changing based on the way it directs the attention of users seeking content?

SSRN is a good service, and the Elsevier investment will improve it substantially. Whether the existing community of users will continue to trust and use it is an open question. I’ve heard more disquiet from real researchers beyond the Twitter echo chamber than I would have expected, but any reading of history suggests that it will lose some users and retain a substantial number. Service improvements offer the opportunity for future growth, although SSRN already has substantial market penetration, something that wasn’t true of Mendeley before its subsequent growth. It’s not clear to me that integration with the wider Elsevier ecosystem is something that the user base wants, but improved interfaces and information flow to other systems certainly are. The real question is the extent to which Elsevier exerts control over information and user flows, and whether it uses that control to foster competition or to stifle it.

Note on changes made post-publication: I have modified the initial paragraph because it was pointed out that it could be read as me being dismissive of the quality of arguments being made in other posts on the SSRN sale. That wasn’t my intent. What I had intended to say was that the different sides of those arguments build on quite separate narratives and world-views and that I had seen little evidence of real understanding of what the issues are across the stakeholder and cultural divides.  The original first paragraph is below.

The response to the sale of SSRN to Elsevier is pretty much entirely predictable. As I noted in my UKSG talk different groups are simply talking past each other, completely failing to understand what matters to the others, and for the most part failing to see the damage that some of that communication is doing. But that’s not what I want to cover here. What I’m interested is a particular strand of the conversation and what it tells us about the risks, both for SSRN as a trusted repository for a range of research communities, and for the ecosystem as a whole.

Scholarly Communications: Less of a market, more like general taxation?

Top bracket from U.S. Federal Marginal Income Tax (Photo credit: Wikipedia)

This is necessarily speculative and I can’t claim to have bottomed out all the potential issues with this framing. It therefore stands as one of the “thinking in progress” posts I promised earlier in the year. Nonetheless it does seem like an interesting framing to pursue an understanding of scholarly communications through.

The idea of “the market” in scholarly communications has rubbed me up the wrong way for a long time. Two things grate: the moral and political superiority claimed by private and commercial players on the presumption that markets should be unregulated and protected from “public interference”, and the simplistic notion (acknowledging I’ve been guilty of this myself) of pointing to the market as “broken”.

A market, whether meant in the sense of a system of exchange that optimises complex pricing for a (set of) good(s), or in the political sense of a protected space for free private exchange, requires the ability for free exchange across that “market”. In its political sense it should also be somehow “organic”, that is, it develops free from direct subsidy or indirect government control. None of these things is true of the scholarly communications system: there is no free exchange amongst providers or purchasers, there is, as we shall see, effective compulsion on both purchasers and providers, and the whole system only exists because of government subsidy.

Pointing to market failure isn’t helpful if there isn’t really a market. Nor is claiming that markets support better innovation and customer service. It’s poor politics, but it’s also poor systems analysis. If we want to make things better, we need a better framing that actually explains, and ideally predicts, how changes might play out. After a conversation this morning I’m wondering whether general taxation is a better framing for the scholarly communications system.

Mancur Olson, in The Logic of Collective Action, lays out the problem of how large groups can provision what he calls a “collective good”; today we would use the term “public good”. While he focusses on other examples he keeps the case of general taxation in the background. His conclusion is that large groups, which he describes as “latent”, need some form of compulsion to provide public goods. Buchanan, Ostrom and others would later work on the ways in which smaller groups can successfully manage public-like goods that have some (differing) elements of private good-like natures. Buchanan focussed on “local public goods”, what we now call “toll goods” or “club goods“, which are non-rivalrous but excludable; Ostrom on “common pool resources” (CPRs), which are non-excludable but rivalrous. In both cases they diagnose limits on the scale of groups that can successfully manage these resources. Neither Ostrom nor Buchanan looked in great detail at the gradation from their ideal types (pure CPRs and club goods) as goods become more public-like. We therefore need an economics of “public-making”.

The scholarly community (at least as currently configured) is far too large to provide public goods without compulsion or to manage CPRs or club goods. Yet research is about the generation of public goods (or at least public-like club goods). We therefore have a provisioning problem, and according to Olson one that requires compulsion to resource. We need tax.

That publishers provide a public-like good is easy to demonstrate for subscription articles. Once a PDF is created it is an almost pure public good in the wild. It can be distributed infinitely (non-rivalrous) and is almost impossible to stop (non-excludable). Equally, the provision of metadata that is freely available for re-use is a public good. The benefits of the review process are available to all who can access the article, so that is also a public good. But herein lies a paradox. If these are public goods then classical economics tells us they won't be provided by private enterprise. So what is going on?

One of Olson's key insights is to separate the issue of public vs private provision (which is a vexed division in scholarly communications anyway) from the systems that can impose the compulsion needed to provide public or collective goods. He notes for instance that in the early to mid-20th century workers in many US firms voted overwhelmingly in secret ballots to impose union membership on themselves and to create closed shops. Although we don't get a vote on it, the world of academic publishing is not so different from the closed industrial shops of 1950s America. If you don't publish in the right places, you're not part of the club. So there is a "tax" on the supply side, although the dues are paid in the effort of authorship (and refereeing) rather than as union membership fees. That compulsion is how knowledge as a club good is made more public-like.

But what about the demand side? What about the real money? Publishers are also providing a public good for institutions. It's the system that matters from an institutional perspective: the exposure that their researchers get, the credit for publication and therefore "membership". One journal more or less doesn't really matter. Again classical economics says there should be a substantial free rider problem, with each library gradually cutting back on subscriptions to bring down expenditure (this argument has nothing to do with the serials crisis; it would still follow with flat, or even dropping, overall expenditure). In practice libraries don't really have a choice about dealing with the Big 4 publishers and not that much choice about what they pay. Fees are configured so that there are only marginal savings in cutting the big deal and going a la carte, as the toy calculation below illustrates. Indeed fees are based on a bizarre set of historical factors that bear little relation to current reality. Effectively there is compulsion. It is more like tax than a market.
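A toy calculation, with entirely invented numbers (real big deal terms are confidential and vary widely), shows the shape of the problem:

```python
# Invented illustration of why 'big deal' pricing makes cancellation
# unattractive. Every figure below is a made-up assumption.

big_deal_price = 1_000_000        # bundle of 2,000 titles
must_have_titles = 300            # titles the library genuinely cannot drop
list_price_per_title = 3_000      # a la carte list price, typically far above
                                  # the effective per-title rate in the bundle

a_la_carte_cost = must_have_titles * list_price_per_title
saving = big_deal_price - a_la_carte_cost

print(f"Big deal:                    ${big_deal_price:,}")
print(f"A la carte (15% of titles):  ${a_la_carte_cost:,}")
print(f"Saving from cancelling:      ${saving:,}")
# Keeping only 15% of the titles still costs 90% of the bundle price,
# so the marginal saving rarely justifies losing 85% of the content.
```

However the real numbers fall, the structure is the same: the pricing removes the gradual, title-by-title exit that a free rider would need.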

If the Big 4 are a bit like Federal income tax in the US system (yes, the analogy breaks down a bit there, but there are multiple strands of national tax in some countries, such as "National Insurance" here in the UK) then the smaller publishers are a bit like US state income taxes. If you want you can move to a state with low or no tax, and you can equally run an institution that does no humanities, or no chemistry, and avoid paying out to SAGE or the ACS, but those are not the regular choices, and they limit your opportunities to operate. Article Processing Charges are equivalent to a sales or consumption tax. The analogy works right down to the political argument. Consumption taxes can in principle reduce income tax evasion (or redistribute the burden of the scholcomms system back towards the biggest players) because even those who can afford good accountants (or in our case subscription negotiators) still consume goods (or publish articles). But in practice they often also raise access barriers for the poor, particularly those seeking to raise their income above the local tax-free threshold.

The serials crisis is then the unconstrained rise in collective provisioning. Far from being a market response to demand, it is a self-reinforcing increase in provision that can’t possibly be subject to proper market discipline, because a market can never truly operate. I don’t need to spell out the political consequences of this argument, but I find it fascinating that we can frame the system as a European general taxation system run by organisations that espouse American fiscal politics. It certainly creates a moral equivalence between “private” provision of these public goods and collective provisioning, such as through repository systems. There really isn’t a difference. It’s simply a question of which form of provision is better suited and better value.

The economic consequences are that we cannot constrain costs through market-oriented actions. The only way to constrain runaway collective provisioning is through collective action. But collective action is hard because we operate in an existing collective system, with its existing mechanisms of compulsion. The evolution of systems that provide public goods is slow compared to that of private goods (I am minded to compare manuscript submission systems with government websites as an example). Change will come locally, with local changes of rules. The local governments in the system, the scholarly societies, perhaps some institutions and University Presses, are where change can start: these are the small groups that Buchanan and Ostrom, as well as Olson, identify as able to act.

Ostrom in particular notes that very large scale management of common pool resources is possible within communities (and that privatisation and nationalisation both tend to fail) as long as institutions are built that can coordinate groups of groups. Federations of societies and groups of institutions alongside collective mechanisms like Open Library of Humanities and Knowledge Unlatched offer some possible directions, as does an interesting increase in the number of endowed journals being launched. Ostrom’s prescription is to build institutions that support communities working together effectively, so they can build these large collectives.

And the big governments? Well they can be changed or their orientation shifted from “big” to “small”. And if I recall correctly issues with taxation are often at the root of that change.

Open Access Progress: Anecdotes from close to home

Solution extinction measurements of (A) faceted and (B) Au@MNPs, and (C) photos of the particles. From Silva et al., Chem. Commun., 2016, DOI:10.1039/C6CC03225G

It has become rather fashionable in some circles to complain about the lack of progress on Open Access, and particularly to decry the apparent failure of UK policies to move things forward. I've been guilty of frustration at various stages in the past, and one thing I've always found useful is thinking back to where things were. So with that in mind, here are an anecdote or two that suggest not just progress but a substantial shift in underlying practice.

I live with a chemist, part of a group not known for its engagement with Open Access. More than in most other disciplines, in my experience, there is a rigid hierarchy of journals, a mechanistic view of productivity, and, particularly in those areas not awash with pharmaceutical funding, not a huge amount of money. Combine that with a tendency to think everything is, or at least should be, patentable (which tends to rule out preprints) and this is not fertile ground for OA advocacy.

Over the years we’ve had our fair share of disagreements. A less than ideal wording on the local institutional mandate meant that archiving was off the menu for a while (the agreement to deposit required all staff to deposit but also required the depositor to take personal responsibility for any copyright breaches) and a lack of funds (and an institutional decision to concentrate RCUK funds and RSC vouchers on only the journals at the top of that rigid hierarchy) meant that OA publication in the journals of choice was not feasible either. That argument about whether you choose to pay an APC or buy reagents for the student was not a hypothetical in our household.

But over the past year things have shifted. A few weeks ago: "You know, I just realised my last two papers were published Open Access". The systems and the funds are starting to work, starting to reach even into those corners of resistance, yes, even into chemistry. Yes it's still the natural sciences, and yes it's only two articles out of who knows how many (I'm not the successful scientist in the house), but it's quite a substantial shift from it being totally out of the question.

But at around the same time came something I found even more interesting. Glimpsed over a shoulder, I saw something odd: searching on a publisher website, which is strange enough, and searching only for Open Access content. A query elicited the response: "Yeah, these CC BY articles are great, I can use the images directly in my lectures without having to worry; I just cite the article, which after all I would have obviously done anyway". It turns out that with lecture video capture now becoming standard, universities are getting steadily more worried about copyright. The Attribution-licensed content meant there was no need to worry.

Sure, these are just anecdotes, but to me they're indicative of a shift in the narrative: from "this is expensive and irrelevant to me" to "the system takes care of it and I'm seeing benefits". Of course we can complain that it's costing too much, that much of the system is flakey at best and absent at worst, or that the world could be so much better. We can and should point to all the things that are sub-optimal. But just as the road may stretch out some distance ahead, with roadblocks and barriers in front of us, there is also a long stretch of road behind, with the barriers cleared or overcome.

As much as anything it was the sense of “that’s just how things are now” that made me feel like real progress has been made. If that is spreading, even if slowly, then the shift towards a new normal may finally be underway.

What is it publishers DO? (Reprise)

This is more a note to self to write something up in more detail. In the original post What is it publishers do anyway? I gestured towards the idea that one of the main value-adds for the artist formerly known as the publisher is in managing a long tail of challenging, and in some cases quite dangerous, issues. What I didn't quite say, but was implicit, is that a big part of a publisher's role is preventing the researcher-author from getting egg on their face.

Enter this week's entry in the pantheon of grotesque research ethics fails: the release of 70,000 OkCupid records as a dataset on the Open Science Framework. Now OSF and its home, the Center for Open Science, would not claim to be a "publisher", but it provides a platform for publishing research outputs just as do Nature, Cell, PLOS ONE, bioRxiv, Figshare, and Dryad…and…GitHub.

Some of these platforms would have triaged this (or anything like it) out before it went public, or, where relevant, before it even reached a reviewer. You might be a little surprised by which ones (remember PNAS publishing the Facebook emotion study paper?). These kinds of disaster submissions don't need to be very frequent before catching the ones that do come in requires significant resources. It also means you need to develop policies to decide what to allow on the platform. For instance bioRxiv did not allow "medically relevant" manuscripts to be made public on its platform when it launched.

As we expand the ways and forms in which we communicate research outputs, it will become increasingly important to ask what standards we expect of ourselves in deciding where and how things are shared, and what expectations we have of those platforms that make a claim to be “scholarly” in some sense. No-one expects Google to police the publishing of each and every dataset on a shared drive. Everyone expects a traditional publisher to ensure community ethical standards are applied (even, or rather especially, when the community is diverse enough to not necessarily know what they are). Where does OSF, or bioRxiv, or arXiv fit into this continuum?

It is catching this kind of thing that really contributes to the costs that publishers incur. When we talk about a completely in-house system, this is the kind of thing we're talking about throwing away. Of course there may be better and cheaper ways of maintaining these kinds of standards, and we need a serious discussion about which ones matter and what kinds of breaches are acceptable. But for those of you who wonder where that extra few hundred dollars in costs comes from, this is definitely one of the places.

PolEcon of OA Publishing VI: Economies of Scale

Victory Press (Photo credit: Wikipedia)

I think I committed to one of these every two weeks, didn't I? So I'm already behind. Some of what I intended for this section already got covered in What are the assets of a journal? and in Critiquing the Standard Analytics Paper, so this is headed in a slightly different direction from that originally planned.

There are two things you frequently hear in criticism of scholarly publishers. One is “why can’t they do X? It’s trivial. Service Y does this for free and much better!”. I covered some of the reasons that this is less true than you might think in What’s the technical problem with reforming scholarly publishing? In particular I argued that it is often the scale of systems and the complexities of their interconnection that mean that things that appear simple (and often are for a one off case) are much more complex in context.

In turn this argument, that big systems become more rigid and require greater investment in management and maintenance, raises the other oft-heard comment: that the industry is "ripe for disruption" by new, nimble players. There is growing criticism that Christensen's narrative of industrial disruption isn't very general (see e.g. articles in the New Yorker and Sloan Business Review [paywall]), but for the purposes of this piece let's assume that it is a good model: that large industrial players tend towards a state of maintenance and incremental improvement, where they are vulnerable to smaller, more innovative players who have less to lose and less investment in existing systems and customers, and can therefore implement radical change. This kind of narrative is definitely appealing, even empowering, to those agitating for change. It is also picked up and applied by analysts who take a more traditional approach.

So why hasn’t it happened?

The news pieces heralding the imminent demise of Elsevier started in the mid-90s. And people in the know will admit that internally there was a real fear that the business would be in big trouble. But despite this the big players remain remarkably stable. Indeed consolidation in the industry has increased that stability with a rapidly decreasing number of independent companies publishing a substantial proportion of the world’s research literature. The merger of Springer and Nature is just the latest in a long run of mergers and purchases.

A conventional market watcher will say that such mergers indicate that there are economies of scale that can be harnessed to deliver better returns. Cynics often point out that what is really happening is a recapitalisation that benefits a small number of people but doesn't generate overall returns. Certainly in scholarly publishing there are serious questions about whether combining very different cultures creates benefits. It is not an accident that mergers and purchases are often followed by a mass exodus of staff. In the case of the Springer-Nature merger the departure of many senior ex-Nature staff is a clear indication of which of the two cultures is dominating the transition.

If we take Christensen seriously, then a lack of disruption alongside consolidation implies that there are real economies of scale to be achieved. What are they? And are they enough to continue providing a sufficient barrier to entry that disruption won't happen in the future?

Capital

Possibly the biggest economy of scale is the most obvious: simply having access to resources. Elsevier's public reports show that it has access to substantial amounts of cash (albeit as a credit facility). The 2015 RELX Annual Report [pdf] (p57) notes that "Development costs of £242m (2014: £203m) were capitalised within internally developed intangible assets, most notably investment in new products and related infrastructure in the Legal and Scientific, Technical & Medical businesses", giving a sense of the scale of investment. It is these resources that allow large corporations to invest large amounts of money in various internal projects and also to buy up external developments, whether to add to the technology stack, bring in new expertise or remove potential competitors from the market. Sufficient cash provides a lot of flexibility and opportunity for experimentation.

It’s not just the big players. PLOS is probably best characterised as a “medium sized” publisher but benefits from having access to capital built up during the period 2010-13 where it had significant surpluses. In the 2015 Annual Update PLOS reported $3.7M (8% of $46.5M total expenses) on R&D in FY 2014 and with a new platform launching imminently this probably saw a large uptick in FY 2015. eLife has not reported 2015 figures but again substantial development has gone into a recently launched platform, supported by access to capital. Smaller publishers, and particularly new market entrants don’t generally have access to the kind of capital that enables this scale of technology development. Community development is starting to make inroads into this and Open Source community projects are the most likely to challenge existing capital concentration but it is slow progress.

Broad Business Intelligence Base

Scholarly publishing is both slow and tribal. Publishers with a broad-based set of journals have a substantial business intelligence advantage in understanding how publication behaviour and markets are changing. They also have privileged access to large numbers of authors, editors, and readers. They don't always use this information well (the number of poorly constructed and misinterpreted surveys is appalling) and indeed sometimes still get tunnel vision based on the information they have, but nonetheless this is an incredible asset.

At best a disruptive market entrant will have deep insight into a specific community, and may be well placed to better serve that specific niche. The tradeoff is that they frequently struggle to scale beyond that base. This is in fact generally true of technology interventions in scholarly communications. What developers think is general often does not work outside of the proof-of-concept community. A variant of this is showing that something works using the literature in PubMed Central or Europe PubMed Central as a demonstrator, and failing to understand how the lack of infrastructure outside that narrow disciplinary space makes generalisation incredibly difficult.

Seeing a broad landscape and being able to see future trends developing is a powerful advantage of the big players. Used well it can drive the big strategic shifts (Springer’s big bet on APC funded Open Access, Elsevier’s massive shift in investment away from content and publishing into integrating Scopus/SciVal and Mendeley for information management). Used less effectively it can lead to stasis and confusion. But this information is an enormous strategic asset and one that smaller players struggle to compete with. Disruption in this context has to wait for the unforced error.

Diversified financial risks

Having the scale to develop a range of revenue sources is a huge benefit. Springer have a strong books business, Elsevier a significant revenue source in information. They are also diversified across disciplinary areas, some growing, some shrinking, and generally have more coverage across varied geographies. New players like PeerJ, eLife, or Open Library of Humanities tend to have at best a few revenue sources, and often a restricted disciplinary focus. A lack of revenue diversity is certainly a risk factor for any growing business. On the other side, most advice to start-ups is to focus on developing one single revenue source; the transition from that start-up mode to diversification is often the challenge.

The other form of scale comes from having a sufficiently diverse and large journal portfolio to make up a big deal. It can be difficult for a small publisher even to get libraries to give them time when the amounts of money are relatively small; effort is concentrated on the big negotiations. Having a stable of journals that includes "must-have" subscription titles alongside volume (whether in mastheads or article numbers) that can be used to justify the annual price increases has been a solid strategy. The arguments for market failure for subscriptions are well rehearsed.

Ironically the serials crisis is leading to a new market emerging: a market in big deals themselves. With libraries increasingly asking "which big deal do we cancel?", the question of which of the Big 4 is offering the worst deal becomes important. The deals being sought also differ. In North America it is usually Wiley or Springer deals being cancelled; the Elsevier big deal for subscription content generally appears to be better value for money in that context. In Europe, where the deal being sought includes provision of large scale Open Access publishing in some form, it is the Elsevier big deal that more frequently looks at risk of being dropped. This kind of competition over big deals isn't yet happening at a large scale, but it does pose some risk of the economies of scale gained by a diverse journal list becoming a liability if large sets of (low volume and low prestige) titles become less attractive.

Successful Challengers

If we look at new and emerging entrants to the market that have succeeded, we can see that in many cases their success lies in having a way around some of these economies of scale. Capital injection, directly through grants (PLOS, eLife, OLH) or from investors (Ubiquity, PeerJ), is common. Building on existing Open Source technology stacks (Ubiquity, Co-Action) and/or applying deep technical knowledge to reduce start-up costs (PeerJ, Pensoft) is a strategy for avoiding the capital barrier.

Successful startups have often created a new market in some form (PLOS ONE being the classic example), or built on deep experience of a specific segment (PeerJ, Co-Action), or a new insight into how finances could work (Ubiquity, OLH). Many big players are poor at fully exploiting the business intelligence they have at their disposal. For whatever reason, scholarly publishing is not a strongly data-led business in the way that term is usually used in the technology industry. Missed opportunities remain one of the routes to success for the new smaller players.

Looking across the crop of existing and emerging new players, revenue diversification remains a serious weakness, and this limits the scale of any disruption they might lead. In this sense it could be argued that there are not yet any mature businesses amongst them. Ubiquity Press is probably the best example of a new publisher developing diversified revenue streams. Ubiquity's offerings include an in-house OA publishing business with a low enough cost base to provide a range of funding models, including APCs, endowments and community supported arrangements. It also provides services to a growing number of University Presses as well as underpinning the operations of OLH.

Real disruption of the big players will need a combination of financial stability, a very low cost base, and technical systems that can truly scale. All of these need to come together at the same time as an ability to either co-opt or appropriate the existing markers of prestige. Christensen's disruption narrative presumes that there is a parallel space or new market that can be created, separate from existing assumptions about "quality" in the disrupted market. But when "quality" is really "prestige", that is, in a luxury goods market, this is much harder to achieve. The financial and business capacity to disrupt is not enough when quality, prestige and price are all coupled together in ways that don't necessarily make any rational economic or business sense. That's the piece I'll tackle next.

Critiquing the Standard Analytics Paper (reprise)

This morning I received an email from a senior policy person. They'd read my blog post on Marginal Costs of Article Publishing but couldn't seem to get to the original article from Standard Analytics that I was commenting on. I took a look myself and found the same thing: the link was dead.


If you remember, the claim of the original article was that platform and cloud provision costs meant that the marginal cost of publishing an article was really below $1. My comment was that there were lots of costs missing, and that marginal cost per article probably isn't a good way of analysing the problem anyway. But my first point was:

The first criticism to be made is that there are basic services included in scholarly publishing that have not been included in the calculation. Ironically one of the most important of these is the very system that Standard Analytics are demonstrating with their article: that of hosting and discovery. There is an expectation for modern web-based publishers that they will provide some form of web portal for content with some level of functionality. While new options, including overlay journals based on archives, are developing, these do not remove this cost but place it elsewhere in the system. Preservation systems such as CLOCKSS and PORTICO are not designed to be the primary discovery and usage site for articles.

Running a high availability site with medium to high (by scholarly standards) traffic is not free. Calculating a per-article marginal cost for hosting is not straightforward, particularly given a lack of standards for the timeframe over which server costs should be amortised (how long do articles need to be available?) and the limited predictability of future bandwidth and storage costs.

But what does this tell us? Keeping a site up is hard. The scholarly community has expectations of availability and preservation, neither of which were met in this case. These cost money, and that cost is not easily calculated on a per-article basis, precisely because it isn’t really a per-article cost. It’s a platform cost.
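To see how sensitive the per-article number is to the amortisation question, here is a minimal sketch. Every figure in it is an invented assumption for illustration; none comes from Standard Analytics or any publisher.

```python
# Sketch: how the 'marginal' hosting cost per article depends on how long
# we promise to keep the article available. All numbers are invented.

def lifetime_hosting_cost(annual_cost_per_article: float,
                          years_available: int) -> float:
    """Total hosting cost attributable to one article over its lifetime."""
    return annual_cost_per_article * years_available

# Assume an invented $0.50 per article per year for storage, bandwidth
# and a share of platform upkeep.
for years in (1, 10, 50):
    cost = lifetime_hosting_cost(annual_cost_per_article=0.50,
                                 years_available=years)
    print(f"Available for {years:>2} years: ${cost:.2f} per article")

# -> $0.50, $5.00, $25.00: the same platform sits under or far over the
#    'below $1' claim depending purely on the availability promise.
```

The point is not the particular numbers but that a per-article figure is undefined until you fix the service level, which is exactly the platform commitment the original calculation left out.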

Google didn’t help much, pointing me me to things that pointed to the dead link. There wasn’t an obvious backup copy anywhere. I did eventually find a copy of the article on the Internet Archive. But that was captured by another infrastructure, one that also costs money and one that relies on donations and grants to keep it running.

The real story here is not that Standard Analytics did a bad job. Something went wrong, I’m sure they’ll fix it shortly. It’s that it is hard work to meet the standards we expect for scholarly content. And that the systems that provide the levels of service that we expect are invisible until they go wrong. That’s why infrastructure is so important. And a lot of the “extra” cost that people complain about when they see prices over $10-50 (or $100 or $400) is going on different kinds of infrastructure.

Sure we can do it cheaper. We just need to discuss whether the service level we’ll get when things break is something we can live with.