This is a version of the paper I’ve had accepted for SciDataCon in a session on the sustainability of Research Data Infrastructures. It was also the basis for the session that I helped lead with Simon Coles at the Jisc-CNI meeting in mid-July in Oxford. The original version was quite short and skipped over some of the background material and context. I’m hoping to work it up into a full paper at some point soon, so any comments are welcome.
Summary
Infrastructures for data, such as repositories, curation systems, aggregators, indexes and standards, are public goods. This means that finding sustainable economic models to support them is a challenge. This is due to free-loading, where someone who does not contribute to the support of the infrastructure nonetheless gains the benefit of it. The work of Mancur Olson (1965) suggests there are only three ways to address this for large groups: compulsion (often as some form of taxation) to support the infrastructure; the provision of non-collective (club) goods to those who contribute; or mechanisms that change the effective number of participants in the negotiation.
Subscription and membership models, such as those used for online subscription journals and for some data infrastructures, have been our traditional model and are an example of the second approach. These models are breaking down as the technology of the web and the agenda for transparency and open access lead to unbundling, the separation of the different services being provided. This tends to mean commercial suppliers focus on club and private good provision and neglect public good provision. Addressing this will require the development of support models more like taxation. However, systems of taxation require a shared, and ideally globally shared, sense of the principles of governance and resource distribution.
In this paper I argue that the focus on sustainability models prior to seeking a set of agreed governance principles is the wrong approach. Rather we need to understand how to navigate from club-like to public-like goods. We need to define the communities that contribute and identify club-like benefits for those contributors. We need interoperable principles of governance and resourcing to provide public-like goods and we should draw on the political economics of taxation to develop this.
The provisioning challenge for public scholarly goods
The fundamental political and economic challenge for groups is the provision of “public goods” or “general utility”. These are goods that are non-rivalrous (they can be infinitely shared) and non-excludable (it is difficult or impossible to stop someone using them). Classical economics tells us there is a provisioning problem for such goods: the rational individual actor will never contribute because, whether they do or not, they can still benefit.
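To make the provisioning problem concrete, here is a minimal sketch using the standard linear public goods game from the economics literature. The game itself, the function names `payoff` and `best_response`, and all of the numbers are illustrative assumptions of mine rather than anything taken from Olson.

```python
def payoff(own, others_total, n, multiplier, endowment=10.0):
    """Payoff to one actor in a linear public goods game (illustrative model).

    Each of n actors holds `endowment` and chooses how much to contribute.
    Contributions are multiplied by `multiplier` (the value the shared
    infrastructure creates) and the result is split equally among everyone,
    contributor or not, which is what non-excludability means here.
    """
    shared_benefit = multiplier * (own + others_total) / n
    return endowment - own + shared_benefit


def best_response(others_total, n, multiplier, endowment=10.0):
    """Contribution that maximises one actor's own payoff.

    The payoff is linear in `own`, so checking the two extremes is enough.
    """
    return max(
        (0.0, endowment),
        key=lambda own: payoff(own, others_total, n, multiplier, endowment),
    )


if __name__ == "__main__":
    n, multiplier = 100, 2.0
    everyone_else_pays = 10.0 * (n - 1)
    print("all contribute:", payoff(10.0, everyone_else_pays, n, multiplier))
    print("nobody contributes:", payoff(0.0, 0.0, n, multiplier))
    print("free-rider:", payoff(0.0, everyone_else_pays, n, multiplier))
    print("best response:", best_response(everyone_else_pays, n, multiplier))
```

With these assumed numbers, everyone is better off when all contribute (20 each) than when nobody does (10 each), yet a lone free-rider does better still (about 29.8), so the individually rational contribution is zero. Each unit contributed returns only `multiplier / n` to the contributor. In a very small group that ratio can exceed 1 (for example `n=2`, `multiplier=3.0`), and contributing becomes individually rational.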
Infrastructures, such as repositories for data, articles and code, are very close to the ideal of public goods. Mancur Olson in The Logic of Collective Action (1965) discusses how group size has a profound influence on the provision of public goods, in particular noting that provision is only possible for small groups, or where the public good is a byproduct of the provision of non-public goods that are provided to contributors.
Indeed Olson’s description of the groups that can and cannot provide public goods maps closely onto scholarly infrastructures. Small communities frequently develop local infrastructures out of existing resources (and the contributions to these are usually biased strongly to the larger players in that community, as Olson predicts). Large scale infrastructures are also provided by collaboration within small communities, but in this case the community is one of funders (such as those that fund Europe PubMed Central) or national governments (as is the case for physical infrastructures that are generally formed as Inter-governmental Organisations like CERN). The transition from small to large is challenging and “medium” sized infrastructures struggle to survive, moving from grant to grant, and in many cases shifting to a subscription model.
In the case of digital infrastructures a public good (such as an online article or dataset) can be converted to a club good (made excludable) by placing an authentication barrier around it to restrict access to subscribers (as is the case for online subscription journals and databases). Buchanan and those that further developed his 1965 paper on the economics of clubs have probed how club goods and club size relate (Buchanan, 1965). A core finding is that such sustainable clubs have an equilibrium size that depends on congestion in access to the good (the extent to which it is purely non-rivalrous) and the value it provides. With digital resources congestion is low, and the club can therefore grow large. This creates a challenge. Digital resources are not natively excludable; a technical barrier has to be put in place. As the group size rises the likelihood of “leakage” (sharing, or piracy if you prefer) increases. Thus resources are expended on strengthening excludability, which leads to both economic and political costs, as seen in the Open Access debate.
Solutions to the provisioning problem
If our political goal is to provide large scale access, to make the goods created more public-like through the provision of shared infrastructures, then we need to develop a political economics of this “public-making”. Olson provides three routes to creating sustainable infrastructures providing public goods:
- Compulsion: The good is provided through a mechanism that requires contributions from the whole community. Closed Union Shops, where all workers in a given company are required to be members of a union, are an example that Olson discusses in detail. Taxation is another example. In the scholarly infrastructure space, overhead and indirect costs taken by institutions are an example, as is the top slicing of funder budgets to provide infrastructures and services.
- By-product: The public good is provided through a mechanism that additionally provides club-like and/or private goods to contributors. Olson discusses insurance schemes only available to members of mutual benefit societies that also lobby on behalf of their members. In the research enterprise publishers have to join Crossref to be able to assign Crossref DOIs to outputs (a club good). As a by-product the whole community has access to an interoperable metadata set with defined schemas and access points (a public good).
- Effective oligopoly: There are far too many funders globally for them to (easily) agree a mechanism for all contributing to any shared scheme. However, because a relatively small set of funders fund a substantial proportion of biomedical research, they are able to agree mechanisms to fund data infrastructures such as Europe PubMed Central. Other funders may contribute but many will free-ride.
The difficult truth is that there is no mechanism that will directly lead to a large community supporting the provision of a large-scale public good infrastructure. Any successful sustainability model will depend on some mixture of these three approaches for resourcing. There are interesting models for solving some of these collective action problems such as crowd-funding models where a project only proceeds if sufficient contributions are made, but these amount in effect to new ways to implement compulsion and often also depend on a by-product strategy (the contributor benefits).
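The threshold crowd-funding mechanism mentioned above can be sketched in a few lines. This is a minimal illustration under my own assumptions: the function name `settle`, the pledger names and the amounts are all hypothetical, and real platforms differ in their details.

```python
from typing import Dict, Tuple


def settle(pledges: Dict[str, float], target: float) -> Tuple[bool, Dict[str, float]]:
    """Settle a threshold ("all or nothing") funding round.

    Pledges are only charged if their total reaches `target`.
    Otherwise the project does not proceed and nobody pays anything.
    Returns (funded, amount actually charged to each pledger).
    """
    total = sum(pledges.values())
    if total >= target:
        # Threshold met: every pledge becomes a binding contribution.
        return True, dict(pledges)
    # Threshold missed: all pledges are released.
    return False, {name: 0.0 for name in pledges}


if __name__ == "__main__":
    # Hypothetical pledgers and amounts, for illustration only.
    print(settle({"lab_a": 40.0, "lab_b": 40.0}, target=100.0))
    print(settle({"lab_a": 40.0, "lab_b": 40.0, "lab_c": 20.0}, target=100.0))
```

In the second call the third pledger is pivotal: without their pledge nothing is funded at all. Holding back therefore no longer guarantees a free ride, and once the threshold is met every pledge is binding, which is the sense in which such schemes act as a form of compulsion.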
If our challenge in delivering on the openness and transparency agenda is one of supporting the conversion of successful medium-scale club-like provision of infrastructure into open systems providing public goods, then we need to solve the political and economic problems of transitioning from the club state to a model that successfully combines these approaches.
The politics of compulsion: The need for shared principles
Of the three approaches to sustainability, it is generally the second which infrastructures are expected to pursue as they grow. Membership models can work in those cases where there are club goods being created which attract members. Training experiences or access to valued meetings are possible examples. In the wider world this parallels the “Patreon” model where members get exclusive access to some materials, access to a person (or more generally expertise), or a say in setting priorities. Much of this mirrors the roles that Scholarly Societies play or at least could play.
In the scholarly infrastructures space the compulsion/taxation and oligopoly approaches are very similar in practice, as top slicing funder resources amounts to a tax on overall research funds. Some membership models also approach the level of compulsion. While this is rare in scholarly communities, it is common in professional communities such as medicine, law, and some areas of engineering. Schemes offering professional certification (including the validation of degree programs) blur this boundary as well.
The word “compulsion” is pejorative but there are many activities within the work of researchers that are compulsory. Gaining a doctorate, publishing at some level, having access to the literature in some form are all effectively compulsory. These forms of compulsion (or call them “social expectations” if you prefer) are considered acceptable because they fit within a known and understood system of rules. Systems of taxation are acceptable, according to Adam Smith (1776), where there is proportionality, predictability, convenience and efficiency. Today we would also add representation in governance and sustainability. This requires us to build institutions, in the sense that Elinor Ostrom describes them: “institutions are the prescriptions that humans use to organize all forms of repetitive and structured interactions” (Ostrom, 2005). Much of political economics is bound up in trying to justify post-hoc the provision of institutions like governments, courts and the law by inventing things like “the social contract”. Our advantage as a community, or communities, is that we could explicitly develop agreed principles of operation as a way of reducing the costs of creating institutions.
A common set of principles for foundational infrastructures
Building institutions is hard. It takes resources. To reduce costs it makes sense to build templates; sets of agreed principles under which such institutions and systems should operate. If our communities can sign up to a set of principles up front, then building institutions and infrastructures that reflect those principles should become a lot easier.
To address this, a draft set of principles was developed to provoke a conversation about the governance and management of these infrastructures (Bilder, Lin, Neylon, 2015). Our principles rest on three pillars: transparency and community governance; financial sustainability, efficiency and commitment to community needs; and mechanisms to protect integrity and manage and mitigate the risk of failures. They draw from the observation of successes in providing foundational infrastructures and seek to generalise these. The focus is on building trustworthy institutions.
It is interesting to note that these also map quite closely to Adam Smith’s four principles for sound taxation (Smith, 1776). The commitment to representation is more modern, and the concept of enabling a community to fork a project through committing to Open Source and Open Data, while modern in its approach, is an expression of the principle of efficiency: in effect, a mechanism for the community to restrain costs.
The principles (or a future refinement or replacement for them) can serve in two ways. First, they could be used to set out the minimum requirements for governance and sustainability that are required before funders are willing to directly fund (the oligopoly or tax mechanisms). Second, they provide a template for a developing club, either one built in a community sufficiently small to bootstrap its funding, or one that has found a byproduct model, to demonstrate its ability to make the transition from club to infrastructure.
Predictions and practical consequences
The preceding is an abstract argument. What are its practical consequences for actually sustaining infrastructures? First we can make a prediction to be tested:
All sustainable scholarly infrastructures providing collective (public-like) goods to the research community will be funded on one of the three identified models (taxation, byproduct, oligopoly) or some combination of them.
Second, we can look at stable long standing infrastructures (Crossref, Protein Data Bank, NCBI, arXiv, SSRN) and note that in most cases governance arrangements are an accident of history and were not explicitly planned. Crises of financial sustainability (or challenges of expansion) for these organisations are often coupled to or lead to a crisis in governance, and in some cases a breakdown of community trust. Changes are therefore often made to governance in response to a specific crisis.
Where there is governance planning it frequently adopts a “best practice” model which looks for successful examples to draw from. It is not often based on “worst case scenario” planning. We suggest that this is a problem. We can learn as much from failures of sustainability and their relationship to governance arrangements as from successes.
Above all, the key is to learn from our experiences, as well as from the theory of economics and governance, to identify the patterns and templates that provide resilient organisations and infrastructures that, through being trustworthy, earn the trust of their communities through both the good times and the bad.
References
- Bilder G, Lin J, Neylon C 2015 Principles for Open Scholarly Infrastructure-v1, Available at http://dx.doi.org/10.6084/m9.figshare.1314859 [Last Accessed 30 May 2016]
- Buchanan, JM 1965 An Economic Theory of Clubs. Economica, 32(125):1-14
- Olson, M 1965 The Logic of Collective Action: Public Goods and the Theory of Groups (Revised ed.). Harvard University Press, Cambridge Massachusetts
- Ostrom, E 2005 Understanding Institutional Diversity (Revised ed.). Princeton University Press, Princeton
- Smith, A 1776 An Inquiry into the Nature and Causes of the Wealth of Nations. W. Strahan and T. Cadell, London