Where are the pipes? Building Foundational Infrastructures for Future Services

Cite as “Bilder G, Lin J, Neylon C (2016) Where are the pipes? Building Foundational Infrastructures for Future Services, retrieved [date], http://cameronneylon.net/blog/where-are-the-pipes-building-foundational-infrastructures-for-future-services/”
You probably don’t think too much about where all the services to your residence run. They go missing from view until something goes wrong. But how do we maintain them unless they are identified? An entire utilities industry, which must search for utility infrastructure, depends on this knowledge. There’s even an annual competition, a rodeo no less, to crown the best infrastructure locators in the land, rewarding those who excel at re-discovering where lost pipes and conduits run.

It’s all too easy to forget where the infrastructure lies when it’s running well. And then all too expensive if you have to call in someone to find it again. Given that preservation and record keeping lie at the heart of research communications, we’d like to think we could do a better job. Good community governance makes it clear who is responsible for remembering where the pipes run as well as keeping them “up to code”. Turns out that keeping the lights on and the taps running involves the greater We (i.e., all of us).

Almost a year ago, we proposed a set of principles for open scholarly infrastructure based on discussions over the past decade or so. Our intention was to generate conversation amongst funders, infrastructure players, tool builders, all those who identify as actors in the scholarly ecosystem. We also sought test cases for these principles, real world examples that might serve as reference models and/or refine the set of principles.

One common question emerged from the ensuing discussions in person and online, and we realized we had ducked a fundamental question: “what exactly is infrastructure?” In our conversations with scholars and others in the research ecosystem, people frequently speak of “infrastructures” to reference services or tools that support research and scholarly communications. The term is gaining currency as more scholars and developers are turning their attention to a) the need for improved research tools and platforms that can be openly shared and adopted and b) the sense that some solutions are interoperable and more efficiently implemented across communities (for example, see this post from Bill Mills).

It is exciting for us that the principles might have a broader application and we are more than happy to talk with groups and organisations that are interested in how the principles might apply in these settings. However, our interest was always in going deeper. The kinds of “infrastructures” – we would probably say “services” – that Bill is talking about rely on deeper layers. Our follow-up blog post was aimed at addressing the question, but it ended with a koan:

“It isn’t what is immediately visible on the surface that makes a leopard a leopard, otherwise the black leopard wouldn’t be, it is what is buried beneath.”

Does making the invisible visible require an infrared camera trap or infrastructure rodeo competitions? Thankfully, no. But we do continue to see the need to shine a light on those deeper underlying layers to reveal the leopard’s spots. This is the place where we feel the most attention is needed, precisely because these layers are invisible and largely ignored. We have started to use the term “foundational infrastructure” to distinguish these deeper layers, which for us need to be supported by organisations with strong principles.

The important layers applicable to foundational infrastructure seem to be:

  • Storage: places to put stuff that is generated by research, including possibly physical stuff
  • Identifiers: means of uniquely identifying each of the – sufficiently important – objects the research process created
  • Metadata: information about each of these objects
  • Assertions: relationships between the objects, structured as assertions that link identifiers
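As a rough illustration of how these four layers fit together, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the class names, the identifier strings such as "dataset:0001", and the "created_by" relationship are made up for the example and do not correspond to any real service or schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A relationship between two identified objects."""
    subject: str    # identifier of the object the claim is about
    predicate: str  # hypothetical relationship name, e.g. "created_by"
    obj: str        # identifier of the related object


@dataclass
class FoundationalStore:
    """Toy stand-in for the four layers (all names are illustrative)."""
    storage: dict = field(default_factory=dict)     # identifier -> stored content
    metadata: dict = field(default_factory=dict)    # identifier -> descriptive fields
    assertions: list = field(default_factory=list)  # links between identifiers

    def deposit(self, identifier, content, **meta):
        """Store content under a unique identifier and record its metadata."""
        if identifier in self.storage:
            raise ValueError(f"identifier already in use: {identifier}")
        self.storage[identifier] = content
        self.metadata[identifier] = dict(meta)

    def assert_link(self, subject, predicate, obj):
        """Record a relationship; both ends must already be identified."""
        for identifier in (subject, obj):
            if identifier not in self.storage:
                raise KeyError(f"unknown identifier: {identifier}")
        assertion = Assertion(subject, predicate, obj)
        self.assertions.append(assertion)
        return assertion

    def related(self, subject, predicate):
        """Return identifiers linked from `subject` by `predicate`."""
        return [a.obj for a in self.assertions
                if a.subject == subject and a.predicate == predicate]


store = FoundationalStore()
store.deposit("person:0001", None, name="A. Researcher")
store.deposit("dataset:0001", b"raw,data\n1,2\n", title="Example dataset")
store.assert_link("dataset:0001", "created_by", "person:0001")
print(store.related("dataset:0001", "created_by"))  # prints ['person:0001']
```

The point of the sketch is only that assertions refer to objects by identifier, so they are meaningless unless the identifier, metadata, and storage layers beneath them can be relied on.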

These requirements nicely spell out SIMA, a term that in geology refers to the layer of crust that sits below both the ocean and continental crusts, satisfying both the idea of a foundational layer and also one that is global. Examples of organizations that might fall under this rubric include:

  • ISNI
  • ORCID
  • arXiv
  • CHORUS
  • DSpace
  • Worldwide Protein Data Bank (PDB)

This is just a beginning list and some might object to the inclusion of one or more on the list. Certainly many will object to some that are missing and certainly none would fully qualify as compliant with the principles. Some have a disciplinary focus, some are at least perceived to be controlled by or serving specific stakeholder interests rather than the community as a whole. That “perceived to be” can be as important as real issues of control. If an infrastructure is truly foundational it needs to be trusted by the whole community. That trust is the foundation in a very real sense, the core on which the SIMA infrastructures should sit.

We originally avoided a list of names as we didn’t want to give the impression of criticising specific organisations against a set of principles we still think of as being at a draft stage. Now we name these examples because we’d like to elevate the conversation on the importance of these foundational infrastructures and the organisations that support them. Some examples may never fit the community principles. Some might with changes in governance or focus. The latter are of particular interest: what would those changes look like and how can we identify those infrastructures which are truly foundational? Institutional support for these organisations in the long run is a critical community discussion.

At the moment, numerous cross-stakeholder initiatives for new services are being developed, and in many cases they are being hampered by the lack of shared, reliable, and trusted foundations of SIMA infrastructures. Where these infrastructures do exist, initiatives take hold and thrive. Where they are patchy, we struggle. At the same time these issues are starting to be recognised more widely, for instance in the technology world where Nadia Eghbal recently left a venture capital firm to investigate important projects invisible to VCs. She identified Open Source Infrastructure as a class of foundational projects “which tech simply cannot do without” but which are generally without any paths to funding.

Identifying foundational infrastructure – an institutional construct – is neither an art nor a science. There’s no point in drawing lines in the sand for its own sake. This question is more a means to a greater end if we take a holistic point of view: how do we create a healthy, robust ecosystem of players that support and enable scholarly communications? We know all of these layers are necessary, each playing a different role and serving distinct communities. They each have different paths to being set up and sustained. It is precisely the fact that these common needs are boring that means they start to disappear from view, in some cases before they even get built. Understanding the distinctions between these two layers will help us better support both of them.

The other option is to forget where the pipes are, and to have to call in someone to find them again.

Over, and over, and over again…

Credits:

  • cowboy hat by Lloyd Humphreys from the Noun Project, used here under a CC-BY license
  • tower by retinaicon from the Noun Project, used here under a CC-BY license

The authors are writing in a personal capacity. None of the above should be taken as the view or position of any of our respective employers or other organisations.