network ready research – Science in the Open

BBN Technologies TCP/IP internet map early 1986 — Image via Wikipedia

Prior to all the nonsense with the Research Works Act, I had been having a discussion with Heather Morrison about licenses and Open Access and, peripherally, the principle of requiring specific licenses of authors. I realized then that I needed to lay out the background thinking that leads me to where I am. The path that leads me here is one built on a technical understanding of how networks function and what their capacity can be. This builds heavily on the ideas I have taken from (in no particular order) Jon Udell, Jonathan Zittrain, Michael Nielsen, Clay Shirky, Tim O’Reilly, Danah Boyd, and John Wilbanks, among many others. Nothing much here is new but it remains something that very few people really get. Ironically the debate over the Research Works Act is what helped this narrative crystallise. This should be read as a contribution to Heather’s suggested “Articulating the Commons” series.

A pragmatic perspective

I am at heart a pragmatist. I want to see outcomes, I want to see evidence to support the decisions we make about how to get outcomes. I am happy to compromise, even to take tactical steps in the wrong direction if they ultimately help us to get where we need to be. In the case of publicly funded research we need to ensure that the public investment in research is made in such a way that it maximizes those outcomes. We may not agree currently on how to prioritize those outcomes, or the timeframe they occur on. We may not even agree that we can know how best to invest. But we can agree on the principle that public money should be effectively invested.

Ultimately the wider global public is for the most part convinced that research is something worth investing in, but in turn they expect to see outcomes of that research: jobs, economic activity, excitement, prestige, better public health, improved standards of living. The wider public are remarkably sophisticated when it comes to understanding that research may take a long time to bear fruit. But they are not particularly interested in papers. And when they become aware of academia’s obsession with papers they tend to be deeply unimpressed. We ignore that at our peril.

So it is important that when we think about the way we do research, we understand the mechanisms and the processes that lead to outcomes. Even if we can’t predict exactly where outcomes will spring from (and I firmly believe that we cannot), that does not mean that we can avoid the responsibility of thoughtfully designing our systems so as to maximize the potential for innovation. The fact that we cannot, literally cannot under our current understanding of physics, follow the path of an electron through a circuit does not mean that we cannot build circuits with predictable overall behaviour. You simply design the system at a different level.

The assumptions underlying research communication have changed

So why are we having this conversation? And why now? What is it about today’s world that is so different? The answer, of course, is the internet. Our underlying communications and information infrastructure is arguably undergoing its biggest change since the development of Gutenberg’s press. Like all developments of new communication networks, SMS, fixed telephones, the telegraph, the railways, and writing itself, the internet doesn’t just change how well we can do things, it qualitatively changes what we can do. To give a seemingly trivial example, the expectations and possibilities of a society with mobile telephones are qualitatively different, and their introduction has changed the way we behave and expect others to behave. The internet is a network on a scale, and with connectivity, that we have never had before. The potential change in our capacity as individuals, communities, and societies is therefore immense.

Why do networks change things? Before a network technology spreads you can imagine people, largely separated from each other, unable to communicate in this new way. As you start to make connections nothing much really happens: a few small groups can start to communicate in this new way, but that just means that they can do a few things a bit better. But as more connections form suddenly something profound happens. There comes a point where there is a transition – where suddenly nearly everyone is connected. For the physical scientists this is in fact a phase transition and can display extreme cooperativity – a sudden break where the whole system crystallizes into a new state.
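The sudden transition described above can be seen in a toy model. The sketch below is only an illustration, not a model of any real network: it assumes links are added between randomly chosen pairs of people (a simple random-graph assumption), and the function name `largest_component_fraction` and its parameters are made up for this example. It tracks what fraction of people end up in the largest connected group as the number of links grows.

```python
import random


def largest_component_fraction(n, num_links, seed=0):
    """Fraction of n people in the largest connected group after adding
    num_links links between randomly chosen pairs (toy assumption)."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest: each person starts alone
    size = [1] * n           # size[r] is the group size when r is a root

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(num_links):
        a = find(rng.randrange(n))
        b = find(rng.randrange(n))
        if a != b:
            # Merge the smaller group into the larger one.
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]

    # Only root entries are up to date, and a root's size is always at
    # least as large as any stale entry, so the overall max is correct.
    return max(size) / n


if __name__ == "__main__":
    n = 10_000
    for links_per_person in (0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0, 2.0):
        frac = largest_component_fraction(n, int(links_per_person * n))
        print(f"{links_per_person:4.2f} links per person -> "
              f"largest group holds {frac:.1%} of people")
```

In this toy setting the largest group stays tiny while links are sparse and then grows to cover most of the population over a fairly narrow range of link density, which is the kind of sharp break the paragraph describes. Real networks are not random in this way, so the numbers themselves should not be read as meaningful.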

At this point the whole is suddenly greater than the sum of its parts. Suddenly there is the possibility of coordination, of distribution of tasks that was simply not possible before. The internet simply does this better than any other network we have ever had. It is better for a range of reasons but the key ones are: its immense scale – connecting more people, and now machines, than any previous network; its connectivity – the internet is incredibly densely connected, essentially enabling any computer to speak to any other computer globally; its lack of friction – transfer of information is very low cost, essentially zero compared to previous technologies, and is very, very easy. Anyone with a web browser can point and click and be a part of that transfer.

What does this mean for research?

So if the internet and the web bring new capacity, where is the evidence that this is making a difference? If we have fundamentally new capacity where are the examples of that being exploited? I will give two examples, both very familiar to many people now, but ones that illustrate what can be achieved.

In late January 2009 Tim Gowers, a Fields medalist and arguably one of the world’s greatest living mathematicians, posed a question: could a group of mathematicians working together be better at solving a problem than one on their own? He suggested a problem, one that he had an idea how to solve but felt was too challenging to tackle on his own. He then started to hedge his bets, stating:

“It is not the case that the aim of the project is [to solve the problem but rather it is to see whether the proposed approach was viable] I think that the chances of success even for this more modest aim are substantially less than 100%.”

A loose collection of interested parties, some world-leading mathematicians, others interested but less expert, started to work on the problem. Six weeks later Gowers announced that he believed the problem solved:

“I hereby state that I am basically sure that the problem is solved (though not in the way originally envisaged).”

In six weeks a non-planned assortment of contributors had solved a problem that a world-leading mathematician had thought both interesting and too hard, and had solved it by a route other than the one he had originally proposed. Gowers commented:

“It feels as though this is to normal research as driving is to pushing a car.”

For one of the world’s great mathematicians, there was a qualitative difference in what was possible when a group of people with the appropriate expertise were connected via a network through which they could easily and effectively transmit ideas, comments, and proposals. Three key messages emerge: the scale of the network was sufficient to bring the required resources to bear, the connectivity of the network was sufficient that work could be divided effectively and rapidly, and there was little friction in transferring ideas.

The Galaxy Zoo project arose out of a different kind of problem at a different kind of scale. One means of testing theories of the history and structure of the universe is to look at the numbers and types of different categories of galaxy in the sky. Images of the sky are collected and made freely available to the community. Researchers will then categorize galaxies by hand to build up data sets to allow them to test theories. An experienced researcher could perhaps classify a hundred galaxies in a day. A paper might require a statistical sample of around 10,000 galaxy classifications to get past peer review. One truly heroic student classified 50,000 galaxies within their PhD, declaring at the end that they would never classify another again.

However, problems were emerging. It was becoming clear that the statistical power offered by even 10,000 galaxies was not enough. One group would get different results to another. More classifications were required. Data wasn’t the problem. The Sloan Digital Sky Survey had a million galaxy images. But computer-based image categorization wasn’t up to the job. The solution? Build a network. In this case a network of human participants willing to contribute by categorizing the galaxies. Several hundred thousand people classified the millions of images several times over in a matter of months. Again the key messages: scale of the network – both the number of images and the number of participants; the connectivity of the network – the internet made it easy for people to connect and participate; a lack of friction – sending images one way, and a simple classification back the other, was easy. Making the website easy, even fun, for people to use was a critical part of the success.
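A rough back-of-envelope calculation makes the change of scale concrete. The sketch below uses the figures quoted above (a hundred galaxies per researcher per day, around 10,000 classifications for a paper, a million images in the survey). The volunteer count and the number of passes over the data are illustrative assumptions only, picked to stand in for “several hundred thousand people” and “several times over”; they are not measured figures.

```python
# Figures taken from the text above.
EXPERT_RATE_PER_DAY = 100    # galaxies one experienced researcher classifies per day
PAPER_SAMPLE = 10_000        # classifications a paper might need
SURVEY_IMAGES = 1_000_000    # galaxy images in the survey

# Illustrative assumptions, NOT figures from the text.
ASSUMED_VOLUNTEERS = 300_000  # stand-in for "several hundred thousand people"
ASSUMED_PASSES = 5            # stand-in for "several times over"


def expert_days(num_galaxies, rate_per_day=EXPERT_RATE_PER_DAY):
    """Working days a single expert needs to classify num_galaxies once."""
    return num_galaxies / rate_per_day


def classifications_per_volunteer(images, passes, volunteers):
    """Average number of classifications each volunteer has to contribute."""
    return images * passes / volunteers


if __name__ == "__main__":
    print("Expert days for one paper:", expert_days(PAPER_SAMPLE))
    print("Expert days for the whole survey:", expert_days(SURVEY_IMAGES))
    print("Classifications per volunteer (assumed numbers):",
          round(classifications_per_volunteer(
              SURVEY_IMAGES, ASSUMED_PASSES, ASSUMED_VOLUNTEERS), 1))
```

On the quoted figures a single expert needs 100 working days for one paper and 10,000 working days to get through the whole survey once. Under the assumed volunteer numbers, each participant only has to contribute a handful of classifications for the whole survey to be covered several times.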

Galaxy Zoo changed the scale of this kind of research. It provided a statistical power that was unheard of and made it possible to ask fundamentally new types of questions. It also enabled fundamentally new types of people to play an effective role in the research: school children, teachers, full-time parents. It enabled qualitatively different research to take place.

So why hasn’t the future arrived then?

These are exciting stories, but they remain just that. Sure, I can multiply examples but they are still limited. We haven’t yet taken real advantage of the possibilities. There are lots of reasons for this but the fundamental one is inertia. People within the system are pretty happy for the most part with how it works. They don’t want to rock the boat too much.

But there is a group of people who are starting to be interested in rocking the boat: the funders, the patient groups, that global public who want to see outcomes. The thought process hasn’t worked through yet, but when it does they will all be asking one question: “How are you building networks to enable research?” The question may come in many forms – “How are you maximizing your research impact?” – “What are you doing to ensure the commercialization of your research?” – “Where is your research being used?” – but they all really mean the same thing: how are you working to make sure that the outputs of your research are going into the biggest, most connected, lowest friction network that they possibly can?

As service providers, all of those who work in this industry – and I mean all, from the researchers to the administrators, to the publishers to the librarians – will need to have an answer. The surprising thing is that it’s actually very easy. The web makes building and exploiting networks easier than it has ever been because it is a network infrastructure. It has scale: billions of people – billions of computers – exabytes of information resources – exaflops of computational resources. It has connectivity on a scale that is literally unimaginable – the human mind can’t conceive of that number of connections because the web has more. It is incredibly low in friction – the cost of information transfer is in most cases so close to zero as to make no difference.

Service requirements

To exploit the potential of the network all we need to do is get as much material online as fast as we can. We need to connect it up, to make it discoverable, to make sure that people can find and understand and use it. And we need to ensure that once found those resources can be easily transferred, shared, and used. And used in any way – at network scale the system is designed to ensure that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.

The problem arises with the systems we have in place to get material online. The raw material of science is not often in a state where putting it online is immediately useful. It needs checking, formatting, testing, indexing. All of this does require real work, and real money. So we need services to do this, and we need to be prepared to pay for those services. The trouble is our current system has this backwards. We don’t pay directly for those services so those costs have to be recouped somehow. And the current set of service providers do that by producing the product that we really need and want and then crippling it.

Currently we take raw science and through a collaborative process between researchers and publishers we generate a communication product, generally a research paper, that is what most of the community holds as the standard means by which they wish to receive information. Because the publishers receive no direct recompense for their contribution they need to recover those costs by other means. They do this by artificially introducing friction and then charging to remove it.

This is a bad idea on several levels. Firstly, it means the product we get doesn’t have the maximum impact it could, because it’s not embedded in the largest possible network. From a business perspective it creates risks: publishers have to invest up front and then recoup money later, rather than being confident that expenditure and cash flow are coupled. This means, for instance, that if there is a sudden rise (or fall) in the number of submissions there is no guarantee that cash flows or costs will scale with that change. But the real problem is that it distorts the market. Because on the researcher side we don’t pay for the product of effective communication, we don’t pay much attention to what we’re getting. On the publisher side it drives a focus on surface and presentation, because that enhances the product in the current purchaser’s eyes, rather than a ruthless focus on production costs and shareability.

Network Ready Research Communication

If we care about taking advantage of the web and internet for research then we must tackle the building of scholarly communication networks. These networks will have those critical characteristics described above, scale and a lack of friction. The question is how we go about building them. In practice we actually already have a network at huge scale – the web and the internet do that job for us, connecting essentially all professional researchers and a large proportion of the interested public. There is work to be done on expanding the reach of the network but this is a global development goal, not something specific to research.

So if we already have the network then what is the problem? The issue lies in the second characteristic – friction. Our current systems are actually designed to create friction. Before the internet was in place our network was formed of a distribution system involving trucks and paper – reducing costs to reasonable levels meant charging for that distribution process. Today those distribution costs have fallen to as near zero as makes no difference, yet we retain the systems that add friction unnecessarily: slow review processes, charging for access, formats and discovery tools that are no longer fit for purpose.

What we need to do is focus on the process of taking the research that we do and converting it into a Network Ready form. That is, we need to have access to the services that take our research and make it ready to exploit our network infrastructure – or we need to do it ourselves. What does “Network Ready” mean? A piece of Network Ready Research will be modular and easily discoverable, it will present different facets that will allow people and systems to use it in a wide variety of ways, it will be compatible with the widest range of systems and above all it will be easily shareable. Not just copyable or pasteable but easily shared through multiple systems while carrying with it all the context required to make use of it, all the connections that will allow a user to dive deeper into its component parts.

Network Ready Research will be interoperable, socially, technically, and legally with the rest of the network. The network is more than just technical infrastructure. It is also built up from the social connections, a shared understanding of the parameters of re-use, and a compatible system of checks and balances. The network is the shared set of technical and social connections that together enable new connections to be made. Network Ready Research will move freely across that, building new connections as it goes, able to act as both connecting edge and connected node in different contexts.

Building and strengthening the network

If you believe the above, as I do, then you see a potential for us to qualitatively change our capacity as a society to innovate, understand our world, and help to make it a better place. That potential will be best realized by building the largest possible, most effective, and lowest friction network possible. A networked commons in which ideas and data, concepts and expertise can be most easily shared, and can most easily find the place where they can do the most good.

Therefore the highest priority is building this network, making its parts and components interoperable, and making it as easy as possible to connect up networks that already exist. For an agency that funds research and seeks to ensure that research makes a difference the only course of action is to place the outputs of that research where they are most accessible on the network. In blunt terms that means three things: free at the point of access, technically interoperable with as many systems as possible, and free to use for any purpose. The key point is that at network scale the most important uses are statistically likely to be unexpected uses. We know we can’t predict the uses, or even success, of much research. That means we must position it so it can be used in unexpected ways.

Ultimately, the bigger the commons, the bigger the network, the better. And the more interoperable it is, and the wider the range of uses, the better. That ultimately is why I argue for liberal licences, for the exclusion of non-commercial terms. It is why I use ccZero on this blog and for software that I write where I can. For me, the risk of commercial enclosure is so much smaller than the risk of not building the right networks, or of creating fragmented incompatible networks, of ultimately not being able to solve the crises we face today in time to do any good, that the course of action is clear. At the same time we need to build up the social interoperability of the network, to call out bad behavior and perhaps in some cases to isolate its perpetrators, but we need to find ways of doing this that don’t damage the connectivity and freedom of movement on the network. Legal tools are useful to assure users of interoperability and their rights; otherwise they just become a source of friction. Social tools are a more viable route for encouraging desirable behaviour.

The priority has to be achieving scale and lowering friction. If we can do this then we have the potential to create a qualitative jump in our research capacity on a scale not seen since the 18th century and perhaps never. And it certainly feels like we need it.
