Open is a state of mind

William Henry Fox Talbot’s ‘The Open Door’ (Photo credit: Wikipedia)

“Open source” is not a verb

Nathan Yergler via John Wilbanks

I often return to the question of what “Open” means and why it matters. Indeed the very first blog post I wrote focussed on questions of definition. Sometimes I return to it because people disagree with my perspective. Sometimes because someone approaches similar questions in a new or interesting way. But mostly I return to it because of the constant struggle to get across the mindset that it encompasses.

Most recently I addressed the question of what “Open” is about in an online talk I gave for the Futurium Program of the European Commission (video is available). In this I tried to get beyond the definitions of Open Source, Open Data, Open Knowledge, and Open Access to the motivation behind them, something which is both non-obvious and conceptually difficult. All of these various definitions focus on mechanisms – on the means by which you make things open – but not on the motivations behind that. As a result they can often seem arbitrary and rules-focussed, and they do become subject to the kind of religious wars that result from disagreements over the application of rules.

In the talk I tried to move beyond that, to describe the motivation and the mindset behind taking an open approach, and to explain why this is so tightly coupled to the rise of the internet in general and the web in particular. Being open, as opposed to making open resources (or making resources open), is about embracing a particular form of humility. For the creator it is about embracing the idea that – despite knowing more about what you have done than any other person – the use and application of your work is something that you cannot predict. Similarly, for someone working on a project, being open means understanding that – despite the fact that you know more about the project than anyone else – crucial contributions and insights could come from unknown sources. At one level this is just a numbers game: given enough people, it is likely that someone, somewhere, can use your work, or contribute to it in unexpected ways. As a numbers game it is rather depressing on two fronts. First, it feels as though someone out there must be cleverer than you. Second, it doesn’t help, because you’ll never find them.

Most of our social behaviour and thinking feels as though it is built around small communities. People prefer to be a (relatively) big fish in a small pond; scholars even take pride in knowing the “six people who care about and understand my work”; the “not invented here” syndrome arises from the assumption that no-one outside the immediate group could possibly understand the intricacies of the local context well enough to contribute. It seems better to build up tools that work locally than to put effort into building a shared community toolset. Above all, the effort involved in listening for, and working to understand, outside contributions is assumed to be wasted. There is no point “listening to the public” because they will “just waste my precious time”. We work on the assumption that, even if we accept the idea that there are people out there who could use our work or could help, we can never reach them. That there is no value in expending effort to even try. And we do this for a very good reason: because for the majority of people, for the majority of history, it was true.

For most people, for most of history, it was only possible to reach and communicate with small numbers of people. That meant in turn that, for most kinds of work, those networks were simply not big enough to connect the creator with the unexpected user, or the unexpected helper with the project. The rise of the printing press, and then of telegraph, radio, and television, changed the odds, but only the very small number of people who had access to these broadcast technologies could ever reach larger numbers. And even they didn’t really have the tools that would let them listen back. What is different today is the scale of the communication network that binds us together. By connecting millions, and then billions, of people, that network has raised the probability that people who can help each other will be connected to the point that, for many types of problem, they actually are.

That gap between “can” and “are” – the gap between the idea that there is a connection with someone, somewhere, that could be valuable, and actually making the connection – is the practical question that underlies the idea of “open”. How do we make resources discoverable and re-usable so that they can find those unexpected applications? How do we design projects so that outside experts can both discover them and contribute? Many of these movements have focussed on the mechanisms of maximising access, the legal and technical means to maximise re-usability. These are important; they are a necessary but not sufficient condition for making those connections. Making resources open enables re-use, enhances discoverability, and, by making things more discoverable and more usable, has the potential to enhance both further. But beyond merely making resources open we also need to be open.

Being open goes in two directions. First we need to be open to unexpected uses. The Open Source community arrived at this principle first, by rejecting the idea that it is appropriate to limit who can use a resource. The principle here is that by being open to any use you maximise the potential for use. Placing limitations always has the potential to block unexpected uses. But the broader open source community has also gone further, by exploring and developing mechanisms that support the ability of anyone to contribute to projects. This is why Yergler says “open source” is not a verb. You can license code, you can make it “open”, but that does not create an Open Source Project. You may have a project to create open source code, an “Open-source project“, but that is not necessarily a project that is open, an “Open source-project“. Open Source is not about licensing alone, but about public repositories, version control, documentation, and the creation of viable communities. You don’t just throw the code over the fence and expect a project to magically form around it; you invest in and support community creation with the aim of creating a sustainable project. Successful open source projects put community building and outreach – both reaching potential contributors and encouraging them – at their centre. The licensing is just an enabler.

In the world of Open Scholarship, and I would include both Open Access and Open Educational Resources in this, we are a long way behind. There are technical and historical reasons for this, but I want to suggest that a big part of the issue is one of community. It is in large part about a certain level of arrogance: an assumption that others, outside our small circle of professional peers, cannot possibly either use our work or contribute to it. There is a comfort in this arrogance, because it means we are special, that we uniquely deserve the largesse of the public purse to support our work because others cannot contribute. It means we do not need to worry about access, because the small group of people who understand our work “already have access”. Perhaps more importantly, it encourages us to weigh fears about what might go wrong with sharing over a balanced assessment of the risks of sharing versus the risks of not sharing: the risks of not finding contributors, of wasting time, of repeating what others already know will fail, or of simply never reaching the audience who can use our work.

It also leads to religious debates about licences, as though a licence were the point or copyright really a core issue. Licences are just tools, a way of enabling people to use and re-use content. But the licence isn’t what matters; what matters is embracing the idea that someone, somewhere, can use your work, that someone, somewhere, can contribute back, and adopting the practices and tools that make it as easy as possible for that to happen. And that, if we do this collectively, the common resource will benefit us all. This isn’t just true of code, or data, or literature, or science. But the potential for creating critical mass, for achieving these benefits, is vastly greater with digital objects on a global network.

All the core definitions of “open”, from the Open Source Definition, to the Budapest (and Berlin and Bethesda) Declarations on Open Access, to the Open Knowledge Definition, have a common element at their heart – that an open resource is one that any person can use for any purpose. This might be good in itself, but that’s not the real point; the point is that it embraces the humility of not knowing. It says: I will not restrict uses, because that damages the potential of my work to reach others who might use it. And in doing this I provide the opportunity for unexpected contributions. With Open Access we’ve only really started to address the first part, but if we embrace the mindset of being open then both follow naturally.


On the 10th Anniversary of the Budapest Declaration

Budapest: Image from Wikipedia, by Christian Mehlführer

Ten years ago today, the Budapest Declaration was published. The declaration was the output of a meeting held some months earlier, largely through the efforts of Melissa Hagemann, that brought together key players from the then-nascent Open Access movement. BioMedCentral had been publishing for a year or so, PLoS existed as an open letter, and Creative Commons was still focussed on building a commons and hadn’t yet released its first licences. The dotcom bubble had burst, deflating many of the exuberant expectations of the first generation of web technologies, and it was to be another year before Tim O’Reilly popularised the term “Web 2.0”, arguably marking the real emergence of the social web.

In that context the text of the declaration is strikingly prescient. It focusses largely on the public good of access to research, a strong strand of the OA argument that remains highly relevant today.

“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds. Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.”

But at the same time – and remember, again, that this was at the very beginning of the user-generated web – the argument is laid out to support a networked research and discovery environment.

“…many different initiatives have shown that open access […] gives readers extraordinary power to find and make use of relevant literature, and that it gives authors and their works vast and measurable new visibility, readership, and impact.”

But for me, the core of the declaration lies in its definition. At one level it seems remarkable to have felt a need to define Open Access, and yet this is something we still struggle with today. The definition in the Budapest Declaration is clear, direct, and precise:

“By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

Core to this definition are three things: access to the text, understood as necessary to achieve the other aims; a limitation on restrictions; and a limitation on the use of copyright to supporting only the integrity and attribution of the work – which I interpret in retrospect to mean that the only acceptable licences are those requiring attribution alone. But the core forward-looking element lies in the middle of the definition, focussing as it does on specific uses – crawling, passing to software as data – that would have seemed outlandish, if not incomprehensible, to most researchers at the time.

In limiting the scope of acceptable restrictions and in focussing on the power of automated systems, the authors of the Budapest Declaration recognised precisely the characteristics that we have more recently come to understand as requirements for effective networked information. Ten years ago – before Facebook existed, let alone before anyone was talking about frictionless sharing – they identified the core characteristics that would enable research outputs to be accessed and read, but above all integrated, mined, aggregated, and used in ways that their creators did not, and could not, expect. The core characteristics of networked information that enable research outputs to become research outcomes. The characteristics that will maximise the impact of that research.
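That kind of machine use is now routine. As a minimal sketch of what “passing them as data to software” looks like in practice – assuming, purely for illustration, the public Crossref REST API as one of many openly accessible metadata endpoints (the declaration itself names no particular service) – a few lines of Python suffice:

```python
# A sketch of "passing the literature to software as data": query an open
# metadata endpoint and work with the results programmatically. The Crossref
# REST API is used here only as an illustrative, publicly accessible service.
import json
import urllib.parse
import urllib.request


def search_works(query, rows=5):
    """Return DOI and title for works matching `query` via the Crossref API."""
    url = ("https://api.crossref.org/works?"
           + urllib.parse.urlencode({"query": query, "rows": rows}))
    with urllib.request.urlopen(url) as response:
        items = json.load(response)["message"]["items"]
    return [{"doi": item.get("DOI"),
             "title": (item.get("title") or [""])[0]} for item in items]


if __name__ == "__main__":
    for work in search_works("open access"):
        print(work["doi"], "-", work["title"])
```

Nothing here is specific to one service; the point is that a literature which is open in the Budapest sense can be crawled, indexed, and recombined by software without asking anyone’s permission first.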

I am writing this in a hotel room in Budapest. I am honoured to have been invited to attend a meeting to mark the 10th anniversary of the declaration and excited to be discussing what we have learnt over the past ten years and how we can navigate the next ten. The declaration itself remains as clear and relevant today as it was ten years ago. Its core message is one of enabling the use and re-use of research to make a difference. Its prescience in identifying exactly those issues that best support that aim in a networked world is remarkable.

In looking both backwards, over the achievements of the past ten years, and forwards, towards the challenges and opportunities that await us when true Open Access is achieved, the Budapest Declaration is, for me, the core set of principles that can guide us along the path to realising the potential of the web for supporting research and its wider place in society.


We need to get out more…

The speaker had started the afternoon with a quote from Ian Rogers, ‘Losers wish for scarcity. Winners leverage scale.’ He went on, eloquently if somewhat bluntly, to make the case for exposing data and to discuss the importance of making it available in a usable and re-usable form. In particular he discussed the sophisticated re-analysis and mashing that properly exposed data enables, while excoriating a number of people in the audience for forcing him to screen scrape data from their sites.

All in all, as you might expect, this was music to my ears. This was the case for open science made clearly and succinctly, and with passion, over the course of several days. The speaker? Mike Ellis from EduServ; I suspect both a person and an organisation of which most of the readers of this blog have never heard. Why? Because he comes from a background in museums, and the data he wanted was news streams, addresses, and latitude/longitude coordinates for UK higher education institutions, or library catalogues, not NMR spectra or gene sequences. Yet the case to be made is the same. I wrote last week about the need to make better connections between the open science blogosphere and the wider interested science policy and funding community. But we also need to make more effective connections with those for whom the open data agenda is part of their daily lives.
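To make the contrast concrete, here is a hypothetical sketch of the difference between data that is properly exposed and data that has to be screen scraped; the endpoint and field names are invented purely for illustration:

```python
# Hypothetical example: institutional names and locations exposed as a
# structured JSON feed can be consumed in a few lines. The URL and field
# names below are invented for illustration only.
import json
import urllib.request

INSTITUTIONS_URL = "https://example.ac.uk/api/institutions.json"  # hypothetical endpoint


def institution_locations(url=INSTITUTIONS_URL):
    """Return (name, latitude, longitude) tuples from a structured feed."""
    with urllib.request.urlopen(url) as response:
        records = json.load(response)
    return [(r["name"], r["lat"], r["lon"]) for r in records]

# The screen-scraping alternative means pattern-matching markup that was never
# designed to be parsed, and it breaks whenever the page layout changes.
```

The structured feed is a stable contract that others can build on; the scraped version works only until the next site redesign.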

I spent several enjoyable days last week at the UKOLN Institutional Web Managers’ Workshop in Aberdeen. UKOLN is a centre of excellence for web-based activities in UK higher education, and IWMW is its annual meeting. It is attended primarily by the people who manage web systems within UK HE, including IT services, web services, and library services, as well as the funders and support organisations associated with these activities.

There were a number of other talks that would be of interest to this community, and many of the presentations are available as video at the conference website. These included James Curral on Web Archiving, Stephanie Taylor on Institutional Repositories, and David Hyett of the British Antarctic Survey providing the sceptic’s view of implementing Web 2.0 services for communicating with the public. His central point, which was well made, was that there is no point adding a whole bunch of whizz-bang features to an institutional website if you haven’t got the fundamentals right: quality content, straightforward navigation, and relevance to the user. Where I disagreed with his position was that I felt he extrapolated from the fact that most user-generated content is poor to the presumption that ‘user-generated content on my site will be poor’. This to me misses the key point: that it is by focussing on community building that you generate high-quality content that is of relevance to that community. Nonetheless, his central point – don’t build in features that your users don’t want or need – is well made.

David made the statement ‘90% of blogs are boring’ during his talk. I took some exception to this (I am sure the situation is far, far worse than that). In a question I made the point that it was generally accepted that Google had made the web usable by making things findable amongst the rubbish, but that for social content we needed to adopt a different kind of ‘social search’ strategy, with different tools. That with the right strategies and the right tools every person could find their preferred 10% (or 1%, or 0.00001%) of the world’s material. That in fact this social search approach led to the formation of new communities and new networks.

After the meeting, however, it struck me that I had failed to execute my own advice. Mike Ellis blogs a bit, twitters a lot, and is well known within the institutional web management community. He lives not far away from me. He is a passionate advocate of data availability and has the technical smarts to do clever stuff with the data that is available. Why hadn’t I already made this connection? If I go around making the case that web-based tools will transform our ability to communicate, where is the evidence that this happens in practice? Our contention is that online publishing frees up communication and allows the free flow of information and ideas. The sceptic’s contention is that it just allows us to be happy in our own little echo chamber. Elements of both are true, but I think it is fair to say that we are not effectively harnessing the potential of the medium to drive forward our agenda. By broadening the community, and linking up with like-minded people in museums, institutional web services, archives, and libraries, we can undoubtedly do better.

So there are two approaches to solving this problem, the social approach and the technical approach. Both are intertwined, but they can be separated to a certain extent. The social approach is to link existing communities and allow the interlinks between them to grow. This blog post is one attempt – some of you may go on to look at Mike’s blog. Another is for people to act as supernodes within the community network. Michael Nielsen’s joining of the (mostly) life-science-oriented community on FriendFeed, and more widely in the blogosphere, connected that community with a theoretical physics community and another ‘Open Science’ community that was largely separate from the existing online community. A small number of connections made a big difference to overall network size. I was very happy to accept the invitation to speak at the IWMW meeting precisely because I hoped to make these kinds of connections. Hopefully a few people from the meeting may read this blog post (if so, please do leave a comment – let’s build on this!). We make contacts, we expand the network – but this relies very heavily on supernodes within the network and their ability to cope with the volume.

So is there a technical solution to the problem? Well, in this specific case there is a technical dimension to the problem. Mike doesn’t use FriendFeed but is a regular Twitter user. My most likely connection to Mike is Brian Kelly, based at UKOLN, who does have a FriendFeed account but, I suspect, doesn’t monitor it. The connection fails because the social networks don’t effectively interconnect. It turns out the web management community aren’t convinced by FriendFeed and prefer Twitter. So a technical solution would somehow have to bridge this gap. Right at the moment that bridge is most likely to be a person, not a machine, which leaves us back where we started, and I don’t see that changing anytime soon. The problem is an architectural one, not an application or service one. I can aggregate Twitter, FriendFeed, or anything else in one place, but unless everyone else does the same thing it’s not really going to help.

I don’t really have a solution, except once again to make the case for the value of those people who build stronger connections between poorly interconnected networks. It is not just the information that is valuable; the timely delivery of that information is valuable too. These people add value. What is more, if we are going to fully exploit the potential of the web in the near term, not to mention demonstrate the value of exploiting it to others, we need to value these people and support their activities. How we do that is an open question. It will clearly cost money. The question is where to get it from, and how to get it to where it needs to be.