25,000 signatures and still rolling: Implications of the White House petition

The petition at 25,000 signatures

I’m afraid I went to bed. It was getting on for midnight, and it looked like another four hours or so before the petition would reach the magic mark of 25,000 signatures. As it turns out, a final rush put us across the line at around 2am my time, but never mind: I woke up wondering whether we had got there, headed for the computer, and found a pleasant surprise waiting for me.

What does this mean? What have John Wilbanks, Heather Joseph, Mike Carroll, and Mike Rossner achieved by deciding to push through what was a real hard slog? And what about all those people and groups involved in getting signatures in? I think there are maybe three major points here.

Access to Research is now firmly on the White House (and other governments’) agenda

The petition started as a result of a meeting between the Access2Research founders and John Holdren from the White House. John Wilbanks has written about how the meeting went and what the response was. The US administration has sympathy and understands many of the issues. However, it must have been hard to make the case that this was something worth the bandwidth it would take to drive a policy initiative, especially in an election year. The petition, and the mechanism of the “We the People” site, have enabled us to show that this is a policy item that generates public interest, but more importantly it creates an opportunity for the White House to respond. It is worth noting that this has been one of the more successful petitions. Reaching the 25k mark in two weeks is a real achievement, and one that has got the attention of key people.

And that attention spreads globally as well. The Finch Report on mechanisms for improving access to UK research outputs will probably not mention the petition, but you can bet that those within the UK government involved in implementation will have taken note. Similarly, as the details of the Horizon 2020 programme within the EU are hammered out, those deciding on the legal instruments that will distribute around $80B will have noted that there is public demand, and therefore political cover, to take action.

The Open Access Movement has a strong voice, and a diverse network, and can be an effective lobby

It is easy, as we all work towards the shared goal of enabling wider access and the full exploitation of web technologies, to get bogged down in details and to focus on disagreements. What this effort showed was that when we work together we can muster the connections and the network to send a very strong message. And that message is stronger for coming from diverse directions in a completely transparent manner. We have learnt the lessons that could be taken from the fight against SOPA and PIPA and refined them in the campaign to defeat, in fact to utterly destroy, the Research Works Act. But this was not a reaction, and it was not merely a negative campaign. This was a positive campaign, originating within the movement, which together we have successfully pulled off. There are lessons to be learnt. Things we could have done better. But what we now know is that we have the capacity to take on large scale public actions and pull them off.

The wider community wants access and has a demonstrated capacity to use it

There has in the past been an argument that public access is not useful because “they can’t possibly understand it”, that “there is no demand for public access”. That argument has been comprehensively and permanently destroyed. It was always an arrogant argument, and in my view a dangerous one for those with a vested interest in ensuring continued public funding of research. The fact that it had strong parallels with the arguments deployed in the 18th and 19th centuries that colonists, or those who did not own land, or women, could not possibly be competent to vote should have been enough to warn people off using it. The petition has shown demand, and the stories that have surfaced through this campaign show not only that there are many people who are not professional researchers who can use research, but that many of these people also want to contribute back to the professional research effort, and are more than capable of doing so.

The campaign has put the ideas of Open Access in front of more people than perhaps ever before. We have reached out to family, friends, co-workers, patients, technologists, entrepreneurs, medical practitioners, educators, and people just interested in the world around them. Perhaps one in ten of them actually signed the petition, but many of them will have talked to others, spreading the ideas. This is perhaps one of the most important achievements of the petition. Getting the message and the idea out in front of hundreds of thousands of people who may not take action today, but will now be primed to see the problems that arise from a lack of access, and the opportunities that could be created through access.

Where now?

So what are our next steps? Continuing to gain signatures over the next two weeks is still important. This may be one of the most rapidly growing petitions, but showing continued growth is still valuable. More generally, though, my sense is that we need to take stock and look forward to the next phase. The really hard work of implementation is coming. As a movement we still disagree strongly on elements of tactics and strategy. The tactics concern me less: we can take multiple paths, applying pressure at multiple points, and this will be to our advantage. But I think we need a clearer goal on strategy. We need to articulate what the endgame is. What is the vision? When will we know that we have achieved what we set out to do?

Peter Murray-Rust has already quoted Churchill, but it does seem apposite. “…this is not the end. This is not even the beginning of the end. But it is, perhaps, the end of the beginning.”

We now know how much we can achieve when we work together with a shared goal. The challenge now is to harness that to a shared understanding of the direction of travel, if perhaps not the precise route. But if we, with all the diversity of needs and views that this movement contains, can find the core of goals that we all agree on, then what we now know is that we have the capacity, the depth, and the strength to achieve them.

 


Parsing the Willetts Speech on Access to UK Research Outputs

David Willetts speaking at the Big Society policy launch, Coin St, London. (Photo credit: Wikipedia)

Yesterday David Willetts, the UK Science and Universities Minister, gave a speech to the Publishers Association that has received wide coverage. However, it is worth pulling apart both the speech and the accompanying opinion piece in the Guardian, because there are some interesting elements in there, and also some things have got a little confused.

The first really key point is that there is nothing new here. This is basically a re-announcement of the previous position from the December Innovation Strategy on moving towards a freely accessible literature, and a more public announcement of the Gateway to Research project previously mentioned in the RCUK response to the Innovation Statement.

The Gateway to Research project is a joint venture of the Department for Business, Innovation and Skills and Research Councils UK to provide a one-stop shop for information on UK research funding as well as pointers to outputs. It will essentially draw information directly from sources that already exist (the Research Outputs System and eVal), as well as some new ones, with the intention of helping the UK public and enterprise find research and researchers that are of interest to them, and see how they are funded.

The new announcement was that Jimmy Wales, of Wikipedia fame, will be advising on the GTR portal. This is a good thing: he is well placed to provide both technical and social expertise on the provision of public-facing information portals, as well as a more radical perspective than might come out of BIS itself. While this might in part be cynically viewed as another example of bringing in celebrities to advise on policy, this is a celebrity with relevant expertise and real credibility based on making similar systems work.

The rest of the information that we can gather relates to government efforts in moving towards making the UK research literature accessible. Wales also gets a look in here, and will be “advising us on [..] common standards to ensure information is presented in a readily reusable form”. My reading of this is that the Minister understands the importance of interoperability and my hope is that this will mean that government is getting good advice on appropriate licensing approaches to support this.

However, many have read this section of the speech as saying that GTR will act as some form of national repository for research articles. I do not believe this is the intention, and reading between the lines the comment that it will “provide direct links to actual research outputs such as data sets and publications” [my emphasis] is the key. The point of GTR is to make UK research more easily discoverable. Access is a somewhat orthogonal issue. This is better read as an expression of Willetts’ and the wider government’s agenda on transparency of public spending than as a mechanism for providing access.

What else can we tell from the speech? Well, the term “open access” is used several times, something that was absent from the Innovation Statement, but the emphasis is still on achieving “public access” in the near term, with “open access” cast, as I read it, as the future goal. It’s not clear to me whether this is a well-informed distinction. There is a somewhat muddled commentary on Green vs Gold OA, but not that much more muddled than what often comes from our own community. There are also some clear statements on the challenges for all involved.

As an aside I found it interesting that Willetts gave a parenthetical endorsement of usage metrics for the research literature when speaking of his own experience.

As well as reading some of the articles set by my tutors, I also remember browsing through the pages of the leading journals to see which articles were well-thumbed. It helped me to spot the key ones I ought to be familiar with – a primitive version of crowd-sourcing. The web should make that kind of search behaviour far easier.

This is the most sophisticated appreciation of the potential for the combination of measurement and usage data in discovery that I have seen from any politician. It needs to be set against his endorsement of rather cruder filters earlier in the speech but it nonetheless gives me a sense that there is a level of understanding within government that is greater than we often fear.

Much of the rest of the speech is hedging. Options are discussed but not selected and certainly not promoted. The key message: wait for the Finch Report which will be the major guide for the route the government will take and the mechanisms that will be put in place to support it.

But there are some clearer statements. There is a strong sense that Hargreaves’ recommendations on enabling text mining should be implemented, and the logic for this is well laid out. The speech and the policy agenda are embedded in a framework of enabling innovation – making it clear what kinds of evidence and argument we will need to marshal in order to persuade. There is also a strong emphasis on data, as well as an appreciation that there is much to do in this space.

But the clearest statement made here is on the end goals. No one can be left in any doubt of Willetts’ ultimate target: full access to the outputs of research, ideally at the time of publication, in a way that enables them to be fully exploited, manipulated and modified for any purpose by any party. Indeed the vision is strongly congruent with the Berlin, Bethesda, and Budapest declarations on Open Access. There is still much to be argued about the route and its length, but in the UK at least, the destination appears to be in little doubt.


Response to the OSTP Request for Information on Public Access to Research Data

Response to Request for Information – FR Doc. 2011-28621

Dr Cameron Neylon – U.K. based research scientist writing in a personal capacity

Introduction

Thank you for the opportunity to respond to this request for information and to the parallel RFI on access to scientific publications. Many of the higher-level policy issues relating to data are covered in my response to the other RFI, and I refer to that response where appropriate here. Specifically, I reiterate my point that a focus on IP in the publication is a non-productive approach. Rather, it is more productive to identify the outcomes that are desired as a result of the federal investment in generating data, and from those outcomes to identify the services that are required to convert the raw material of the research process into accessible outputs that can be used to support those outcomes.

Response

(1) What specific Federal policies would encourage public access to and the preservation of broadly valuable digital data resulting from federally funded scientific research, to grow the U.S. economy and improve the productivity of the American scientific enterprise?

Where the Federal government has funded the generation of digital data, either through generic research funding or through focussed programs that directly target data generation, the purpose of this investment is to generate outcomes. Some data has clearly defined applications, and much data is obtained to further very specific research goals. However, while it is possible to identify likely applications, it is not possible, and indeed foolhardy, to attempt to define and limit the full range of uses which data may find.

Thus to ensure that data created through federal investment is optimally exploited, it is crucial that data be a) accessible, b) discoverable, c) interpretable, and d) legally re-usable by any person for any purpose. To achieve this requires investment in infrastructure, markup, and curation. This investment is not currently seen as either a core activity for researchers themselves, or a desirable service for them to purchase. It is therefore rare for such services or resource needs to be thoughtfully costed in grant applications.

The policy challenge is therefore to create incentives, both symbolic and contractual, but also directly meaningful to researchers with an impact on their career and progression, that encourage researchers to either undertake these necessary activities directly themselves or to purchase and appropriately cost third party services to have them carried out.

Policy intervention in this area will be complex and will need to be thoughtful. Three simple policy moves however are highly tractable and productive, without requiring significant process adjustments in the short term:

a) Require researchers to provide a data management or data accessibility plan within grant requests. The focus of these plans should be showing how the project will enable third party groups to discover and re-use data outputs from the project.

b) As part of the project reporting, require measures of how data outputs have been used. These might include download counts, citations, comments, or new collaborations generated through the data. In the short term this assessment need not be directly used, but it sends a message that agencies consider this important.

c) Explicitly measure performance on data re-use. Require it to be reported as part of biosketches, and provide data on previous performance to grant panels. In the longer term it may be appropriate to provide guidance to panels on the assessment of previous performance on data re-use, but in the first instance simply providing the information will affect behaviour and the general awareness of issues of data accessibility, discoverability, and usability.

(2) What specific steps can be taken to protect the intellectual property interests of publishers, scientists, Federal agencies, and other stakeholders, with respect to any existing or proposed policies for encouraging public access to and preservation of digital data resulting from federally funded scientific research?

As noted in my response to the other RFI, the focus on intellectual property is not helpful. Private contributors of data, such as commercial collaborators, should be free to exploit their own contribution of IP to projects as they see fit. Federally funded research should seek to maximise the exploitation and re-use of data generated through public investment.

It has been consistently and repeatedly demonstrated in a wide range of domains that the most effective way of exploiting the outputs of research innovation, be they physical samples, or digital data, to support further research, to drive innovation, or to support economic activity globally is to make those outputs freely available with no restrictive terms. That is, the most effective way to use research data to drive economic activity and innovation at a national level is to give the data away.

The current IP environment means that in specific cases, such as where there is very strong evidence of a patentable result with demonstrated potential, the optimisation of outcomes does require protection of the IP. There are also situations where privacy and other legal considerations mean that data cannot be released, or cannot be fully released. These should however be seen as the exception rather than the rule.

(3) How could Federal agencies take into account inherent differences between scientific disciplines and different types of digital data when developing policies on the management of data?

At the Federal level only very high-level policy decisions should be taken. These should provide direction and strategy, but enable tactics and the details of implementation to be handled at agency or community levels. What both the Federal agencies and coordination bodies such as OSTP can provide is oversight and, where appropriate, funding support to maintain, develop, and expand interoperability between developing standards in different communities, supporting the activities that enhance that interoperability.

Local custom, dialects, and community practice will always differ, and it is generally unproductive to enforce standardisation on implementation details. The policy objective should be to set the expectations and the frameworks within which local implementations can be developed, and to develop criteria against which those local implementations can be assessed.

(4) How could agency policies consider differences in the relative costs and benefits of long-term stewardship and dissemination of different types of data resulting from federally funded research?

Prior to assessing differences in performance and return on investment it will be necessary to provide data gathering frameworks and to develop significant expertise in the detailed assessment of the data gathered. A general principle that should be considered is that the administrative and performance data related to accessibility and re-use of research data should provide an outstanding exemplar of best practice in terms of accessibility, curation, discoverability, and re-usability.

The first step in cost benefit analysis must be to develop an information and data base that supports that analysis. This will mean tracking and aggregating forms of data use that are available today (download counts, citations) as well as developing mechanisms for tracking the use and impact of data in ways that are either challenging or impossible today (data use in policy development, impact of data in clinical practice guidelines).
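The forms of data use described as trackable today can be aggregated very simply. The sketch below, in Python, is purely illustrative: the event records and field names are hypothetical stand-ins for repository download logs and a citation index, not any real schema.

```python
from collections import defaultdict

# Hypothetical usage events, as might be harvested from repository
# logs and a citation index; all identifiers here are made up.
events = [
    {"dataset": "doi:10.5555/demo.1", "kind": "download"},
    {"dataset": "doi:10.5555/demo.1", "kind": "download"},
    {"dataset": "doi:10.5555/demo.1", "kind": "citation"},
    {"dataset": "doi:10.5555/demo.2", "kind": "download"},
]

def aggregate_usage(events):
    """Roll up per-dataset counts of each kind of use."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        totals[event["dataset"]][event["kind"]] += 1
    return {dataset: dict(kinds) for dataset, kinds in totals.items()}

print(aggregate_usage(events))
```

The hard part, as the paragraph above notes, is not this aggregation step but establishing the gathering frameworks that feed it, and extending them to uses (policy documents, clinical guidelines) that leave no such convenient event trail.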

Only once this assessment data framework is in place can a detailed process of cost-benefit analysis be seriously considered. Differences will exist in the measurable and imponderable returns on investment in data availability, and also in the timeframes over which these returns are realised. We have only a very limited understanding of these issues today.

(5) How can stakeholders (e.g., research communities, universities, research institutions, libraries, scientific publishers) best contribute to the implementation of data management plans?

If stakeholders have serious incentives to optimise the use and re-use of data then all players will seek to gain competitive advantage through making the highest quality contributions. An appropriate incentives framework obviates the need to attempt to design in or pre-suppose how different stakeholders can, will, or should best contribute going forward.

(6) How could funding mechanisms be improved to better address the real costs of preserving and making digital data accessible?

As with all research outputs, there should be a clear obligation on researchers to plan, on a best-efforts basis, to publish these (as in, make public) in a form that most effectively supports access and re-use, tensioned against the resources available. Funding agencies should make clear that they expect communication of research outputs to be a core activity for their funded research, and that researchers and their institutions will be judged on their performance in optimising the choices they make in selecting the appropriate modes of communication.

Further, funding agencies should explicitly set guidance levels on the proportion of a research grant that is expected, under normal circumstances, to be used to support the communication of outputs. Based on calculations from the Wellcome Trust, where projected expenditure on the publication of traditional research papers was around 1-1.5% of total grant costs, it would be reasonable to project total communication costs of 2-4% of total costs once data and other research communications are considered. This guidance and the details of best practice should clearly be adjusted as data is collected on both costs and performance.
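The arithmetic of that guidance is simple enough to sketch. The grant total below is purely illustrative; the 2-4% band is the projection discussed in the text.

```python
def communication_budget(total_grant, low=0.02, high=0.04):
    """Projected range for communication costs as a share of a grant.

    Defaults use the 2-4% guidance level projected above once data
    and other research communications are included.
    """
    return total_grant * low, total_grant * high

# An entirely hypothetical $500,000 grant.
low, high = communication_budget(500_000)
print(f"Projected communication costs: ${low:,.0f} - ${high:,.0f}")
```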

(7) What approaches could agencies take to measure, verify, and improve compliance with Federal data stewardship and access policies for scientific research? How can the burden of compliance and verification be minimized?

Ideally, compliance and performance will be trackable through automated systems that are triggered as a side effect of activities required for enabling data access. Thus references for new data should be registered with appropriate services to enable discovery by third parties; these services can also be used to track these outputs automatically. Frameworks and infrastructure for sharing should have tracking mechanisms built in. Much of the aggregation of data at scale can build on the existing work in the STAR METRICS program and draw inspiration from that experience.
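The "tracking as a side effect" idea can be made concrete with a toy sketch. Everything here is an assumption for illustration: the class, method names, and identifiers do not correspond to any real registry service or API.

```python
class DataRegistry:
    """Toy registry: registering a dataset for third-party discovery
    also creates the record used for compliance reporting, so no
    separate reporting step is imposed on the researcher."""

    def __init__(self):
        self.records = {}

    def register(self, dataset_id, grant_id):
        # Registration for discovery doubles as the compliance record.
        self.records[dataset_id] = {"grant": grant_id, "accesses": 0}

    def record_access(self, dataset_id):
        # Each third-party access updates the same record automatically.
        self.records[dataset_id]["accesses"] += 1

    def compliance_report(self, grant_id):
        # Compliance data falls out of the registry with no extra burden.
        return {dataset: record["accesses"]
                for dataset, record in self.records.items()
                if record["grant"] == grant_id}

registry = DataRegistry()
registry.register("doi:10.5555/demo.1", "grant-42")
registry.record_access("doi:10.5555/demo.1")
print(registry.compliance_report("grant-42"))
```

The design point is the one made above: because the compliance record is created and updated as a by-product of registration and access, the burden of verification on the researcher approaches zero while the agency gathers more data than a manual reporting regime would.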

Overall it should be possible to reduce the burden of compliance from its current level while gathering vastly more data and information of much higher quality than is currently collected.

(8) What additional steps could agencies take to stimulate innovative use of publicly accessible research data in new and existing markets and industries to create jobs and grow the economy?

There are a variety of proven methods for stimulating innovative use of data at both large and small scale. The first is to make it available. If data is made available at scale then it is highly likely that some of it will be used somewhere. The more direct encouragement of specific uses can be achieved through directed “hack events” that bring together data handling and data production expertise from specific domains. There is significant US expertise in successfully managing these events and generating exciting outcomes. These in turn lead to new startups and new innovation.

There is also a significant growth in the number of data-focussed entrepreneurs who are now veterans of the early development of the consumer web. Many of these have a significant interest in research as well as significant resources and there is great potential for leveraging their experience to stimulate further growth. However this interface does need to be carefully managed as the cultures involved in research data curation and web-scale data mining and exploitation are very different.

(9) What mechanisms could be developed to assure that those who produced the data are given appropriate attribution and credit when secondary results are reported?

The existing norms of the research community that recognise and attribute contributions to further work should be strengthened and supported. While it is tempting to use legal instruments to enforce a need for attribution there is growing evidence that this can lead to inflexible systems that cannot adapt to changing needs. Thus it is better to utilise social enforcement than legal enforcement.

The current good work on data citation and mechanisms for tracking the re-use of data should be supported and expanded. Funders should explicitly require that service providers add capacity for tracking data citation to the products that are purchased for assessment purposes. Where possible the culture of citation should be expanded into the wider world in the form of clinical guidelines, government reports, and policy development papers.

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data? For example, MIAME (minimum information about a microarray experiment; see Brazma et al., 2001, Nature Genetics 29, 371) is an example of a community-driven data standards effort.

At the highest level there is a growing range of interoperable information transfer formats that can provide machine-readable, integrable data transfer, including RDF, XML, OWL, JSON, and others. My own experience is that attempting to impose global interchange standards is an enterprise doomed to failure, and that it is more productive to support these standards within existing communities of practice.
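To illustrate that these formats are interchangeable containers rather than standards in themselves, here is the same minimal dataset record serialised to both JSON and XML using only the Python standard library. The record and its field names are invented for the example and do not follow any particular metadata schema.

```python
import json
import xml.etree.ElementTree as ET

# A minimal, illustrative dataset record; the fields are assumptions.
record = {"id": "doi:10.5555/demo.1",
          "title": "Example dataset",
          "creator": "A. Researcher"}

# The same content as JSON...
as_json = json.dumps(record)

# ...and as XML: the container differs, the information carried does not.
root = ET.Element("dataset")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

Which container a community settles on matters far less than whether the community agrees on what the fields mean, which is why supporting standards within existing communities of practice is the productive route.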

Thus the appropriate policy action is to recommend that communities adopt and utilise the most widely used set of standards possible, and to support the transitions of practice and infrastructure required for this adoption. Selecting standards at the highest level is likely to be counterproductive. Identifying and disseminating best practice in the development and adoption of standards is, however, within the appropriate remit of federal agencies.

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

There is now a significant literature on community development and practice and this should be referred to. Many lessons can also be drawn from the development of effective and successful open source software projects.

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

There are a range of global initiatives that communities should engage with. The most effective means of practical engagement will be to identify communities that have a desire to standardise or integrate systems, and to support the technical and practical transitions that enable this. For instance, there is a widespread desire to support interoperable data formats from analytical instrumentation, but few examples of this being carried through. Funding could be directed to supporting a specific analytical community, and the vendors that support them, in applying an existing standard to their work.

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

Development in this area is at an early stage. There is a need to reconsider the form of publication in its widest sense and this will have a significant impact on the forms and mechanisms of linking. This is a time for experimentation and exploration rather than standards development.

 


A collaborative proposal on research metrics

Measuring time (image by aussiegall via Flickr)

tl;dr: Proposed project to connect metrics builders with those who can most effectively use them to change practice. Interested? Get involved! The proposal doc is here and free to edit.

When we talk about open research practice, more efficient research communication, wider diversity of publication we always come up against the same problem. What’s in it for the jobbing scientist? This is so prevalent that it has been reformulated as “Singh’s Law” (by analogy with Godwin’s law) that any discussion of research practice will inevitably end when someone brings up career advancement or tenure. The question is what do we actually do about this?

The obvious answer is to make these things matter. Research funders have the most power here, in that they can influence behaviour through how they distribute resources. If the funder says something is important, the research community will jump to it. The problem, of course, is that in practice funders have to take their community with them. Radical and rapid change is not usually possible. A step in the right direction would be to provide funders and researchers with effective means of measuring and comparing themselves and their outputs – in particular, means of measuring performance in previously funded activities.

There are many current policy initiatives on trying to make these kinds of judgements. There are many technical groups building and discussing different types of metrics. Recently there have also been calls to ensure that the data that underlies these metrics is made available. But there is relatively little connection between these activities. There is an opportunity to connect technical expertise and data with the needs of funders, researchers, and perhaps even the mainstream media and government.

An opportunity has arisen for some funding to support a project here. My proposal is to bring a relevant group of stakeholders together – funders, technologists, scientists, administrators, media, publishers, and aggregators – to identify needs and then to actually build some things. Essentially the idea is a BarCamp-style meeting lasting a day and a bit, followed by a two-day hackfest. Following on from this, the project would fund some full-time effort to take the most promising ideas forward.

I’m looking for interested parties. This will be somewhat UK centric just because of logistics and funding but the suggestion has already been made that following up with a similar North American or European project could be interesting. The proposal is available to view and edit as a GoogleDoc. Feel free to add your name, contact me directly, or suggest the names of others (probably better to me directly). I have a long list of people to contact directly as well but feel free to save me the effort.

Ed. Note: This proposal started as a question on Friendfeed where I’ve already got a lot of help and ideas. Hopefully soon I will write another post about collaborative and crowdsourced grant writing and how it has changed since the last time I tried this some years back.


Why the Digital Britain report is a missed opportunity

A few days ago the UK Government report on the future of Britain’s digital infrastructure, co-ordinated by Lord Carter, was released. I haven’t had time to read the whole report; I haven’t even really had time to skim it completely. But two things really leapt out at me.

On page four:

“If, as expected, the volume of digital content will increase 10x to 100x over the next 3 to 5 years then we are on the verge of a big bang in the communications industry that will provide the UK with enormous economic and industrial opportunities”

And on page 18:

“Already today around 7.5% of total UK music album purchases are digital and a smaller but rapidly increasing percentage of film and television consumption is streamed online or downloaded…User-generated and social content will be very significant but should not be the main or only content” – this brought to my attention by Brian Kelly.

The first extract is, to me, symptomatic of a serious, even catastrophic, lack of ambition and understanding of how the web is changing. If the UK’s digital content increases by only 10-100 fold over the next three to five years then we will be living in a country lagging behind those that will be experiencing huge economic benefits from getting the web right for their citizens.

But that is just a lack of understanding at its core. The Government’s failure to appreciate how fast this content is growing isn’t really an issue, because the Government isn’t an effective content producer online. It would be great if it were, pushing out data and making things happen, but it will probably catch up one day, when forced to by events. What is disturbing to me is that second passage. “User-generated and social content should not be the main or only content”? It probably already is the main content on the open web, at least by volume, and the volume and traffic rates of user-generated content are rising exponentially. But putting that aside, the report appears to be saying that the content generated by British citizens is not, and will not be, “good enough”; that it has no real value. Lord Carter hasn’t just said that he doesn’t believe enough useful content could be produced by “non-professionals”, but that it shouldn’t be produced.

The Digital Britain Unconferences were a brilliant demonstration of how the web can enable democracy by bringing interested people together to debate and respond to specific issues. Rapid, high quality, and grass roots, they showed the future of how governments could actually interact effectively with their citizens. The potential for economic benefit from the web is not in broadcast, is not in professional production, but in many-to-many communication and sharing. Selling a few more videos will not get us out of this recession. Letting millions of people add a small amount of value, or have more efficient interactions, could. This report fails to reflect that opportunity. It is a failure of understanding and a failure of imagination. The only saving grace is that, aside from the need for physical infrastructure, the Government is becoming increasingly irrelevant to the debate anyway. The world will move on, and the web will enable it, faster or slower than we expect, and in ways that will be surprising. It will just go that much slower in the UK.

Digital Britain Unconference Oxfordshire – Friday 1 May – RAL

On Friday (yes, that’s this Friday) a series of unconferences pulled together in response to the Digital Britain Report and Forum will kick off with one held at the Rutherford Appleton Laboratory, near Didcot. The object of the meeting is to contribute to a coherent and succinct response to the current interim report and to try to get across to Government what a truly Digital Britain would look like. Another unconference is scheduled in Leeds and it is expected that more will follow.

If you are interested in attending the Oxfordshire meeting please register at the Eventbrite page. Registrations will close on Wednesday evening because I need to finalise the list for security the day before the meeting. I will send directions to registered attendees first thing on Thursday morning. In terms of the conduct of the unconference itself please bear in mind the admonishment of Alan Patrick:

One request I’d make – the other organisers are too polite to say it, but I will – one of the things that the Digital Britain team has made clear is that they will want feedback that is “positive, concise, based in reality and sent in as soon as possible”. That “based in reality” bit (that mainly means economics) puts a responsibility on us all to ensure all of us as attendees are briefed and educated on the subject before attending the unconference – ie come prepared, no numpties please, as that will dilute the hard work of others.

For information see the links on the right-hand side of the main unconference series website, or search on “Digital Britain”.

My Bad…or how far should the open mindset go?

So while on the train yesterday, in a somewhat pre-caffeinated state, I stuck my foot in it. Several others have written (Nils Reinton, Bill Hooker, Jon Eisen, Hsien-Hsien Lei, Shirley Wu) on the unattributed use of an image that was put together by Ricardo Vidal for the DNA Network of blogs. The company that did this are selling hokum. No question of that. Now the logo is in fact clearly marked as copyright on Flickr, but even if it were marked as CC-BY the company would be in violation of the license for not attributing. But, despite the fact that it is clearly technically wrong, I felt that the outrage being expressed was inconsistent with the general attitude that materials should be shared, re-useable, and available for re-purposing.

So in the related Friendfeed thread I romped in, offended several people (particularly by using the word “hypocritical”, which, pre-caffeine as I was, I should not have done) and had to back up and re-think what it was I was trying to say. Actually this is a good thing about Friendfeed: the rapid-fire discussion can encourage semi-baked comments and ideas, which are then leapt on and need to be more carefully thought through and refined. In science criticism is always valuable; agreement is often a waste of time.

So at its core my concern is largely about the apparent message sent by a group of “open” activists objecting to the violation of the copyright of a member of their community. As I wrote further down in the comments:

“…There is a danger that this kind of thing comes across as ‘everything should be pd [public domain] but when my mate copyrights something and you violate it I will jump down your throat’. The subtext being it is ok to violate copyright for ‘good’ reasons but not for ‘bad’ reasons… “

It is crucially important to me that when you argue that an area of law is poorly constructed, ineffective, or has unexpected consequences, you scrupulously operate within that law, while not criticising those who cut corners. At the same time, if I argue that the risks of having people ‘steal’ my work are outweighed by the benefits of sharing, then I should roll with the punches when bad stuff does happen. There is the specific issue that what was done is a breach of copyright, and then the general issue that it would be good if people were more able to do this kind of thing. The fact that it was used for a nasty service preying on people’s fears is at one level neither here nor there (or rather, the moral rights issue is, I think, a separate and rather complicated one that will not fit in this particular margin: does the use of the logo misrepresent Ricardo? Does it misrepresent the DNA Network, who remember don’t own it?).

More broadly I think there is a mindset that goes with the way the web works, and the way that sharing works, that means we need to get away from the idea of the object or the work as property. The value of objects lies only in their scarcity, or their lack of presence. With the advent of the world’s greatest copying machine, no digital object need be scarce. It is not the object that has value, because it can be infinitely copied at near zero cost; it is the skill and expertise in putting the object together that has value. The argument of the “commonists” is that you will spend more on using licences and secrecy to protect objects than you could be making by finding the people who need your skills to make just the thing that they need, right now. If this is true it presumably holds for data, for scientific papers, for photos, for video, for software, for books, and for logos.

The argument that I try to promote (and many others do much better) is that we need to get away from the concepts and language of ownership of these digital objects. Even thinking in terms of something being “mine” is counterproductive and actually reduces value. It may be that there are limits to where these arguments hold, and if there are, it probably has something to do with the intrinsic timeframe of the production cycle for a class of objects, but that is a thought for another time. What worried me was that people seemed to be using language driven by thinking about property and scarcity: “theft”, “stealing”. In my view we should be talking about “service quality”, “delivery time”, and “availability”. This is where value lies on the net, not in control, and not in ownership of objects.

None of which is to say that people should not be completely free to license work they produce in any way they choose, and I will defend their right to do so. But at the same time I will work to persuade these same people that some types of license are counterproductive, particularly those that attempt to control content. If you believe that science is better for the things that make it up being shared and re-used, and that the value of a person’s work is increased by others re-using it, why shouldn’t that apply to other types of work? The key thing is a consistent and clear message.

I try to be consistent, and I am by no means always successful, but it’s a work in progress. Anyone is free to re-use and re-purpose anything I generate in whatever way they choose. If I disagree with the use I will say so. If it is unattributed I might comment, and I might name names, but I won’t call in the lawyers. If I am inconsistent I invite, and indeed expect, people to say so. I would hope that criticism would come from the friendly faces before it comes from people with another agenda. That, at the end of the day, is the main benefit of being open. It’s all just error checking in the end.

A personal view of Open Science – Part IV – Policies and standards

This is the fourth and final part of the serialisation of a draft paper on Open Science. The other parts are here – Part I, Part II, Part III.

A question that needs to be asked when contemplating any major change in practice is the balance and timing of ‘bottom-up’ versus ‘top-down’ approaches for achieving that change. Scientists are notoriously unresponsive to decrees and policy initiatives, but as has been discussed they are also inherently conservative and generally resistant to change led from within the community as well. For those advocating the widespread, and ideally rapid, adoption of more open practice in science, it will be important to strike the right balance between calling for mandates and conditions for funding or journal submission and simply adopting these practices in their own work. While the motivation behind the adoption of data sharing policies by funders such as the UK research councils is to be applauded, it is possible for such initiatives to be counterproductive if the policies are not supported by infrastructure development, appropriate funding, and appropriate enforcement. Equally, standards and policy statements can send a powerful message about the aspirations of funders to make the research they fund more widely available and, for the most part, when funders speak, scientists listen.

One Approach for Mainstream Adoption – The fully supported paper

There are two broad approaches to standards currently being discussed. The first of these is aimed at mainstream acceptance and uptake and can be described as ‘The fully supported paper’. This is a concept that is simple on the surface but very complex to implement in practice. In essence it is the idea that the claims made in a peer reviewed paper in the conventional literature should be fully supported by a publicly accessible record of all the background data, methodology, and data analysis procedures that contribute to those claims. On one level this is only a slight increase in requirements over the Brussels Declaration made by the International Association of Scientific, Technical, and Medical Publishers in 2007, which states:

Raw research data should be made freely available to all researchers. Publishers encourage the public posting of the raw data outputs of research. Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars

http://www.stm-assoc.org/brussels-declaration/

The degree to which this declaration is supported by publishers, and the level to which different journals require their authors to adhere to it, is a matter for debate, but the principle of availability of background data has been accepted by a broad range of publishers. It is therefore reasonable to consider the possibility of making the public posting of data a requirement for submission. At a simple level this is already possible. For specific types of data, repositories already exist, and in many cases journals require submission of these data types to recognised repositories. More generally it is possible to host data sets in some institutional repositories, and with the expected announcement of a large scale data hosting service from Google the argument that this is not practicable is becoming unsustainable. While such datasets may have limited discoverability and limited metadata, they will at least be discoverable from the papers that reference them. It is reasonable to expect sufficient context to be provided in the published paper to make the data useable.

However the data itself, except in specific cases, is not enough to be useful to other researchers. The details of how that data was collected and how it was processed are critical if the claims made in a paper are to be properly judged. Once again we come to the problem of recording the process of research and then presenting it in a form which is detailed enough to be widely useful but not so dense as to be impenetrable. The technical challenges of delivering a fully supported paper are substantial. However it is difficult to argue that this shouldn’t be available. If claims made in the scientific literature cannot be fully verified, can they be regarded as scientific? Once again, while the target is challenging, it is simply a proposal to do good science, properly communicated.

Aspirational Standards – celebrating best practice in open science

While the fully supported paper would be a massive social and technical step forward, it is in many ways no more open than the current system. It does not deal with the problem of unpublished or unsuccessful studies that may never find a home in a traditional peer reviewed paper. As discussed above, the ‘fully supported paper’ is not really ‘open science’; it is just good science. What then are the requirements, or standards, for ‘open science’? Does there need to be a certificate or a set of requirements that must be met before a project, individual, or institution can claim to be doing Open Science? Or is Open Science simply too generic and prone to misinterpretation?

I would argue that while ‘Open Science’ is a very generic term it has real value as a rallying point or banner. It is a term which generates a significant positive reaction amongst the general public, the mainstream media, and large sections of the research community. Its very vagueness also allows some flexibility, making it possible to welcome contributions from publishers, scientists, and funders which, while not 100% open, are nonetheless positive and helpful. Within this broad umbrella it is then possible to look at defining or recommending practices and standards and giving these specific labels for identification.

The main work in the area of defining relevant practices and standards has been carried out by Science Commons and the Open Knowledge Foundation. Science Commons have published four ‘Principles for Open Science‘ which focus on the availability and accessibility of published literature, research tools, and data, and the development of cyberinfrastructure to make this possible. These four principles do not currently include the availability of process explicitly, which has been covered in detail above, but they provide a clear set of criteria which could form the basis of standards. Broadly speaking, research projects, individuals, or institutions that deliver on these principles could be said to be doing Open Science. The Open Knowledge Definition, developed by the Open Knowledge Foundation, is another useful touchstone here. Another possible defining criterion for Open Science is that all the relevant material is made available under licenses that adhere to the definition.

The devil, naturally, lies in the details. Are embargoes on data and methodology appropriate, and if so, in what fields and how should they be constructed? For data that cannot be released, should specific exceptions be made, or special arrangements made to hold data in secure repositories? Where the same group is doing open and commercial research, how should the divisions between these projects be defined and declared? These details are important, and will take time to work out. In the short term it is therefore probably more effective to identify and celebrate examples of open science, define best practice, and observe how it works (and does not work) in the real world. This will raise the profile of Open Science without immediately making it the exclusive preserve of those with the luxury of radically changing their practice. It enables examples of best practice to be held up as aspirational standards, providing goals for others to work towards, and the impetus for the tool and infrastructure development that will support them. Many government funders are starting to introduce data sharing mandates, generally with very weak wording, but in most cases these refer to the expectation that funded research will adhere to the standard of ‘best practice’ in the relevant field. At this stage of development it may be more productive to drive adoption through the strategic support of improving best practice in a wide range of fields than to attempt to define strict standards.

Summary

The community advocating more open practice in scientific research is growing in size and influence. The major progress made in the past 12-18 months by the Open Access movement, and the development of deposition and data sharing mandates by a range of research funders, show that real progress is being made in increasing access both to the finished products of research and to the materials that support them. While there have been significant successes, this remains a delicate moment. There is a risk of over-enthusiasm driving expectations which cannot be delivered, and of alienating the mainstream community that we wish to draw in. The fears and concerns of researchers about widening access to their work need to be addressed sensitively and seriously, pointing out the benefits but also acknowledging the risks involved in adopting these practices.

It will not be enough to develop tools and infrastructure that, if adopted, would revolutionise science communication. Those tools must be built with an understanding of how scientists work today, and with the explicit aim of embedding them in existing workflows. The need for, and the benefits of, adopting controlled vocabularies need to be sold much more effectively to the mainstream scientific community. The ontologies community also needs to recognise that there are cases and areas where the use of strict controlled vocabularies is not appropriate. Web 2.0 and Semantic Web technologies are not competitors but complementary approaches that are appropriate in different contexts. Again, the right question to ask is ‘what do scientists do, and what can we do to make that work better?’, not ‘how can we make scientists see they need to do things the “right” way?’.

Finally, it is my belief that now is not the time to set out specific and strict standards of what qualifies as Open Science. It is the right time to discuss the details of what those standards might look like. It is the right time to look at examples of best practice, to celebrate these, and to see what can be learnt from them; but with our current lack of experience, and our lack of knowledge of what the unintended consequences of specific standards might be, it is too early to pin down the details. It is a good time to be clearly articulating the specific aspirations of the movement, and to provide goals that communities can aggregate around; the fully supported paper, the Science Commons principles, and the Open Knowledge Definition are all useful starting points. Open Science is gathering momentum, and that is a good thing. But equally it is a good time to take stock, identify the best course forward, and make sure that we are carrying as many people with us as we can.