Open Access Progress: Anecdotes from close to home

Solution extinction measurements of (A) faceted (B) and Au@MNPs and (C) photos of the particles. From Silva et al Chem. Commun., 2016 DOI:10.1039/C6CC03225G

It has become rather fashionable in some circles to complain about the lack of progress on Open Access. Particularly to decry the apparent failure of UK policies to move things forward. I’ve been guilty of frustration at various stages in the past and one thing I’ve always found useful is thinking back to where things were. So with that in mind here’s an anecdote or two that suggests not just progress but a substantial shift in the underlying practice.

I live with a chemist, a group not known for their engagement with Open Access. More than most other disciplines in my experience there is a rigid hierarchy of journals, a mechanistic view of productivity, and – particularly in those areas not awash with pharmaceutical funding – not a huge amount of money. Combine that with a tendency to think everything is – or at least should be – patentable (which tends to rule out preprints) and this is not fertile ground for OA advocacy.

Over the years we’ve had our fair share of disagreements. A less than ideal wording on the local institutional mandate meant that archiving was off the menu for a while (the agreement to deposit required all staff to deposit but also required the depositor to take personal responsibility for any copyright breaches) and a lack of funds (and an institutional decision to concentrate RCUK funds and RSC vouchers on only the journals at the top of that rigid hierarchy) meant that OA publication in the journals of choice was not feasible either. That argument about whether you choose to pay an APC or buy reagents for the student was not a hypothetical in our household.

But over the past year things have shifted. A few weeks ago: “You know, I just realised my last two papers were published Open Access”. The systems and the funds are starting to work, are starting to reach even into those corners of resistance, yes even into chemistry. Yes it’s still the natural sciences, and yes it’s only two articles out of who knows how many (I’m not the successful scientist in the house), but it’s quite a substantial shift from it being totally out of the question.

But around about the same time came something that I found even more interesting. Glimpsed over a shoulder I saw something I found odd… searching on a publisher website, which is strange enough, and also searching only for Open Access content. A query raised the response: “Yeah, these CC BY articles are great, I can use the images directly in my lectures without having to worry; I just cite the article, which after all I would have obviously done anyway”. It turns out that with lecture video capture now becoming standard, universities are getting steadily more worried about copyright. The Attribution licensed content meant there was no need to worry.

Sure these are just anecdotes but they’re indicative to me of a shift in the narrative. A shift from “this is expensive and irrelevant to me” to “the system takes care of it and I’m seeing benefits”. Of course we can complain that it’s costing too much, that much of the system is flaky at best and absent at worst, or that the world could be so much better. We can and should point to all the things that are sub-optimal. But just as the road may stretch out some distance ahead, and there may be roadblocks and barriers in front of us, there is also a long stretch of road behind, with the barriers cleared or overcome.

As much as anything it was the sense of “that’s just how things are now” that made me feel like real progress has been made. If that is spreading, even if slowly, then the shift towards a new normal may finally be underway.

The Panton Principles: Finding agreement on the public domain for published scientific data

Drafters of the Panton Principles

I had the great pleasure and privilege of announcing the launch of the Panton Principles at the Science Commons Symposium – Pacific Northwest on Saturday. The launch of the Panton Principles, many months after they were first suggested, is really largely down to the work of Jonathan Gray. This was one of several projects that I haven’t been able to follow through properly on and I want to acknowledge the effort that Jonathan has put into making that happen. I thought it might be helpful to describe where they came from, what they are intended to do and, perhaps just as importantly, what they don’t.

The Panton Principles aim to articulate a view of what best practice should be with respect to data publication for science. They arose out of an ongoing conversation between myself, Peter Murray-Rust, and Rufus Pollock. Rufus founded the Open Knowledge Foundation, an organisation that seeks to promote and support open culture, open source, and open science, with the emphasis on the open. The OKF position on licences has always been that share-alike provisions are an acceptable limitation to complete freedom to re-use content. I have always taken the Science Commons position that share-alike provisions, particularly on data, have the potential to make it difficult or impossible to get multiple datasets or systems to interoperate. In another post I will explore this disagreement, which really amounts to a different perspective on the balance of the risks and consequences of theft vs things not being used or useful. Peter in turn is particularly concerned about the practicalities – really wanting a straightforward set of rules to be baked right into publication mechanisms.

The Principles came out of a discussion in the Panton Arms, a pub near to the Chemistry Department of Cambridge University, after I had given a talk in the Unilever Centre for Molecular Informatics. We were having our usual argument, trying to win the others over, when we actually turned to what we could agree on. What sort of statement could we make that would capture the best parts of both positions with a focus on science and data? We focussed further by trying to draw out one specific issue. Not the issue of when people should share results, or the details of how, but the mechanisms that should be used for re-use. The principles are intended to focus on what happens when a decision has been made to publish data and where we assume that the wish is for that data to be effectively re-used.

Where we found agreement was that for science, and for scientific data, and particularly science funded by public investment, the public domain was the best approach and that we would all recommend it. We brought John Wilbanks in both to bring the views of Creative Commons and to help craft the words. It also made a good excuse to return to the pub. We couldn’t agree on everything – we will never agree on everything – but the form of words chosen – that placing data explicitly, irrevocably, and legally in the public domain satisfies both the Open Knowledge Definition and the Science Commons Principles for Open Data – was something that we could all personally sign up to.

The end result is something that I have no doubt is imperfect. We have borrowed inspiration from the Budapest Declaration, but there are three B’s. Perhaps it will take three P’s to capture all the aspects that we need. I’m certainly up for some meetings in Pisa or Portland, Pittsburgh or Prague (less convinced about Perth but if it works for anyone else it would make my mother happy). For me it captures something that we agree on – a way forwards towards making the best possible practice a common and practical reality. It is something I can sign up to and I hope you will consider doing so as well.

Above all, it is a start.


It’s not easy being clear…

There has been some debate going backwards and forwards over the past few weeks about licensing, people’s expectations, and the extent to which researchers can be expected to understand, or want to understand, the details of legal terms, licensing and other technical minutiae. It is reasonable for scientific researchers not to wish to get into the details. One of the real successes of Creative Commons has been to provide a relatively small set of reasonably clear terms that enable people to express their wishes about what people can do with their work. But even here there is the potential for significant confusion, as demonstrated by the work that CC is doing on the perception of what “non commercial” means.

The end result of this is two-fold. Firstly people are genuinely confused about what to do and as a result they give up. In giving up there is often an unspoken assumption that “people will understand what I want/mean”. Two examples yesterday illustrated exactly how misguided this can be and showed the importance of being clear about, and thinking about, what you want people to do with your content and information.

The first was pointed out by Paulo Nuin who linked to a post on The Matrix Cookbook, a blog and PDF containing much useful information on matrix transforms. The post complained that Amazon were selling a Kindle version of the PDF, apparently without asking permission or even bothering to inform the authors. So far, so big corporation. But digging a little deeper I went to the front page of the site and found this interesting “license”:

“License? No, there is no license. It is provided as a knowledge sharing project for anyone to use. But if you use it in an academic or research like context, we would love to be cited appropriately.”

Now I would interpret this as meaning that the authors had intended to place the work in the public domain. They clearly felt that while educational and research re-use was fine, commercial use was not. I would guess that someone at Amazon read the statement “there is no license” and felt that it was free to re-use. It seems odd that they wouldn’t email the authors to notify them but if it were public domain there is no requirement to. Rude, yes. Theft? Well it depends on your perspective. Going back today the authors have made a significant change to the “license”:

“It is provided as a knowledge sharing project for anyone to use. But if you use it in an academic or research like context, we would love to be cited appropriately. And NO, you are not allowed to make money on it by reselling The Matrix Cookbook in any form or shape.”

Had the authors made the content CC-BY-NC then their intentions would have been much clearer. My personal belief is that an NC license would be counter-productive (meaning the work couldn’t be used for teaching at a fee charging college or for research funded by a commercial sponsor for instance) but the point of the CC licenses is to give people these choices. What is important is that people make those choices and make them clear.

The second example related to identity. As part of an ongoing discussion involving online commenting, genereg, a Friendfeed user, linked to their blog which included their real name. Mr Gunn, the nickname used by Dr William Gunn online, wrote a blog post in which he referred to genereg’s contribution by linking to their blog from their real name [subsequently removed on request]. I probably would have done the same, wanting to ascribe the contribution clearly to the “real person” so they get credit for it. Genereg objected to this, feeling that as their real name wasn’t directly in that conversational context it was inappropriate to use it.

So in my view, “Genereg” was a nickname that someone was happy to have connected with their real name, while in their view this was inappropriate. No-one is right or wrong here, we are evolving the rules of conduct more or less as we go and frankly, identity is a mess. But this wasn’t clear to me or to Mr Gunn. I am often uncomfortable with trying to tell whether a specific person who has linked two apparently separate identities is happy with that link being public, has linked the two by mistake, or just regards one as an alias. And you can’t ask in a public forum, can you?

What links these, and this week’s other fracas, is confusion over people’s expectations. The best way to avoid this is to be as clear as you possibly can. Don’t assume that everyone thinks the same way that you do. And definitely don’t assume that what is obvious to you is obvious to everyone else. When it comes to content, make a clear statement of your expectations and wishes, preferably using a widely recognized and understood license. If you’re reading this at OWW you should be seeing my nice shiny new cc0 waiver in the right hand navbar (I haven’t figured how to get it into the RSS feed yet). Most of my slidesets at Slideshare are CC-BY-SA. I’d prefer them to be CC-BY but most include images with CC-BY-SA licenses which (I try to make sure) I respect. Overall I try to make the work I generate as widely re-usable as possible and aim to make that as clear as possible.

There are no such tools to make clear statements about how you wish your identity to be treated (and perhaps there should be). But a plain English statement on the appropriate profile page might be useful: “I blog under a pseudonym because…and I don’t want my identity revealed”…”Bunnykins is the Friendfeed handle of Professor Serious Person”. Consider whether what you are doing is sending mixed messages or potentially confusing. Personally I like to keep things simple so I just use my real name or variants of it. But that is clearly not for everyone.

Above all, try to express clearly what you expect and wish to happen. Don’t expect others necessarily to understand where you’re coming from. It is very easy for one person’s polite and helpful to be another person’s deeply offensive. When you put something online, think about how you want people to use it, think about how you don’t want people to use it (and remember you may need to balance the allowing of one against the restricting of the other) and make those as clear as you possibly can, where possible using a statement or license that is widely recognized and has had some legal attention at some point like the CC licenses, cc0 waiver, or the PDDL. Clarity helps everyone. If we get this wrong we may end up with a web full of things we can’t use.

And before anyone else gets in to tell me I’ve made plenty of unjustified, and plain wrong, assumptions about other people’s views before. Pot. Kettle. Black. Welcome to being human.

My Bad…or how far should the open mindset go?

So while on the train yesterday in a somewhat pre-caffeinated state I stuck my foot in it somewhat. Several others have written (Nils Reinton, Bill Hooker, Jon Eisen, Hsien-Hsien Lei, Shirley Wu) on the unattributed use of an image that was put together by Ricardo Vidal for the DNA Network of blogs. The company that did this are selling hokum. No question of that. Now the logo is in fact clearly marked as copyright on Flickr but even if it were marked as CC-BY then the company would be in violation of the license for not attributing. But, despite the fact that it is clearly technically wrong, I felt that the outrage being expressed was inconsistent with the general attitude that materials should be shared, re-useable, and available for re-purposing.

So in the related Friendfeed thread I romped in, offended several people (particularly by using the word hypocritical which I should not have done, like I said, pre-caffeine) and had to back up and re-think what it was I was trying to say. Actually this is a good thing about Friendfeed, the rapid fire discussion can encourage semi-baked comments and ideas which are then leapt on and need to be more carefully thought through and refined. In science criticism is always valuable, agreement is often a waste of time.

So at core my concern is largely about the apparent message that can be sent by a group of “open” activists objecting to the violation of the copyright of a member of their community. As I wrote further down in the comments:

“…There is a danger that this kind of thing comes across as ‘everything should be pd [public domain] but when my mate copyrights something and you violate it I will jump down your throat’. The subtext being it is ok to violate copyright for ‘good’ reasons but not for ‘bad’ reasons… “

It is crucially important to me that when you argue that an area of law is poorly constructed, ineffective or having unexpected consequences, that you scrupulously operate within that law, while not criticising those who cut corners. At the same time if I argue that the risks of having people ‘steal’ my work are outweighed by the benefits of sharing then I should roll with the punches when bad stuff does happen. There is the specific issue that what was done is a breach of copyright, and then the general issue that if people were more able to do this kind of thing it would be good. The fact that it was used for a nasty service preying on people’s fears is at one level neither here nor there (or rather the moral rights issue is I think a separate, and rather complicated, one that will not fit in this particular margin: does the use of the logo misrepresent Ricardo? Does it misrepresent the DNA network – who remember don’t own it?).

More broadly I think there is a mindset that goes with the way the web works and the way that sharing works that means we need to get away from the idea of the object or the work as property. The value of objects lies only in their scarcity, or their lack of presence. With the advent of the world’s greatest copying machine, no digital object need be scarce. It is not the object that has value, because it can be infinitely copied for near zero cost, it is the skill and expertise in putting the object together that has value. The argument of the “commonists” is that you will spend more on using licences and secrecy to protect objects than you could be making by finding the people who need your skills to make just the thing that they need, right now. If this is true it presumably holds for data, for scientific papers, for photos, for video, for software, for books, and for logos.

The argument that I try to promote (and many others do much better) is that we need to get away from the concepts and language of ownership of these digital objects. That even thinking in terms of it being “mine” is counterproductive and actually reduces value. It may be the case that there are limits to where these arguments hold, and if there are, it probably has something to do with the intrinsic timeframe of the production cycle for a class of objects, but that is a thought for another time. What worried me was that people seemed to be using language that is driven by thinking about property and scarcity: “theft”, “stealing”. In my view we should be talking about “service quality”, “delivery time”, and “availability”. This is where value lies on the net, not in control, and not in ownership of objects.

None of which is to say that people should not be completely free to license work which they produce in any way that they choose, and I will defend their right to do this. But at the same time I will work to persuade these same people that some types of license are counterproductive, particularly those that attempt to control content. If you believe that science is better for the things that make it up being shared and re-used, that the value of a person’s work is increased by others re-using it, why shouldn’t that apply to other types of work? The key thing is a consistent and clear message.

I try to be consistent, and I am by no means always successful, but it’s a work in progress. Anyone is free to re-use and re-purpose anything I generate in whatever way they choose. If I disagree with the use I will say so. If it is unattributed I might comment, and I might name names, but I won’t call in the lawyers. If I am inconsistent I invite, and indeed expect, people to say so. I would hope that criticism would come from the friendly faces before it comes from people with another agenda. That, at the end of the day, is the main benefit of being open. It’s all just error checking in the end.

Data is free or hidden – there is no middle ground

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum meeting in July. This is pitched as an opportunity to discuss issues around policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue to explore some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers that have generated data and (perhaps) have a right to the benefits of exploiting that data.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included the idea of a ‘use embargo’; the idea that when data is initially made available, no-one else should work on it for a specified period of time. I proposed more generally that people could ask that people leave data alone for any particular period of time, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally, rather than be imposed by committee. If there isn’t community buy in then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘license’. A legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the license. One could imagine an ‘Open data – six month analysis embargo’ license. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content always start with CC-BY-NC. They want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, something which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, that make clear that it can’t even work in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data, it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in and must remain in the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what is data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore if any person has access to that data they can do whatever they like with it as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you. But you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different to creative content where if someone ignores or breaks licence terms then you can legally recover the content from anyone that has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected because essentially it can’t. And this is a real challenge to the ethos of open data because it means that our only absolutely reliable method for protecting people is by hiding data. Strong community norms will, and do, help but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing. Which is why a discussion on what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them and the blogosphere and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.

More on the science exchange – or building and capitalising a data commons

Banknotes from all around the world donated by visitors to the British Museum, London. Image from Wikipedia via Zemanta.

Following on from the discussion a few weeks back kicked off by Shirley at One Big Lab and continued here I’ve been thinking about how to actually turn what was a throwaway comment into reality:

What is being generated here is new science, and science isn’t paid for per se. The resources that generate science are supported by governments, charities, and industry but the actual production of science is not supported. The truly radical approach to this would be to turn the system on its head. Don’t fund the universities to do science, fund the journals to buy science; then the system would reward increased efficiency.

There is a problem at the core of this. For someone to pay for access to the results, there has to be a monetary benefit to them. This may be through increased efficiency of their research funding but that’s a rather vague benefit. For a serious charitable or commercial funder there has to be the potential to either make money, or at least see that the enterprise could become self sufficient. But surely this means monetizing the data somehow? Which would require restrictive licences, which is not, in the end, what we’re about.

The other story of the week has been the, in the end very useful, kerfuffle caused by ChemSpider moving to a CC-BY-SA licence, and the confusion that has been revealed regarding data, licensing, and the public domain. John Wilbanks, whose comments on the ChemSpider licence sparked the discussion, has written two posts [1, 2] which I found illuminating and which have made things much clearer for me. His point is that data naturally belongs in the public domain and that the public domain and the freedom of the data itself need to be protected from erosion, both legal and conceptual, that could be caused by our obsession with licences. What does this mean for making an effective data commons, and the Science Exchange that could arise from it, financially viable?

Protocols for Open Science

interior detail, stata center, MIT. just outside science commons offices.

One of the strong messages that came back from the workshop we held at the BioSysBio meeting was that protocols and standards of behaviour were something that people would appreciate having available. There are many potential issues that are raised by the idea of a ‘charter’ or ‘protocol’ for open science but these are definitely things that are worth talking about. I thought I would throw a few ideas out and see where they go. There are some potentially serious contradictions to be worked through.