A Prison Dilemma


I am currently on holiday. You can tell this because I’m writing, reading and otherwise doing things that I regard as fun. In particular I’ve been catching up on some reading. I’ve been meaning to read Danah Boyd’s It’s Complicated for some time (and you can see some of my first impressions in the previous post) but I had held off because I wanted to buy a copy.

That may seem a strange statement. Danah makes a copy of the book available on her website as a PDF (under a CC BY-NC license) so I could (and in the end did) just grab a copy from there. But when it comes to books like this I prefer to pay for a copy, particularly where the author gains a proportion of their livelihood from publishing. Now I could buy a hardback or paperback edition but we have enough physical books. I can buy a Kindle edition from Amazon.co.uk but I object violently to paying a price similar to the paperback for something I can only read within Amazon software or hardware, and where Amazon can remove my access at any time.

In the end I gave up – I downloaded the PDF and read that. As I read it I found a quote that interested me. The quote was from Michel Foucault’s Discipline and Punish, a study of the development of the modern prison system. The quote, if anyone is interested, was about people’s response to being observed, and it struck me as relevant in the context of research assessment.

Once I’d embarrassed myself by asking a colleague who knows about this stuff whether Foucault was someone you actually read, or someone whose summary you just skimmed, I set out again to find myself a copy. Foucault died in 1984 so I’m less concerned about paying for a copy, but I would have been happy to buy a reasonably priced and well formatted ebook. But again the only source was Amazon. In this case it’s worse than for Boyd’s book. You can only buy the ebook from the US Amazon store, which requires a US credit card. Even if I were happy with the Amazon DRM and someone were willing to buy the copy for me, I would be technically violating territorial rights in obtaining that copy.

It was ironic that all this happened the same week that the European Commission released its report on submissions to the Public Consultation on EU Copyright Rules. A pattern quickly emerges in the report. Representatives of public groups, users and research users describe a problem with the current way that copyright works. Publishers and media organisations say there is no problem. This goes on and on for virtually every question asked:

In the print sector, book publishers generally consider that territoriality is not a factor in their business, as authors normally provide a worldwide exclusive licence to the publishers for a certain language. Book publishers state that only in the very nascent eBooks markets some licences are being territorially restricted.

As a customer I have to say it’s a factor for me. I can’t get the content in the form I want. I can’t get it with the rights I want, which means I can’t get the functionality I want. And I often can’t get it in the place I want. Maybe my problem isn’t important enough or there aren’t enough people like me for publishers to care. But with traditional scholarly monograph publishing apparently in a death spiral it seems ironic that these markets aren’t being actively sought out. When books only sell a few hundred copies every additional sale should matter. When books like Elinor Ostrom’s Governing the Commons aren’t easily available then significant revenue opportunities are being lost.

Increasingly it is exactly the relevant specialist works in social sciences and humanities that I’m interested in getting my hands on. I don’t have access to an academic library, the nearest I might get access to is a University focussed on science and technology and in any case the chance of any specific scholarly monograph being in a given academic library is actually quite low. Inter-library loans are brilliant but I can’t wait a week to check something.

I spent nearly half a day trying to find a copy of Foucault’s book that was in the form I wanted with the rights I wanted. I’ve spent hours trying to find a copy of Ostrom’s as well. In both cases it is trivial to find a copy online – it took me around 30 seconds. In both cases it’s relatively easy to find a second hand print copy. I guess for traditional publishers it’s easy to dismiss me as part of a small market, one that’s hard to reach and not worth the effort. After all, what would I know, I’m just the customer.

 

Driving UK Research – Is copyright a help or a hindrance?


The following is my contribution to a collection prepared by the British Library and released today at the Wellcome Trust, called “Driving UK Research. Is copyright a help or a hindrance?” (Press Release, Document [pdf]), which is being released under a CC-BY-NC license. The British Library kindly allowed authors to retain copyright on their contributions so I am here releasing the text into the public domain via a CC0 (CCZero) waiver. I would also like to acknowledge the contribution of Chris Morrison in editing and improving the piece.

If I want to be confident that this text will be used to its full extent I am going to have to republish it separately from this collection. Not because the collection uses restrictive rights management or licences; it actually uses a relatively liberal copyright licence. No, the problem is copyright itself and the way it interacts with how we create knowledge in the 21st century.

Until recently we would use texts or data by reading, taking notes, making photocopies, and then writing down new insights. We would refer to the originals by citing them. A person making limited copies or taking notes (perhaps quoting the text) does not breach copyright because of the notion of “fair dealing”. Making copies of reasonable portions of a work is explicitly not a violation of copyright. If it were we wouldn’t be able to do any useful work at all.

Today, scholarship and research cannot effectively proceed via manual human processes. There is simply too much for us to handle. On the other hand we have excellent computer systems that can, to some extent at least, take these notes for us: automated assistants that can read the text for us and do the text mining, data aggregation and indexing that allow us to cope with the volume of information. As these tools improve we have an opportunity to radically increase the speed of the innovation cycle, using the human brain for what it is best at: insight and creative thinking; and using machines for what they are best at: indexing, checking, collecting.

The problem is that to do this those machines need to take a copy of the whole of the text and in doing so they trigger copyright. Even though the collection you are reading is released under a Creative Commons licence that allows non-commercial use, no-one can take a copy, find an interesting sentence, and then index it if they are going to make money. Google are not allowed to check what is here and index it for us.

Or perhaps they are. Perhaps this does come under “fair use” in the US. Or maybe it does in the US, but not in the UK. What about Australia? Or Brazil? All have slightly different copyright law and a slightly different relationship between copyright and contract law. Even if current legal opinion says it is allowed, a future court case could change that. The only way I can be sure that my text is available into the future is to give up the copyright altogether.

To build effectively on the scientific and cultural data being generated today we need computers. If a human were doing the job it would clearly be covered by fair dealing. What we need is a clear and explicit statement that machine-based analysis for the purpose of indexing, mining, or collecting references is a fair dealing exception, even where a full copy is taken. There clearly need to be boundaries. The entire work should not be kept or distributed. As with existing fair dealing we could have guidelines on amounts kept or quoted: perhaps no more than 5% of a work. These could easily be developed and be compatible with existing fair dealing guidance.

We risk stifling the development of new tools, both commercial and academic, and new knowledge under the weight of a legal regime that was designed to cope with the printing press. At the same time a simple statement that this kind of analysis is fair dealing will provide certainty without damaging the interests of copyright holders or complicating copyright law. These new uses will ultimately bring more traffic, and perhaps more customers, to the primary documents. By taking the simple and easy step of making automated analysis an allowable fair dealing exception everyone wins.


An open letter to Lord Mandelson

Lord Mandelson is the UK minister for Business, Innovation and Skills, a remit which includes digital infrastructure. He recently announced that a version of the “three strikes” approach to combatting illegal filesharing, with the sanction being removal of internet access, would be applied in the UK. This is a copy of a letter I have sent to Lord Mandelson via the wonderful site www.writetothem.com that provides an easy way to write to UK parliamentarians. If you have an interest in the issue I suggest you do the same.

Lord Mandelson

House of Lords

Palace of Westminster

4 September 2009

Dear Lord Mandelson

I am writing to protest the decision taken by yourself to impose a “three strikes” approach to online rights and monopoly violations with an ultimate sanction requiring service providers to remove internet access. I am not a UK citizen but have lived in the UK for ten years and regard it as my home. I have a direct interest in the use of new technologies for communication, particularly in scientific research, and a vested interest in the long term competitiveness of the UK and its ability to support continued innovation in this area.

Your decision is wrong. Not because copyright violation should be allowed or respected and not because the mainstream content industry should be ashamed that it makes money. It is wrong because it will stifle the development of new forms of creativity and the development of entirely new industries. As an advocate of Open Access scientific publication and copyright reform I am critical of the current system of rights and monopolies but I work hard to respect the rights of content producers. And it is very hard work. Even for someone with some expertise in copyright and licensing, doing this right requires time and effort. When I write, or prepare presentations, I spend significant amounts of time identifying work I can re-use, checking that licences are compatible, and making sure I license my own derivative work in a way that respects the rights of those people whose work I have built on.

New forms of creativity are developing that re-use and re-purpose existing content but in fact this is not new at all. Re-use and re-purposing in culture has a grand tradition from Homer, via Don Quixote to Romeo and Juliet, from Brahms’ Haydn variations to Hendrix’s version of the Star Spangled Banner. In my own field all science and technology is derivative. It builds constantly on the work of others. But the internet makes new forms of re-use possible. New types of value creation are also made possible. Re-use of images, video, and text, as well as of ideas and data, is enabling the development of new forms of business and new types of innovation in ways that are very challenging to predict. Your proposal will stifle this innovation by creating an environment of fear around re-use and by privileging certain classes and types of content and producer over the generators of new and innovative products. Those who do not care will ignore and circumvent the rules by technical means. And those who are exploring new types of derivative work, new types of innovative content, will be discouraged by the atmosphere of fear and uncertainty created by your policy.

Nonetheless it is important that the rights of content producers are respected. The key is finding the right balance between the needs of existing industries and those of the individuals involved in the creation of new content and new industries. I would suggest that the key to any protection mechanism is parity. Large and traditional content producers, if given additional rights over those currently provided by law, must also respect equivalent rights for the small and new media producer.

This can be simply achieved by providing a similar three strikes mechanism for traditional media. Thus if a television broadcaster uses video, images, or text taken by an individual without appropriate attribution or licensing, then it should have its broadcast licence revoked. Similarly if print media utilise text from bloggers or Wikipedia without appropriate licensing or attribution, then the rights holders should be able to revoke their paper supply. Paper suppliers to the print media would be required to implement systems to enable online authors to register complaints and would be responsible for imposing these sanctions.

Clearly such a system is farcical, creating a nightmare of bureaucracy and heavy handed sanctions that stifle experimentation and economic activity. Yet it is analogous to what you have proposed. Only you are imposing this to protect a mature set of industries with no real long term growth potential while stifling the potential of a whole new class of industries and innovation with massive growth potential over the next few decades.

Your proposal is wrong for purely economic reasons. It is wrong because it will stifle a major opportunity for economic growth right at the point where we need it most. And it is wrong because as a government your role is not to legislate to protect business models but to regulate in a way that balances the risks of damage in one sector against the potential for encouraging new sectors to develop. I respectfully suggest that you have got that balance wrong. I disagreed with much in Lord Carter’s report but perhaps the best measure of its balance was the equally vociferous criticism it received from both sides of the debate. This to me suggests that it forms a productive basis on which to move forward.

Yours sincerely

Cameron Neylon

More on “theft” and the problem of identity

Following my post, which is hopefully getting towards three-quarters baked, there have been more helpful comments and discussion both here and on Friendfeed. I wanted to pick out a specific issue that has come up in both places. At Friendfeed the discussion ran into the question of plagiarism more generally and why it is bad. Anders Norgaard made the point that plagiarism is bad regardless of whether it breaks rules or not, and a discussion on why that is followed. I think the conclusion we came to is that plagiarism reduces value by making it more difficult to find the right person with the right expertise when you need something done. It reduces the value of the body of work in helping you find the person who can do the job that you need doing.

David Crotty, in a comment on the blog post, raises a point that I think probes the same issues:

Do you mind if I start a blog called “Science in the open” and pretend that my name is “Cameron Neylon” and then fill that blog with dreadful, hateful nonsense? After all, your name and your blog’s name aren’t limited physical resources, right?    Does ownership extend to your online identity?  Isn’t using someone else’s logo a misrepresentation of identity?

Now this is important for two reasons, firstly because it probes the extreme end of my argument that “objects that can be infinitely copied should not be treated as property” and also because it revolves around the issue of identity. Reliable identity lies at the core of building the trust networks that make social web tools work. Does that mean it is one area where the full weight of property based law should be brought to bear? So I think this is worth unpicking in detail.

So, let’s start with the honest answer. If this happened I would be angry and upset. I would be likely to storm around the office/house a bit and possibly rant at people and objects that were unfortunate enough to cross my path. But after, hopefully, calming down a bit I hope I would follow something like the following course.

  1. Write the person a polite note explaining that they seem to have both the same name and same name of blog and that this probably is bad for both of us as there is the potential for confusion. Ambiguity is bad because it reduces trust in attribution. As I used these names first I would ask them to consider changing. I would assume it was a simple coincidence, a mistake made in good faith.
  2. If they did not I would dissociate myself publicly from their work making a clear statement about where my work could be found. I would consider changing the name of my blog (after all it is the feed that people follow – does anyone care that much what it is called?), but not my name.
  3. If it was clear that this was a case of deliberate misrepresentation I would present the evidence that this was the case and request the help of the community to make that very publicly clear.

My case is that allowing the free re-use of my name and my blog name ought to add value on average. Indeed my experience thus far is that allowing people to use these names, to point to me and the work I have generated, has been net positive. I’ve never objected to people quoting me, using my name, reproducing blog posts, or whatever. Whether it’s “fair use” or a copyright violation, or appropriately licensed re-use is irrelevant. It’s all good because it brings more interested people to my blog and to me. One negative experience would probably not actually tip that balance. Several nasty ones might.

The key here is that the real resource is me. I am not infinitely replicable, no-one else can write my posts. The name is just a pointer. An important pointer and one which I will defend, in as much as I will try to make clear what I think and why I think it, as well as to be clear about who I am and why I say what I do. Someone who plagiarizes my work or reproduces it without attribution or someone who deliberately misrepresents what I write reduces the value of my work because they reduce the ability of people who are looking for someone with my expertise to use that work to find me.

But it is not the reproduction of work that is the problem here, it is the misrepresentation of its origin, either by an author falsely claiming it as theirs, or by someone mis-attributing someone else’s work or views to me. The problem is not the act of copying but the act of lying. The problem with lying is that it reduces trust, and the problem with reducing trust is that it reduces the value of the networks we use to find things that are useful and the people who have the expertise to make them. Identity is crucial to trust and trust is what adds value to networks. Very few things reduce the value of web based networks more effectively than lying about identity.

We will never build a perfect system that solves this problem. My belief though is that it will be more effective to build strong social and technical systems rather than to apply the rules of “ownership” to my name. Do I own my name? No idea. Will I defend my name and ask others to help me do that if someone attacks it? Yes. Will I use the best technical systems to try to be clear about who I am in all the places where I act? Well I could do better on this, but then a lot of us could really. I will work to build trust in my name, in my brand if you like, and if that trust is attacked I will defend it.

So where does this leave the story of Ricardo’s logo? Well the first point was the plagiarism of the image. This breaks the link between the image and the author which reduces its use to Ricardo. The lack of attribution means that people who think “what a cool logo” will not be able to find Ricardo to do them a cool logo of their own. But it is not the copying per se which does the damage but the plagiarism, the lack of attribution. Arguably, as the community leaps to Ricardo’s defence (and points out what a cool logo it is) he actually benefits from a raised profile across a wider community. I had seen a few examples of his work before but hadn’t realised how many he had done and how good they are. Ricardo pointed out in the original Friendfeed thread that the reason the image was copyright was that he was making a living at the time from design. It is not inconceivable he may be better placed to do that now than he was before the logo was misappropriated. That is for Ricardo to decide though, not me.

Does the use of the logo by a company selling hokum misrepresent Ricardo? Well, given they didn’t attribute it to him, not directly. But let’s imagine that the image was CC-BY and that the company did attribute it. Arguably Ricardo would not want to be associated with that and that would be fair enough but there wouldn’t be anything he could do about it from a legal perspective. Because the image is actually copyright all rights reserved he can prevent these kinds of re-use. Or can at least in principle. He retains control in a way that CC-BY licenses do not allow. My argument is that to legally defend this position would take much more money and energy than clearly and publicly distancing yourself from the re-use of the work. And probably wouldn’t be much more effective. Furthermore my argument is that the good that comes from allowing re-use outweighs the bad. The re-use of your work actually gives you a platform to distance yourself from that re-use if you so choose. Once that is made clear it is just more good publicity for you.

More importantly if you believe, as I do, in the value of allowing re-use then you cannot reasonably pick and choose who and what re-uses are appropriate. Consistency requires that you allow re-use that you agree with and re-use that you disagree with. I may not approve of that re-use, and it is perfectly reasonable to say so, but that gives me no right to prevent it. To mis-quote Hall channelling Voltaire “I disagree with the way you have re-used my work, but I will defend your right to do so and the value you add by doing it” – and no I will not defend it to the death. I don’t take it that seriously…

My Bad…or how far should the open mindset go?

So while on the train yesterday in somewhat pre-caffeinated state I stuck my foot in it somewhat. Several others have written (Nils Reinton, Bill Hooker, Jon Eisen, Hsien-Hsien Lei, Shirley Wu) on the unattributed use of an image that was put together by Ricardo Vidal for the DNA Network of blogs. The company that did this are selling hokum. No question of that. Now the logo is in fact clearly marked as copyright on Flickr but even if it were marked as CC-BY then the company would be in violation of the license for not attributing. But, despite the fact that it is clearly technically wrong, I felt that the outrage being expressed was inconsistent with the general attitude that materials should be shared, re-useable, and available for re-purposing.

So in the related Friendfeed thread I romped in, offended several people (particularly by using the word hypocritical which I should not have done, like I said, pre-caffeine) and had to back up and re-think what it was I was trying to say. Actually this is a good thing about Friendfeed, the rapid fire discussion can encourage semi-baked comments and ideas which are then leapt on and need to be more carefully thought through and refined. In science criticism is always valuable, agreement is often a waste of time.

So at core my concern is largely about the apparent message that can be sent by a group of “open” activists objecting about the violation of the copyright of a member of their community. As I wrote further down in the comments:

“…There is a danger that this kind of thing comes across as ‘everything should be pd [public domain] but when my mate copyrights something and you violate it I will jump down your throat’. The subtext being it is ok to violate copyright for ‘good’ reasons but not for ‘bad’ reasons… “

It is crucially important to me that when you argue that an area of law is poorly constructed, ineffective or having unexpected consequences, that you scrupulously operate within that law, while not criticising those who cut corners. At the same time if I argue that the risks of having people ‘steal’ my work are outweighed by the benefits of sharing then I should roll with the punches when bad stuff does happen. There is the specific issue that what was done is a breach of copyright, and then the general issue that it would be good if people were more able to do this kind of thing. The fact that it was used for a nasty service preying on people’s fears is at one level neither here nor there (or rather the moral rights issue is I think a separate, and rather complicated one that will not fit in this particular margin: does the use of the logo misrepresent Ricardo? Does it misrepresent the DNA network – who remember don’t own it?).

More broadly I think there is a mindset that goes with the way the web works and the way that sharing works that means we need to get away from the idea of the object or the work as property. The value of objects lies only in their scarcity, or their lack of presence. With the advent of the world’s greatest copying machine, no digital object need be scarce. It is not the object that has value, because it can be infinitely copied for near zero cost, it is the skill and expertise in putting the object together that has value. The argument of the “commonists” is that you will spend more on using licences and secrecy to protect objects than you could be making by finding the people who need your skills to make just the thing that they need, right now. If this is true it presumably holds for data, for scientific papers, for photos, for video, for software, for books, and for logos.

The argument that I try to promote (and many others do much better) is that we need to get away from the concepts and language of ownership of these digital objects. That even thinking in terms of it being “mine” is counterproductive and actually reduces value. It may be the case that there are limits to where these arguments hold, and if there are they probably have something to do with the intrinsic timeframe of the production cycle for a class of objects, but that is a thought for another time. What worried me was that people seemed to be using language that is driven by thinking about property and scarcity: “theft”, “stealing”. In my view we should be talking about “service quality”, “delivery time”, and “availability”. This is where value lies on the net, not in control, and not in ownership of objects.

None of which is to say that people should not be completely free to license work which they produce in any way that they choose, and I will defend their right to do this. But at the same time I will work to persuade these same people that some types of license are counterproductive, particularly those that attempt to control content. If you believe that science is better for the things that make it up being shared and re-used, that the value of a person’s work is increased by others re-using it, why shouldn’t that apply to other types of work? The key thing is a consistent and clear message.

I try to be consistent, and I am by no means always successful, but it’s a work in progress. Anyone is free to re-use and re-purpose anything I generate in whatever way they choose. If I disagree with the use I will say so. If it is unattributed I might comment, and I might name names, but I won’t call in the lawyers. If I am inconsistent I invite, and indeed expect, people to say so. I would hope that criticism would come from the friendly faces before it comes from people with another agenda. That, at the end of the day, is the main benefit of being open. It’s all just error checking in the end.

Data is free or hidden – there is no middle ground

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum meeting in July. This is pitched as an opportunity to discuss issues around policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue to explore some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers that have generated data and (perhaps) have a right to the benefits of exploiting that data.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included the idea of a ‘use embargo’; the idea that when data is initially made available, no-one else should work on it for a specified period of time. I proposed more generally that people could ask that people leave data alone for any particular period of time, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally, rather than be imposed by committee. If there isn’t community buy in then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘license’: a legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the license. One could imagine an ‘Open data – six month analysis embargo’ license. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content always start with CC-BY-NC. They want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, something which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, that make clear that it can’t even work in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data, it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in and must remain in the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what is data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore if any person has access to that data they can do whatever they like with it as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you. But you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different to creative content where if someone ignores or breaks licence terms then you can legally recover the content from anyone that has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected because essentially they can’t. And this is a real challenge to the ethos of open data because it means that our only absolutely reliable method for protecting people is by hiding data. Strong community norms will, and do, help but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing. Which is why a discussion on what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them and the blogosphere and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.
