More on “theft” and the problem of identity

Following my hopefully getting towards three-quarters baked post there have been more helpful comments and discussion both here and on Friendfeed. I wanted to pick out a specific issue that has come up in both places. At Friendfeed the discussion ran into the question of plagiarism more generally and why it is bad. Anders Norgaard made the point that plagiarism is bad regardless of whether it breaks rules or not, and a discussion on why that is followed. I think the conclusion we came to is that plagiarism reduces value by making it more difficult to find the right person with the right expertise when you need something done. It reduces the value of the body of work in helping you find the person who can do the job that you need doing.

David Crotty, in a comment on the blog post, raises a question that I think probes the same issues:

Do you mind if I start a blog called “Science in the open” and pretend that my name is “Cameron Neylon” and then fill that blog with dreadful, hateful nonsense? After all, your name and your blog’s name aren’t limited physical resources, right? Does ownership extend to your online identity? Isn’t using someone else’s logo a misrepresentation of identity?

Now this is important for two reasons: firstly because it probes the extreme end of my argument that “objects that can be infinitely copied should not be treated as property”, and secondly because it revolves around the issue of identity. Reliable identity lies at the core of building the trust networks that make social web tools work. Does that mean it is one area where the full weight of property-based law should be brought to bear? I think this is worth unpicking in detail.

So, let’s start with the honest answer. If this happened I would be angry and upset. I would be likely to storm around the office/house a bit and possibly rant at people and objects that were unfortunate enough to cross my path. But after calming down a bit I hope I would follow something like this course.

  1. Write the person a polite note explaining that they seem to have both the same name and the same blog name as me, and that this is probably bad for both of us as there is the potential for confusion. Ambiguity is bad because it reduces trust in attribution. As I used these names first I would ask them to consider changing. I would assume it was a simple coincidence, a mistake made in good faith.
  2. If they did not I would dissociate myself publicly from their work, making a clear statement about where my work could be found. I would consider changing the name of my blog (after all it is the feed that people follow – does anyone care that much what it is called?), but not my name.
  3. If it was clear that this was a case of deliberate misrepresentation I would present the evidence that this was the case and request the help of the community to make that very publicly clear.

My case is that allowing the free re-use of my name and my blog name ought to add value on average. Indeed my experience thus far is that allowing people to use these names, to point to me and the work I have generated, has been net positive. I’ve never objected to people quoting me, using my name, reproducing blog posts, or whatever. Whether it’s “fair use”, a copyright violation, or appropriately licensed re-use is irrelevant. It’s all good because it brings more interested people to my blog and to me. One negative experience would probably not actually tip that balance. Several nasty ones might.

The key here is that the real resource is me. I am not infinitely replicable, no-one else can write my posts. The name is just a pointer. An important pointer and one which I will defend, in as much as I will try to make clear what I think and why I think it, as well as to be clear about who I am and why I say what I do. Someone who plagiarizes my work or reproduces it without attribution or someone who deliberately misrepresents what I write reduces the value of my work because they reduce the ability of people who are looking for someone with my expertise to use that work to find me.

But it is not the reproduction of work that is the problem here; it is the misrepresentation of its origin, either by an author falsely claiming it as theirs, or by someone mis-attributing someone else’s work or views to me. The problem is not the act of copying but the act of lying. The problem with lying is that it reduces trust, and the problem with reducing trust is that it reduces the value of the networks we use to find things that are useful and the people who have the expertise to make them. Identity is crucial to trust and trust is what adds value to networks. Very few things reduce the value of web-based networks more effectively than lying about identity.

We will never build a perfect system that solves this problem. My belief though is that it will be more effective to build strong social and technical systems rather than to apply the rules of “ownership” to my name. Do I own my name? No idea. Will I defend my name and ask others to help me do that if someone attacks it? Yes. Will I use the best technical systems to try to be clear about who I am in all the places where I act? Well I could do better on this, but then a lot of us could really. I will work to build trust in my name, in my brand if you like, and if that trust is attacked I will defend it.

So where does this leave the story of Ricardo’s logo? Well the first point was the plagiarism of the image. This breaks the link between the image and the author which reduces its use to Ricardo. The lack of attribution means that people who think “what a cool logo” will not be able to find Ricardo to do them a cool logo of their own. But it is not the copying per se which does the damage but the plagiarism, the lack of attribution. Arguably, as the community leaps to Ricardo’s defence (and points out what a cool logo it is) he actually benefits from a raised profile across a wider community. I had seen a few examples of his work before but hadn’t realised how many he had done and how good they are. Ricardo pointed out in the original Friendfeed thread that the reason the image was copyright was that he was making a living at the time from design. It is not inconceivable he may be better placed to do that now than he was before the logo was misappropriated. That is for Ricardo to decide though, not me.

Does the use of the logo by a company selling hokum misrepresent Ricardo? Well, given they didn’t attribute it to him, not directly. But let’s imagine that the image was CC-BY and that the company did attribute it. Arguably Ricardo would not want to be associated with that, and that would be fair enough, but there wouldn’t be anything he could do about it from a legal perspective. Because the image is actually copyright, all rights reserved, he can prevent these kinds of re-use, or at least can in principle. He retains control in a way that CC-BY licenses do not allow. My argument is that to legally defend this position would take much more money and energy than clearly and publicly distancing yourself from the re-use of the work, and probably wouldn’t be much more effective. Furthermore my argument is that the good that comes from allowing re-use outweighs the bad. The re-use of your work actually gives you a platform to distance yourself from that re-use if you so choose. Once that is made clear it is just more good publicity for you.

More importantly, if you believe, as I do, in the value of allowing re-use then you cannot reasonably pick and choose which re-uses, and by whom, are appropriate. Consistency requires that you allow re-use whether or not you agree with it. I may not approve of that re-use, and it is perfectly reasonable to say so, but that gives me no right to object. To mis-quote Hall channelling Voltaire: “I disagree with the way you have re-used my work, but I will defend your right to do so and the value you add by doing it” – and no, I will not defend it to the death. I don’t take it that seriously…

Southampton Open Science Workshop 31 August and 1 September

An update on the Workshop that I announced previously. We have a number of people confirmed to come down and I need to start firming up numbers. I will be emailing a few people over the weekend so sorry if you get this via more than one route. The plan of attack remains as follows:

Meet on evening of Sunday 31 August in Southampton, most likely at a bar/restaurant near the University to coordinate/organise the details of sessions.

Commence on Monday at ~9:30 and finish around 4:30pm (with the option of discussion going into the evening) with three or four sessions over the course of the day broadly divided into the areas of tools, social issues, and policy. We have people interested and expert in all of these areas coming so we should be able to have a good discussion. The object is to keep it very informal but to keep the discussion productive. Numbers are likely to be around 15-20 people. For those not lucky enough to be in the area we will aim to record and stream the sessions, probably using a combination of dimdim, mogulus, and slideshare. Some of these may require you to be signed into our session so if you are interested drop me a line at the account below.

To register for the meeting please send an email to my gmail account (cameronneylon). To avoid any potential confusion, even if you have emailed me in the past week or so about this please email again so that I have a comprehensive list in one place. I will get back to you with a request via PayPal for £15 to cover coffees and lunch for the day (so if you have a PayPal account you want to use please send the email from that address). If there is a problem with the cost please say so in your email and we will see what we can do. We can suggest options for accommodation but will ask you to sort it out for yourself.

I have set up a wiki to discuss the workshop which is currently completely open access. If I see spam or hacking problems I will close it down to members only (so it would be helpful if you could create an account) but hopefully it might last a few weeks in the open form. Please add your name and any relevant details you are happy to give out to the Attendees page and add any presentations or demos you would be interested in giving, or would be interested in hearing about, on the Programme suggestion page.

Policy and technology for e-science – A forum on open science policy

I’m in Barcelona at a satellite meeting of the EuroScience Open Forum organised by Science Commons and a number of their partners. Most of the meeting takes place today, with forums on ‘Open Access Today’, ‘Moving OA to the Scientific Enterprise: Data, materials, software’, ‘Open access in the knowledge network’, and ‘Open society, open science: Principle and lessons from OA’. There is also a keynote from Carlos Morais-Pires of the European Commission and the lineup for the panels is very impressive.

Last night was an introduction and social kickoff as well. James Boyle (Duke Law School, Chair of the board of directors of Creative Commons, Founder of Science Commons) gave a wonderful talk (40 minutes, no slides, barely taking breath) where his central theme was the relationship between where we are today with open science and where international computer networks were in 1992. He likened making the case for open science today with that of people suggesting in 1992 that the networks would benefit from being made freely accessible, freely useable, and based on open standards. The fears that people have today of good information being lost in a deluge of dross, of there being large quantities of nonsense, and nonsense from people with an agenda, can to a certain extent be balanced against the idea that, to put it crudely, Google works. As James put it (not quite a direct quote): ‘You need to reconcile two statements; both true. 1) 99% of all material on the web is incorrect, badly written, and partial. 2) You probably haven’t opened an encyclopedia as a reference in ten years.’

James gave two further examples, one being the availability of legal data in the US. Despite the fact that none of this is copyrightable in the US there are thriving businesses based on it. The second, which I found compelling for reasons that Peter Murray-Rust has described in some detail, was weather data. Weather data in the US is free. In a recent attempt to get long-term weather data a research effort was charged on the order of $1500, the cost of the DVDs that would be needed to ship the data, for all existing US weather data. By comparison a single German state wanted millions for theirs. The consequence of this was that the European data didn’t go into the modelling. James made the point that while the European return on investment for weather data was a respectable nine-fold, that for the US (where they are giving it away, remember) was 32 times. To me though the really compelling part of this argument is that if that data is not made available we run the risk of being underwater in twenty years with nothing to eat. This particular case is not about money, it is potentially about survival.

Finally – and this, you will not be surprised, was the bit I most liked – he went on to issue a call to arms to get on and start building this thing that we might call the data commons. The time has come to actually sit down and start to take these things forward, to start solving the issues of reward structures, of identifying business models, and to build the tools and standards to make this happen. That, he said, was the job for today. I am looking forward to it.

I will attempt to do some updates via twitter/friendfeed (cameronneylon on both) but I don’t know how well that will work. I don’t have a roaming data tariff and the charges in Europe are a killer so it may be a bit sparse.

Data is free or hidden – there is no middle ground

Science Commons and others are organising a workshop on Open Science issues as a satellite meeting of the EuroScience Open Forum meeting in July. This is pitched as an opportunity to discuss issues around policy, funding, and social issues with an impact on the ‘Open Research Agenda’. In preparation for that meeting I wanted to continue to explore some of the conflicts that arise between wanting to make data freely available as soon as possible and the need to protect the interests of the researchers that have generated data and (perhaps) have a right to the benefits of exploiting that data.

John Cumbers proposed the idea of a ‘Protocol’ for open science that included the idea of a ‘use embargo’; the idea that when data is initially made available, no-one else should work on it for a specified period of time. I proposed more generally that people could ask others to leave data alone for any particular period of time, but that there ought to be an absolute limit on this type of embargo to prevent data being tied up. These kinds of ideas revolve around the need to forge community norms – standards of behaviour that are expected, and to some extent enforced, by a community. The problem is that these need to evolve naturally, rather than be imposed by committee. If there isn’t community buy-in then proposed standards have no teeth.

An alternative approach to solving the problem is to adopt some sort of ‘license’: a legal or contractual framework that creates obligations about how data can be used and re-used. This could impose embargoes of the type that John suggested, perhaps as flexible clauses in the license. One could imagine an ‘Open data – six month analysis embargo’ license. This is attractive because it apparently gives you control over what is done with your data while also allowing you to make it freely available. This is why people who first come to the table with an interest in sharing content always start with CC-BY-NC. They want everyone to have their content, but not to make money out of it. It is only later that people realise what other effects this restriction can have.

I had rejected the licensing approach because I thought it could only work in a walled garden, something which goes against my view of what open data is about. More recently John Wilbanks has written some wonderfully clear posts on the nature of the public domain, and the place of data in it, that make clear that it can’t even work in a walled garden. Because data is in the public domain, no contractual arrangement can protect your ability to exploit that data, it can only give you a legal right to punish someone who does something you haven’t agreed to. This has important consequences for the idea of Open Science licences and standards.

If we argue as an ‘Open Science Movement’ that data is in and must remain in the public domain then, if we believe this is in the common good, we should also argue for the widest possible interpretation of what is data. The results of an experiment, regardless of how clever its design might be, are a ‘fact of nature’, and therefore in the public domain (although not necessarily publicly available). Therefore if any person has access to that data they can do whatever they like with it as long as they are not bound by a contractual arrangement. If someone breaks a contractual arrangement and makes the data freely available there is no way you can get that data back. You can punish the person who made it available if they broke a contract with you. But you can’t recover the data. The only way you can protect the right to exploit data is by keeping it secret. This is entirely different to creative content where if someone ignores or breaks licence terms then you can legally recover the content from anyone that has obtained it.

Why does this matter to the Open Science movement? Aren’t we all about making the data available for people to do whatever anyway? It matters because you can’t place any legal limitations on what people do with data you make available. You can’t put something up and say ‘you can only use this for X’ or ‘you can only use it after six months’ or even ‘you must attribute this data’. Even in a walled garden, once there is one hole, the entire edifice is gone. The only way we can protect the rights of those who generate data to benefit from exploiting it is through the hard work of developing and enforcing community norms that provide clear guidelines on what can be done. It’s that or simply keep the data secret.

What is important is that we are clear about this distinction between legal and ethical protections. We must not tell people that their data can be protected because essentially it can’t. And this is a real challenge to the ethos of open data because it means that our only absolutely reliable method for protecting people is by hiding data. Strong community norms will, and do, help but there is a need to be careful about how we encourage people to put data out there. And we need to be very strong in condemning people who do the ‘wrong’ thing. Which is why a discussion on what we believe is ‘right’ and ‘wrong’ behaviour is incredibly important. I hope that discussion kicks off in Barcelona and continues globally over the next few months. I know that not everyone can make the various meetings that are going on – but between them and the blogosphere and the ‘streamosphere’ we have the tools, the expertise, and hopefully the will, to figure these things out.

Attribution for all! Mechanisms for citation are the key to changing the academic credit culture

Once again a range of conversations in different places have collided in my feed reader. Over on Nature Networks, Martin Fenner posted on Researcher ID, which led to a discussion about attribution and in particular Martin’s comment that there was a need to be able to link to comments and the necessity of timestamps. Then DrugMonkey posted a thoughtful blog about the issue of funding body staff introducing ideas from unsuccessful grant proposals they have handled to projects which they have a responsibility in guiding.

Somewhat more complete report on BioSysBio workshop

This has taken me longer than expected to write up. Julius Lucks, John Cumbers, and I led a workshop on Open Science on Monday 21st at the BioSysBio meeting at Imperial College London. I had hoped to record screencast, audio, and possibly video as well but in the end the laptop I am working off couldn’t cope with both running the projector and Camtasia at the same time with reasonable response rates (it’s a long story but in theory I get my ‘proper’ laptop back tomorrow so hopefully better luck next time). We had somewhere between 25 and 35 people throughout most of the workshop and the feedback was all pretty positive. What I found particularly exciting was that, although the usual issues of scooping, attribution, and the general dishonesty of the scientific community were raised, they were only raised in passing, with a lot more of the discussion focussing on practical issues.

BioSysBio conference and workshop

Tomorrow a few of the usual suspects, whom I have finally met in person, and I are giving a workshop on ‘Open Science’ as part of BioSysBio 2008. If anyone else who I haven’t met yet is about at the meeting then feel free to introduce yourself, even if you can’t make it to the workshop. The workshop abstract is up on OpenWetWare if you want to have a look. I hope to be able to record screencast and video of the session to make it available to all of you who can’t make it. If you want to make comments in advance or raise any issues then drop a comment here or in the usual places.