How to waste public money in one easy step…

Peter Murray-Rust has sparked off another round in the discussion of the value that publishers bring to the scholarly communication game, and told a particular story of woe and pain inflicted by the incumbent publishers. On the day he posted that, I had my own experience of just how inefficient and ineffective our communication systems are: I wasted the better part of the day trying to find some information. I thought it might be fun to encourage people to post their own stories of problems and frustrations with access to the literature and the downstream issues that creates, so here is mine.

I am by no means a skilled organic chemist but I’ve done a bit of synthesis in my time and I certainly know enough to be able to read synthetic chemistry papers and decide whether a particular synthesis is accessible. So on this particular day I was interested in deciding whether it was easy or difficult to make deuterated mono-olein. This molecule can be made by connecting glycerol to oleic acid. Glycerol is cheap and I should have in my hands some deuterated oleic acid in the next month or so. The chemistry for connecting acids to alcohols is straightforward, I’ve even done it myself, but this is a slightly special case. Firstly the standard methods tend to be wasteful of the acid, which in my case is the expensive bit. The second issue is that glycerol has three alcohol groups. I only want to modify one, leaving the other two unchanged, so it is important to find a method that gives me mostly what I want and only a little of what I don’t.

So the question for me is: is there a high-yielding reaction that will give me mostly what I want, while wasting as little as possible of the oleic acid? And if there is a good technique, is it accessible given the equipment I have in the lab? Simple question, quick trip to Google Scholar, to find reams of likely-looking papers, not one of which I had full-text access to. The abstracts are nearly useless in this case because I need to know details of yields and methodology, so I had several hundred papers and no means of figuring out which might be worth an inter-library loan. I spent hours trying to parse the abstracts to figure out which were the most promising and in the end I broke… I asked someone to email me a couple of PDFs because I knew they had access. Bear in mind that what I wanted to do was spend a quick 30 minutes or so to decide whether this was worth pursuing in detail. What it took was about three hours, which at the full economic cost of my time comes to about £250. That’s about £200 of UK taxpayers’ money down the toilet because, on the site of the UK’s premier physical and biological research facilities, I don’t have access to those papers. Yes, I could have asked someone else to look, but that would have taken up their time.

But you know what’s really infuriating? I shouldn’t even have been looking at the papers at all when doing my initial search. What I should have been able to do was ask the question:

Show me all syntheses of mono-olein ranked first by purity of the product and secondly by the yield with respect to oleic acid.

There should be a database where I can get this information. In fact there is. But we can’t afford access to the ACS’ information services here. These are incredibly expensive because it used to be necessary for this information to be culled from papers by hand. But today that’s not necessary. It could be done cheaply and rapidly. In fact I’ve seen it done cheaply and rapidly by tools developed in Peter’s group that get ~95% accuracy and ~80% recall over synthetic organic chemistry. Those are hit rates that would have solved my problem easily and effectively.
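To make concrete how small the question is once the data is structured: it is just a two-key sort. Here is a minimal sketch in Python. The record format, the field names and the three entries are all invented placeholders for illustration; they are not real syntheses, real yields, or the output of any actual tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthesis:
    """One extracted synthesis record (hypothetical format)."""
    source: str           # placeholder label, not a real citation
    purity: float         # fraction of product that is the mono-ester
    yield_on_acid: float  # yield with respect to oleic acid


def rank(records):
    """Rank by purity first, then by yield on oleic acid, both descending."""
    return sorted(
        records,
        key=lambda r: (r.purity, r.yield_on_acid),
        reverse=True,
    )


# Invented placeholder values, purely to show the ordering.
records = [
    Synthesis("method A", purity=0.90, yield_on_acid=0.60),
    Synthesis("method B", purity=0.95, yield_on_acid=0.40),
    Synthesis("method C", purity=0.95, yield_on_acid=0.70),
]

for r in rank(records):
    print(f"{r.source}: purity={r.purity:.2f}, yield={r.yield_on_acid:.2f}")
```

Ties on purity are broken by yield, so the two 0.95 entries come first and the one that wastes less acid leads. The sorting is trivial; the hard and expensive part is getting the records out of the papers in the first place.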

Unfortunately, despite the fact that those tools exist, that they could be deployed easily and cheaply, and that they could save researchers vast amounts of time, research is being held back by a lack of access to the literature and, where there is access, by contracts that prevent us collating, aggregating, and analysing our own work. The public pays for the research to be done, the public pays for researchers to be able to read it, and in most cases the public has to pay again if they should want to read it. But what is most infuriating is the way the public pays yet again when I and a million other scientists waste our time, the public’s time, because the tools that exist and work cannot be deployed.

How many researchers in the UK or worldwide are losing hours or even days every week because of these inefficiencies? How many new tools or techniques are never developed because they can’t legally be deployed? And how many hundreds of millions of dollars of public money does that add up to?

More on “theft” and the problem of identity

Following my hopefully getting towards three-quarters baked post there have been more helpful comments and discussion both here and on Friendfeed. I wanted to pick out a specific issue that has come up in both places. At Friendfeed the discussion ran into the question of plagiarism more generally and why it is bad. Anders Norgaard made the point that plagiarism is bad regardless of whether it breaks rules or not, and a discussion on why that is followed. I think the conclusion we came to is that plagiarism reduces value by making it more difficult to find the right person with the right expertise when you need something done. It reduces the value of the body of work in helping you find the person who can do the job that you need doing.

David Crotty, in a comment on the blog post, raises a point that I think probes the same issues:

Do you mind if I start a blog called “Science in the open” and pretend that my name is “Cameron Neylon” and then fill that blog with dreadful, hateful nonsense? After all, your name and your blog’s name aren’t limited physical resources, right?    Does ownership extend to your online identity?  Isn’t using someone else’s logo a misrepresentation of identity?

Now this is important for two reasons, firstly because it probes the extreme end of my argument that “objects that can be infinitely copied should not be treated as property” and also because it revolves around the issue of identity. Reliable identity lies at the core of building the trust networks that make social web tools work. Does that mean it is one area where the full weight of property based law should be brought to bear? So I think this is worth unpicking in detail.

So, let’s start with the honest answer. If this happened I would be angry and upset. I would be likely to storm around the office/house a bit and possibly rant at people and objects that were unfortunate enough to cross my path. But after calming down a bit, I hope I would take something like the following course.

  1. Write the person a polite note explaining that they seem to have both the same name and the same blog name, and that this is probably bad for both of us as there is the potential for confusion. Ambiguity is bad because it reduces trust in attribution. As I used these names first I would ask them to consider changing. I would assume it was a simple coincidence, a mistake made in good faith.
  2. If they did not, I would dissociate myself publicly from their work, making a clear statement about where my work could be found. I would consider changing the name of my blog (after all, it is the feed that people follow; does anyone care that much what it is called?), but not my name.
  3. If it was clear that this was a case of deliberate misrepresentation I would present the evidence that this was the case and request the help of the community to make that very publicly clear.

My case is that allowing the free re-use of my name and my blog name ought to add value on average. Indeed my experience thus far is that allowing people to use these names, to point to me and the work I have generated, has been net positive. I’ve never objected to people quoting me, using my name, reproducing blog posts, or whatever. Whether it’s “fair use”, a copyright violation, or appropriately licensed re-use is irrelevant. It’s all good because it brings more interested people to my blog and to me. One negative experience would probably not actually tip that balance. Several nasty ones might.

The key here is that the real resource is me. I am not infinitely replicable, no-one else can write my posts. The name is just a pointer. An important pointer and one which I will defend, in as much as I will try to make clear what I think and why I think it, as well as to be clear about who I am and why I say what I do. Someone who plagiarizes my work or reproduces it without attribution or someone who deliberately misrepresents what I write reduces the value of my work because they reduce the ability of people who are looking for someone with my expertise to use that work to find me.

But it is not the reproduction of work that is the problem here, it is the misrepresentation of its origin, either by an author falsely claiming it as theirs, or by someone mis-attributing someone else’s work or views to me. The problem is not the act of copying but the act of lying. The problem with lying is that it reduces trust; the problem with reducing trust is that it reduces the value of the networks we use to find things that are useful and the people who have the expertise to make them. Identity is crucial to trust and trust is what adds value to networks. Very few things reduce the value of web-based networks more effectively than lying about identity.

We will never build a perfect system that solves this problem. My belief though is that it will be more effective to build strong social and technical systems rather than to apply the rules of “ownership” to my name. Do I own my name? No idea. Will I defend my name and ask others to help me do that if someone attacks it? Yes. Will I use the best technical systems to try to be clear about who I am in all the places where I act? Well I could do better on this, but then a lot of us could really. I will work to build trust in my name, in my brand if you like, and if that trust is attacked I will defend it.

So where does this leave the story of Ricardo’s logo? Well the first point was the plagiarism of the image. This breaks the link between the image and the author which reduces its use to Ricardo. The lack of attribution means that people who think “what a cool logo” will not be able to find Ricardo to do them a cool logo of their own. But it is not the copying per se which does the damage but the plagiarism, the lack of attribution. Arguably, as the community leaps to Ricardo’s defence (and points out what a cool logo it is) he actually benefits from a raised profile across a wider community. I had seen a few examples of his work before but hadn’t realised how many he had done and how good they are. Ricardo pointed out in the original Friendfeed thread that the reason the image was copyright was that he was making a living at the time from design. It is not inconceivable he may be better placed to do that now than he was before the logo was misappropriated. That is for Ricardo to decide though, not me.

Does the use of the logo by a company selling hokum misrepresent Ricardo? Well, given they didn’t attribute it to him, not directly. But let’s imagine that the image was CC-BY and that the company did attribute it. Arguably Ricardo would not want to be associated with that, and that would be fair enough, but there wouldn’t be anything he could do about it from a legal perspective. Because the image is actually copyright, all rights reserved, he can prevent these kinds of re-use. Or can at least in principle. He retains control in a way that CC-BY licenses do not allow. My argument is that to legally defend this position would take much more money and energy than clearly and publicly distancing yourself from the re-use of the work, and probably wouldn’t be much more effective. Furthermore my argument is that the good that comes from allowing re-use outweighs the bad. The re-use of your work actually gives you a platform to distance yourself from that re-use if you so choose. Once that is made clear it is just more good publicity for you.

More importantly, if you believe, as I do, in the value of allowing re-use then you cannot reasonably pick and choose which re-uses, and which re-users, are appropriate. Consistency requires that you allow re-use that you agree with and re-use that you do not. I may not approve of a particular re-use, and it is perfectly reasonable to say so, but that gives me no right to object. To mis-quote Hall channelling Voltaire: “I disagree with the way you have re-used my work, but I will defend your right to do so and the value you add by doing it”. And no, I will not defend it to the death. I don’t take it that seriously…