Binary decisions are a real problem in a grey-scale world


I recently made the most difficult decision I’ve had to take thus far as a journal editor. The decision was ultimately to accept the paper; that probably doesn’t sound difficult until I explain that I made it despite a referee recommending, not once but twice, that the paper be rejected with no opportunity for resubmission.

One of the real problems I have with traditional pre-publication peer review is that it takes a very nuanced question about a work with many different parts and demands a hard yes/no decision. I could point to many papers that will probably remain unpublished where the methodology or the data might have been useful but there was disagreement about the interpretation. Or where there was no argument at all, except that perhaps this was the wrong journal (with no suggestion of what the right one might be). Recently we had a paper rejected because we didn’t try to make up some spurious story about the biological reason for an interesting physical effect. Of course, we wanted to publish in a biologically slanted journal precisely because that’s where it might come to the attention of people with ideas about what the biological relevance was.

So the problem is two-fold. First, the paper is set up in a way that requires it to go forward or fail as a single piece, despite the fact that one part might remain useful while another is clearly wrong. Second, the decision is binary: there is no way to “publish with reservations about X”, and in most cases no way even to mark which parts of the paper were controversial within the review process.

Thus when faced with this paper, where in my opinion the data reported were fundamentally sound and well expressed but the interpretation perhaps more speculative than the data warranted, I was torn. The guidelines of PLoS ONE are clear: conclusions must be supported by valid evidence. Yet the data, even if the conclusions are proven wrong, are valuable in their own right. The referee objected fundamentally to the strength of the conclusions, as well as having some doubts about the way they were drawn.

So we went through a process of couching the conclusions in much more careful terms and adding a fuller discussion of the caveats and alternative interpretations. Did this fundamentally change the paper? Not really. Did it take a lot of time? Yes, months in the end. But ultimately it felt like a choice between making the paper fit the guidelines or blocking the publication of useful data. I hope the disagreement over the interpretation of the results, and even the validity of the approach, will play out in the comments on the paper or in the wider literature.

Is there a solution? Well I would argue that if we published first and then reviewed later this would solve many problems. Continual review and markup as well as modification would match what we actually do as our ideas change and the data catches up and propels us onwards. But making it actually happen? Still very hard work and a long way off.

In any case, you can always comment on the paper if you disagree with me. I just have.


Metrics and Money


David Crotty, over at Scholarly Kitchen, has an interesting piece on metrics, arguing that many of these have not been thought through because they don’t give researchers any concrete motivation to care about them. Really he’s focused mainly on exchange mechanisms, means of persuading people that doing high-quality review is worth their while by giving them something in exchange, but the argument extends to all sorts of metrics. Why would you care about any given measure if doing well on it doesn’t translate into more resources, time, or glory?

You might expect me to disagree with a lot of this, but for the most part I don’t. Any credible metric has to be real; it has to mean something. It has to matter. This is why connecting funders, technologists, data holders, and yes, even publishers, is at the core of the proposal I’m working on at the moment. We need funders to want to have access to data and to want to reward performance on those measures. If there’s money involved then researchers will follow.

Any time someone talks about a “system” using the language of currency there is a key question you have to ask: can this “value” be translated into real money? If it can’t then it is unlikely people will take it seriously. Currency has to be credible; it has to be taken seriously or it doesn’t work. How much is the cash in your pocket actually worth? Cash has to embody transferable value, and many of these schemes don’t provide anything more than basic barter.

But equally the measures of value, or of cost, have to be real. Confidence in the reality of community measures is crucial, and this is where I part company with David, because at the centre of his argument is what seems to me a massive hole.

“The Impact Factor, flawed though it may be, at least tries to measure something that directly affects career advancement–the quality and impact of one’s research results.  It’s relevant because it has direct meaning toward determining the two keys to building a scientific career, jobs and funding.”

The second half of this I agree with (but resent). But it depends absolutely on the first part being widely believed. And the first part simply isn’t true. The Thomson Reuters Journal Impact Factor does not try to measure the quality and impact of individual research results. TR are shouting this from the treetops at the moment. We know that in practice it is at best an extremely poor measure of individual performance. In economic terms, our dependence on the JIF is a bubble. And bubbles burst.

The reason people are working on metrics is that they figure replacing one rotten measure at the heart of the system with ones that are self-evidently technically superior should be easy. Of course this isn’t true. Changing culture, particularly reward culture, is very difficult. You have to tackle the self-reinforcement that these measures thrive on – and you need to work very carefully to allow the bubble to deflate in a controlled fashion.

There is one further point where I disagree with David. He asks a pair of rhetorical questions:

“Should a university deny tenure to a researcher who is a poor peer reviewer, even if he brings in millions of dollars in grants each year and does groundbreaking research?  Should the NIH offer to fund poorly designed research proposals simply because the applicant is well-liked and does a good job interpreting the work of others?”

It’s interesting that David even asks these questions, because the answers seem obvious, self-evident even. The answer, at least to the underlying question, is in both cases yes. The ultimate funders should fund people who excel at review even if they are poor at other parts of the enterprise. The work of review must be valued or it simply won’t be done. I have heard heads of university departments tell researchers to do less reviewing and write more grants. And I can tell you that in the current UK research funding climate review is well off the top of my priority list. If there is no support for reviewing then in the current economic climate we will see less of it done; if there is no space in our community for people who excel at reviewing then who will teach it? Or do we continue the pernicious myth that the successful scientist is a master of all of their trades? Aside from anything else, basic economics tells us that specialisation leads to efficiency gains, even when one of the specialists is a superior practitioner in both areas. Shouldn’t we be seeking those efficiency gains?

Because the real question is not whether reviewers should be funded, by someone in some form, but what the relative size of that investment should be in a balanced portfolio that covers all the contributions needed across the scientific enterprise. The question is how we balance these activities. How do we tension them? And the answer is that we need a market. And to have a functioning market we need a functioning currency. That currency may just be money, but reputation can be converted, albeit not directly, into funds. Successfully hacking research reputation would make a big difference to tensioning more effectively between the different important and valid roles in scientific research, and that’s why people are so interested in trying to do it.


Some notes on Open Access Week


Open Access Week kicks off for the fourth time tomorrow with events across the globe. I was honoured to be asked to contribute to the SPARC video that will be released tomorrow. The following is a transcription of my notes – not quite what I said, but similar. The video was released at 9:00am US Eastern Time on Monday 18 October.

It has been a great year for Open Access. Open Access publishers are steaming ahead, OA mandates are spreading and growing, and the quality and breadth of repositories is improving across institutions, disciplines, and nations. There have been problems and controversies as well, many involving shady publishers seeking to take advantage of the Open Access brand, but even this, in its way, is a measure of success.

Beyond traditional publication we’ve also seen great strides in the publication of a wider diversity of research outputs. Open Access to data, to software, and to materials is moving up the agenda. There have been real successes. The Alzheimer’s Disease Network showed what can change when sharing becomes part of the process. Governments and pharmaceutical companies are releasing data. Publicly funded researchers are falling behind by comparison!

For me, although these big stories are important and impressive, it is the little wins that matter: the thousands or millions of people who didn’t have to wait to read a paper, who didn’t need to write an email to get a dataset, who didn’t needlessly repeat an experiment known not to work. Every time a few minutes, a few hours, a few weeks, months, or years are saved we deliver more for the people who pay for this research. These small wins are the hardest to measure, and the hardest to explain, but they make up the bulk of the advantage that open approaches bring.

But perhaps the most important shift this year is something more subtle. Each morning I listen to the radio news, and every now and then there is a science story. These stories are increasingly prefaced with “…the research, published in the journal of…” and increasingly that journal is Open Access. A long-running excuse for not referring the wider community to the original literature has been its inaccessibility. That excuse is gradually disappearing. But more importantly, there is a whole range of research outcomes that people, where they are interested and care enough to dig deeper, can inform themselves about – research that people can use to reach their own conclusions about their health, the environment, technology, or society.

I find it difficult to see this as anything but a good thing, but nonetheless we need to recognize that it brings challenges. Challenges of explaining clearly, challenges in presenting the balance of evidence in a useful form, but above all challenges of how to effectively engage those members of the public who are interested in the details of the research. The web has radically changed the expectations of those who seek and interact with information. Broadcast is no longer enough. People expect to be able to talk back.

The last ten years of the Open Access movement has been about how to make it possible for people to touch, read, and interact with the outputs of research. Perhaps the challenge for the next ten years is to ask how we can create access opportunities to the research itself. This won’t be easy, but then nothing that is worthwhile ever is.

Open Access Week 2010 from SPARC on Vimeo.


The truth…well most of the truth anyway


I had a pretty ropey day today: failing for 45 minutes to do some simple algebra, an overnight transformation that didn’t work…again…and just generally being a bit down, what with the whole imploding British science funding situation. But in the midst of this we did one very cool and simple experiment, one that worked, and one that actually has some potentially really interesting implications.

The only thing is…I can’t tell you about it. It’s not my project and the result is potentially patentable. At one level this is great…”Neylon in idea that might have practical application shock!”…but as a person who believes in the value of sharing results and information as fast as is practical, it is frustrating. Jean-Claude, when talking about Open Notebook Science, has used a description which I think captures the value here: “If it’s not in the notebook it hasn’t been done”. I can’t really live up to that statement.

I don’t have any simple answers to this. The world, in practice, is not a simple place to live in. My job is fundamentally about collaborating with and supporting other scientists. I enjoy this, but it does mean that a lot of what I do is really other people’s projects – and mostly ones where I don’t control the publication schedule. The current arrangement of the international IP system effectively mandates secrecy for as long as can be managed, exactly the opposite of what it was intended to do. The ONS badges developed by Jean-Claude, Andrew, and Shirley are great, but they solve the communication problem, the one of explaining what I have done, not the philosophical one, of whether I can do what I feel I should. And for reasons too tedious to go into it’s not straightforward to put them on my notebook.

Like I say, life in the real world is complicated. As much as anything this is simply a marker to say that not everything I do is made public. I do what I can when I can, with what I can. But it’s a long way from perfect.

But hey, at least I had an idea that worked!


A little bit of federated Open Notebook Science


Jean-Claude Bradley is the master when it comes to organising collaborations around diverse sets of online tools. The UsefulChem and Open Notebook Science Challenge projects both revolved around the use of wikis, blogs, GoogleDocs, video, ChemSpider and whatever tools are appropriate for the job at hand. This is something that has grown up over time but is at least partially formally organised. At some level the tools that get used are the ones Jean-Claude decides will be used and it is in part his uncompromising attitude to how the project works (if you want to be involved you interact on the project’s terms) that makes this work effectively.

At the other end of the spectrum is the small scale, perhaps random collaboration that springs up online, generates some data and continues (or not) towards something a little more organised. By definition such “projectlets” will be distributed across multiple services, perhaps uncoordinated, and certainly opportunistic. Just such a project has popped up over the past week or so and I wanted to document it here.

I have for some time been very interested in the potential of visualising my online lab notebook as a graph. The way I organise the notebook is such that it, at least in a sense, automatically generates linked data, and for me this is an important part of its potential power as an approach. I often use a very old graph visualisation in talks I give about the notebook as a way of trying to indicate that potential, which I wrote about previously, but we’ve not really taken it any further than that.

A week or so ago, Tony Hirst (@psychemedia) left a comment on a blog post which sparked a conversation about feeds and their use for generating useful information. I pointed Tony at the feeds from my lab notebook but didn’t take it any further than that. Following this he posted a series of graph visualisations of the connections between people tweeting at a set of conferences, and then the penny dropped for me…sparking this conversation on Twitter.

@psychemedia You asked about data to visualise. I should have thought about our lab notebook internal links! What formats are useful? [link]

@CameronNeylon if the links are easily scrapeable, it’s easy enough to plot the graph eg http://blog.ouseful.info/2010/08/30/the-structure-of-ouseful-info/ [link]

@psychemedia Wouldn’t be too hard to scrape (http://biolab.isis.rl.ac.uk/camerons_labblog) but could possibly get as rdf or xml if it helps? [link]

@CameronNeylon structured format would be helpful… [link]

At this point the only part of the whole process that isn’t publicly available takes place, as I send an email to find out how to get an XML download of my blog and then report back via Twitter.

@psychemedia Ok. XML dump at http://biolab.isis.rl.ac.uk/camerons_labblog/index.xml but I will try to hack some Python together to pull the right links out [link]

Tony suggests I pull out the date, and I respond that I will try to get the relevant information into some sort of JSON format, something I aim to do over the weekend. Friday afternoons being what they are, and Python being what it is, I actually manage to do this much quicker than I expect, and so I tweet that I’ve made the formatted data, raw data, and script publicly available via DropBox. Of course this is only possible because Tony tweeted the link above to his own blog post describing how to pull out and format data for Gephi, and it was easy for me to adapt his code to my own needs – an open source win if there ever was one.
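
For anyone curious what that hacked-together script amounts to, here is a minimal sketch of the general approach. To be clear, this is a reconstruction for illustration rather than the actual code in the repository: it assumes the XML dump is organised as a series of post elements, each with a link and an HTML content body (the real element names almost certainly differ), it scrapes out the hrefs that point back into the notebook, and it writes a simple Source,Target edge list of the kind Gephi will import directly (the script we actually used produced a slightly different format).

# Sketch only: the element names ("post", "link", "content") are assumptions
# about the structure of the XML dump, not the real schema.
import csv
import re
import xml.etree.ElementTree as ET

BASE = "http://biolab.isis.rl.ac.uk/camerons_labblog"
HREF = re.compile(r'href="(' + re.escape(BASE) + r'[^"]*)"')

def internal_edges(xml_path):
    """Yield (source_post, linked_post) pairs for links within the notebook."""
    tree = ET.parse(xml_path)
    for post in tree.getroot().iter("post"):
        source = post.findtext("link", default="")
        body = post.findtext("content", default="")
        for target in HREF.findall(body):
            if target != source:  # ignore self-links
                yield source, target

def write_edge_list(xml_path, csv_path):
    """Write a Source,Target CSV edge list that Gephi can import."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Source", "Target"])
        for edge in internal_edges(xml_path):
            writer.writerow(edge)

if __name__ == "__main__":
    write_edge_list("index.xml", "labblog_edges.csv")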

Despite the fact that Tony took time out to put the kettle on and have dinner, and I went to a rehearsal, by the time I went to bed on Friday night Tony had improved the script and made it available (with revisions) via a Gist, identified some problems with the data, and posted an initial visualisation. On Saturday morning I transfer Tony’s alterations into my own code, set up a local Git repository, push to a new Github repository, and run the script over the XML dump as is (results pushed to Github). I then “fix” the raw data by manually removing the result of a SQL injection attack – note that because I commit and push to the remote repository I get data versioning for free, so this “fixing” is transparent and recorded. Then I re-run the script, pushing again to Github. I’ve just now updated the script and committed once more following further suggestions from Tony.

So over a couple of days we used Twitter for communication, and DropBox, GitHub, Gists, and Flickr for sharing data and code, and the whole process was carried out publicly. I wouldn’t even have thought to ask Tony about this if he hadn’t been publicly posting his visualisations (indeed I remember, but can’t find, an ironic tweet from Tony a few weeks back about how it would clearly be much better to publish in a journal in 18 months’ time, when no-one could even remember what the conference he was analysing was about…).

So another win for open approaches. Again, something small, something relatively simple, but something that came together because people were easily connected in a public space and were routinely sharing research outputs – a default that spread into the way we conducted the project. It never occurred to me at the time, because I was just reaching for the easiest tool at each stage, but every aspect of this was carried out in the open. It was just the easiest and most effective way to do it.


A collaborative proposal on research metrics


tldr: Proposed project to connect metrics builders with those who can most effectively use them to change practice. Interested? Get involved! Proposal doc is here and free to edit.

When we talk about open research practice, more efficient research communication, or a wider diversity of publication, we always come up against the same problem. What’s in it for the jobbing scientist? This is so prevalent that it has been reformulated as “Singh’s Law” (by analogy with Godwin’s Law): any discussion of research practice will inevitably end when someone brings up career advancement or tenure. The question is what do we actually do about this?

The obvious answer is to make these things matter. Research funders have the most power here, in that they can influence behaviour through how they distribute resources. If the funder says something is important then the research community will jump to it. The problem, of course, is that in practice funders have to take their community with them; radical and rapid change is not usually possible. A step in the right direction would be to provide funders and researchers with effective means of measuring and comparing themselves and their outputs, in particular means of measuring performance in previously funded activities.

There are many current policy initiatives aimed at making these kinds of judgements, and many technical groups building and discussing different types of metrics. Recently there have also been calls to ensure that the data underlying these metrics is made available. But there is relatively little connection between these activities. There is an opportunity to connect technical expertise and data with the needs of funders, researchers, and perhaps even the mainstream media and government.

An opportunity has arisen for some funding to support a project here. My proposal is to bring a relevant group of stakeholders together – funders, technologists, scientists, administrators, media, publishers, and aggregators – to identify needs and then to actually build some things. Essentially the idea is a BarCamp-style meeting lasting a day and a bit, followed by a two-day hackfest. Following on from this the project would fund some full-time effort to take the most promising ideas forward.

I’m looking for interested parties. This will be somewhat UK-centric, just because of logistics and funding, but the suggestion has already been made that following up with a similar North American or European project could be interesting. The proposal is available to view and edit as a GoogleDoc. Feel free to add your name, contact me directly, or suggest the names of others (probably better to me directly). I have a long list of people to contact directly as well, but feel free to save me the effort.

Ed. Note: This proposal started as a question on Friendfeed where I’ve already got a lot of help and ideas. Hopefully soon I will write another post about collaborative and crowdsourced grant writing and how it has changed since the last time I tried this some years back.


Now about that filter…

A talk given in two slightly different forms at the NFAIS annual meeting 2010 (where I followed Clay Shirky, hence the title) and at the Society for General Microbiology in Edinburgh in March. In the first case the talk was part of a panel of presentations intended to give the view of “scholars” to information professionals. In the second it was part of a session looking at the application of web-based tools to research and education.

Abstract (NFAIS meeting): More scientific information was generated in the past five years than had previously existed, and scientists are simply not coping. Despite the fact that the web was built to enable the communication of data and information between scientists, the scientific community has been very slow to exploit its full capabilities. I will discuss the development and state of the art of collaborative communication and filtering tools and their use by the scientific research community. The reasons for the lack of penetration and the resistance to change will be discussed, along with the outlines of what a fully functional system would look like and, most importantly, the way this would enable more effective and efficient research.


Free…as in the British Museum


Richard Stallman and Richard Grant, two people who I wouldn’t ever have expected to group together except based on their first name, have recently published articles that have made me think about what we mean when we talk about “Open” stuff. In many ways this is a return right to the beginning of this blog, which started with a post in which I tried to define my terms as I understood them at the time.

In Stallman’s piece he argues that “open” as in “open source” is misleading because it sounds limiting. It makes it sound as though the only thing that matters is having access to the source code. He dismisses the various careful definitions of open as specialist pleading: definitions that only the few are aware of, and whose use will confuse most others. He is of course right that no matter how carefully we define “open”, it is such a commonly used word, and so open to interpretation itself, that there will always be ambiguity.

Many efforts have been made in various communities to find new and more precise terms, “gratis” and “libre”, “green” vs “gold”, but these never stick, largely because the word “open” captures the imagination in a way more precise terms do not, and largely because these terms capture the issues that divide us, rather than those that unite us.

So Stallman has a point, but he then goes on to argue that “free” does not suffer from the same issues because it does capture an important aspect of Free Software. I can’t agree here because it seems clear to me we have exactly the same confusions. “Free as in beer” and “free as in free speech” capture exactly the same types of confusion, and indeed exactly the same kinds of issues, as all the various subdefinitions of open. But worse than that, it implies these things are in fact free, that they don’t actually cost anything to produce.

In Richard Grant’s post he argues against the idea that the Faculty of 1000, a site that provides expert assessment of research papers by a hand-picked group of academics, “should be open access”. His argument is largely pragmatic: running the service costs money, and that money needs to be recovered in some way or there would be no service. Now we can argue that there might be more efficient and cheaper ways of providing that service, but it is never going to be free. The production of the scholarly literature is likewise never going to be free. Archiving, storage, people keeping the system running, even just the electricity – these all cost money, and that has to come from somewhere.

It may surprise overseas readers, but access to many British museums is free to anyone. The British Museum, the National Portrait Gallery, and others are all free to enter. That they are not “free” in terms of cost is obvious; this access is subsidised by the taxpayer. The original collection of the British Museum was in fact donated to the British people, but in taking that collection on the government was accepting a liability, one that continues to run into millions of pounds a year just to stop the collection from falling apart, let alone to enhance, display, or research it.

The decision to make these museums openly accessible is in part ideological, but it can also be framed as a pragmatic decision. Given the enormous monetary investment there is a large value in subsidising free access to maximise the social benefits that universal access can provide. Charging for access would almost certainly increase income, or at least decrease costs, but there would be significant opportunity cost in terms of social return on investment by barring access.

Those of us who argue for Open Access to the scholarly literature, or for Open Data, Process, Materials or whatever, need to be careful that we don’t pretend this comes free. We also need to educate ourselves more about the costs. Writing costs money, peer review costs money, and editing the formats, running the web servers, and providing archival services all cost money. And it costs money whether it is done by publishers operating subscription or author-pays business models, or by institutional or domain repositories. We can argue for Open Access approaches on economic efficiency grounds, and we can argue for them on the basis of maximising social return on investment: essentially, that for a small additional investment, over and above the very large existing investment in research, significant potential social benefits will arise.

Open Access scholarly literature is free like the British Museum or a national monument like the Lincoln Memorial is free. We should strive to bring costs down as far as we can. We should defend the added value of investing in providing free access to view and use content. But we should never pretend that those costs don’t exist.


Warning: Misusing the journal impact factor can damage your science!


I had a bit of a rant at a Science Online London panel session on Saturday with Theo Bloom, Brian Derby, and Phil Lord, which people seemed to like, so it seemed worth repeating here. As usual when discussing scientific publishing, the dreaded issue of the Journal Impact Factor came up. While everyone complains about metrics, I’ve found that people in general seem remarkably passive when it comes to challenging their use. Channelling Björn Brembs more than anything else, I said something approximately like the following.

It seems bizarre that we are still having this discussion. Thomson Reuters say that the JIF shouldn’t be used for judging individual researchers, and Eugene Garfield, the man who invented the JIF, has consistently said it should never be used to judge individual researchers. Even a cursory look at the basic statistics should tell any half-competent scientist with an ounce of quantitative analysis in their bones that the Impact Factor of the journals in which a given researcher publishes tells you nothing whatsoever about the quality of their work.
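
To make the statistical point concrete, here is a toy simulation. The numbers are invented and the log-normal distribution is simply an assumption, chosen because real citation distributions are known to be heavily skewed: the journal-level average that underpins the JIF sits well above what a typical paper in that journal actually receives, so knowing the journal average tells you very little about any one paper or its author.

# Toy simulation only: the distribution and its parameters are assumptions
# chosen to mimic the heavy skew of real citation data, not real numbers.
import random

random.seed(1)

# Per-paper citation counts for a hypothetical journal.
citations = [int(random.lognormvariate(1.0, 1.2)) for _ in range(1000)]

mean_citations = sum(citations) / len(citations)            # the JIF-style average
median_citations = sorted(citations)[len(citations) // 2]   # a "typical" paper
share_below_mean = sum(c < mean_citations for c in citations) / len(citations)

print("Journal-style mean citations: %.1f" % mean_citations)
print("Median paper citations:       %d" % median_citations)
print("Share of papers below the journal mean: %.0f%%" % (100 * share_below_mean))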

Metrics are unlikely to go away – after all, if we didn’t have them we might have to judge people’s work by actually reading it – but as professional measurers and analysts of the world we should be embarrassed to use JIFs to measure people and papers. It is quite simply bad science. It is also bad management. If our managers and leaders have neither the competence nor the integrity to use appropriate measurement tools then they should be shamed into doing so. If your managers are not competent to judge the quality of your work without leaning on spurious measures, your job and future are in serious jeopardy. But more seriously, if as professional researchers we don’t have the integrity to challenge the fundamental methodological flaws in using JIFs to judge people, and the appalling distortion of scientific communication that this creates, then I question whether our own research methodology can be trusted either.

My personal belief is that we should be focussing on developing effective and diverse measures of the re-use of research outputs. By measuring use rather than mere prestige we can go much of the way towards delivering on the so-called impact agenda, optimising our use of public funds to generate outcomes while retaining some say over the types of outcomes that are important and the timeframes over which they are measured. But whether or not you agree with my views, it seems to me critical that we, as hopefully competent scientists, at least debate what it is we are trying to optimise and what the appropriate things to measure are, so we can work on providing reliable and sensible ways of doing that.
