Binary decisions are a real problem in a grey-scale world

Peer Review Monster
Image by Gideon Burton via Flickr

I recently made the most difficult decision I’ve had to take thus far as a journal editor. That decision was ultimately to accept the paper; that probably doesn’t sound like a difficult decision until I explain that I made this decision despite a referee saying I should reject the paper with no opportunity for resubmission not once, but twice.

One of the real problems I have with traditional pre-publication peer review is the way it takes a very nuanced problem around a work which has many different parts and demands that you take a hard yes/no decision. I could point to many papers that will probably remain unpublished where the methodology or the data might have been useful but there was disagreement about the interpretation. Or where there was no argument except that perhaps this was the wrong journal (with no suggestion of what the right one might be). Recently we had a paper rejected because we didn’t try to make up some spurious story about the biological reason for an interesting physical effect. Of course, we wanted to publish in a biologically slanted journal because that’s where it might come to the attention of people with ideas about what the biological relevance was.

So the problem is two-fold. Firstly, the paper is set up in a way that requires it to go forward or to fail as a single piece, despite the fact that one part might remain useful while another part is clearly wrong. Secondly, this decision is binary: there is no way to “publish with reservations about X”, and in most cases indeed no way to even mark which parts of the paper were controversial within the review process.

Thus when faced with this paper where, in my opinion, the data reported were fundamentally sound and well expressed but the interpretation perhaps more speculative than the data warranted, I was torn. The guidelines of PLoS ONE are clear: conclusions must be supported by valid evidence. Yet the data, even if the conclusions are proven wrong, are valuable in their own right. The referee objected fundamentally to the strength of the conclusion as well as having some doubts about the way those conclusions were drawn.

So we went through a process of couching the conclusions in much more careful terms, a greater discussion of the caveats and alternative interpretations. Did this fundamentally change the paper? Not really. Did it take a lot of time? Yes, months in the end. But in the end it felt like a choice between making the paper fit the guidelines, or blocking the publication of useful data. I hope the disagreement over the interpretation of the results and even the validity of the approach will play out in the comments for the paper or in the wider literature.

Is there a solution? Well I would argue that if we published first and then reviewed later this would solve many problems. Continual review and markup as well as modification would match what we actually do as our ideas change and the data catches up and propels us onwards. But making it actually happen? Still very hard work and a long way off.

In any case, you can always comment on the paper if you disagree with me. I just have.


Now about that filter…

A talk given in two slightly different forms at the NFAIS annual meeting 2010 (where I followed Clay Shirky, hence the title) and at the Society for General Microbiology in Edinburgh in March. In the first case the talk was part of a panel of presentations intended to give the view of “scholars” to the information professionals. In the second it was part of a session looking at the application of web based tools to research and education.

Abstract (NFAIS meeting): There was more scientific information generated in the past five years than has previously existed, and scientists are simply not coping. Despite the fact that the web was built to enable the communication of data and information between scientists, the scientific community has been very slow in exploiting its full capabilities. I will discuss the development and state of the art of collaborative communication and filtering tools and their use by the scientific research community. The reasons for the lack of penetration and resistance to change will be discussed along with the outlines of what a fully functional system would look like and, most importantly, the way this would enable more effective and efficient research.


P ≠ NP and the future of peer review

Decomposition method (constraint satisfaction)
Image via Wikipedia

“We demonstrate the separation of the complexity class NP from its subclass P. Throughout our proof, we observe that the ability to compute a property on structures in polynomial time is intimately related to the statistical notions of conditional independence and sufficient statistics. The presence of conditional independencies manifests in the form of economical parametrizations of the joint distribution of covariates. In order to apply this analysis to the space of solutions of random constraint satisfaction problems, we utilize and expand upon ideas from several fields spanning logic, statistics, graphical models, random ensembles, and statistical physics.”

Vinay Deolalikar [pdf]

No. I have no idea either, and the rest of the document just gets more confusing for a non-mathematician. Nonetheless the online maths community has lit up with excitement as this document, claiming to resolve one of the major outstanding problems in maths, circulated. And in the process we are seeing online collaborative post-publication peer review take off.

It has become easy to say that review of research after it has been published doesn’t work. Many examples have failed, or been partially successful. Most journals with commenting systems still get relatively few comments on the average paper. Open peer review tests have generally been judged a failure. And so we stick with traditional pre-publication peer review despite the lack of any credible evidence that it does anything except cost around a few billion pounds a year.

Yesterday, Bill Hooker, not exactly a nay-sayer when it comes to using the social web to make research more effective, wrote:

“…when you get into “likes” etc, to me that’s post-publication review — in other words, a filter. I love the idea, but a glance at PLoS journals (and other experiments) will show that it hasn’t taken off: people just don’t interact with the research literature (yet?) in a way that makes social filtering effective.”

But actually the picture isn’t so negative. We are starting to see examples of post-publication peer review and see it radically out-perform traditional pre-publication peer review. The rapid demolition [1, 2, 3] of the JACS hydride oxidation paper last year (not least pointing out that the result wasn’t even novel) demonstrated the chemical blogosphere was more effective than peer review of one of the premier chemistry journals. More recently 23andMe issued a detailed, and at least from an outside perspective devastating, peer review (with an attempt at replication!) of a widely reported Science paper describing the identification of genes associated with longevity. This followed detailed critiques from a number of online writers.

These, though, were of published papers, demonstrating that a post-publication approach can work, but not showing it working for an “informally published” piece of research such as a blog post or other online posting. In the case of this new mathematical proof, the author, Vinay Deolalikar, apparently took the standard approach that one does in maths and sent a pre-print to a number of experts in the field for comments and criticisms. The paper is not in the ArXiv and was in fact made public by one of the email correspondents. The rumours then spread like wildfire, with widespread media reporting, and widespread online commentary.

Some of that commentary was expert and well informed. Firstly a series of posts appeared stating that the proof is “credible”. That is, that it was worth deeper consideration and the time of experts to look for holes. There appears to be widespread skepticism that the proof will be correct, including a $200,000 bet from Scott Aaronson, but also a widespread view that it nonetheless is useful, that it will progress the field in a helpful way even if it is wrong.

After this first round, there have been summaries of the proof, and now the identification of potential issues is occurring (see RJLipton for a great summary). As far as I can tell these issues are potentially extremely subtle and will require the attention of the best domain experts to resolve. In a couple of cases these experts have already potentially “patched” the problem, adding their own expertise to contribute to the proof. And in the last couple of hours as Michael Nielsen pointed out to me there is the beginning of a more organised collaboration to check through the paper.

This is collaborative, and positive peer review, and it is happening at web scale. I suspect that there are relatively few experts in the area who aren’t spending some of their time on this problem this week. In the market for expert attention this proof is buying big, as it should be. An important problem is getting a good going over and being tested, possibly to destruction, in a much more efficient manner than could possibly be done by traditional peer review.

There are a number of objections to seeing this as generalizable to other research problems and fields. Firstly, maths has a strong pre-publication communication and review structure which has been strengthened over the years by the success of the ArXiv. Moreover there is a culture of much higher standards of peer review in maths, review which can take years to complete. Both of these encourage circulation of drafts to a wider community than in most other disciplines, priming the community for distributed review to take place.

The other argument is that only high profile work will get this attention, only high profile work will get reviewed at this level, possibly at all. Actually I think this is a good thing. Most papers are never cited, so why should they suck up the resource required to review them? Of those that are or aren’t published, whether they are useful to someone, somewhere, is not something that can be determined by one or two reviewers. Whether they are useful to you is something that only you can decide. The only person competent to review which papers you should look at in detail is you. Sorry.

Many of us have argued for some time that post-publication peer review with little or no pre-publication review is the way forward. Many have argued against this on practical grounds that we simply can’t get it to happen, there is no motivation for people to review work that has already been published. What I think this proof, and the other stories of online review tell us is that these forms of review will grow of their own accord, particularly around work that is high profile. My hope is that this will start to create an ecosystem where this type of commenting and review is seen as valuable. That would be a more positive route than the other alternative, which seems to be a wholesale breakdown of the current system as the workloads rise too high and the willingness of people to contribute drops.

The argument always brought forward for peer review is that it improves papers. What interests me about the online activity around Deolalikar’s paper is that there is a positive attitude. By finding the problems, the proof can be improved, and new insights found, even if the overall claim is wrong. If we bring a positive attitude to making peer review work more effectively and efficiently then perhaps we can find a good route to improving the system for everyone.


It’s not information overload, nor is it filter failure: It’s a discovery deficit

Clay Shirky
Image via Wikipedia

Clay Shirky’s famous soundbite has helped to focus minds on the way information on the web needs to be tackled and on a move towards managing the process of selecting and prioritising information. But in the research space I’m getting a sense that it is fuelling a focus on preventing publication in a way that is analogous to the conventional filtering process involved in peer reviewed publication.

Most recently this surfaced at Chronicle of Higher Education, to which there were many responses, Derek Lowe’s being one of the most thought out. But this is not isolated.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content?  #rscrc10 [twitter via@branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. Publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms are good enough, while being a long way from perfect, to make it possible to mostly find what you’re looking for while avoiding what you’re not looking for.  Search acts as a remarkable filter over the whole web through making discovery possible for large classes of problem. And high quality search algorithms depend on having a lot of data.

It is very easy to say there is too much academic literature – and I do. But the solution which seems to be becoming popular is to argue for an expansion of the traditional peer review process. To prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn’t work today and won’t scale as the diversity and sheer number of research outputs increases tomorrow. Secondly it doesn’t take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of work that you think is important. We don’t need publication filters, we need enhanced discovery engines. Publishing is cheap, curation is expensive whether it is applied to filtering or to markup and search enhancement.

Filtering before publication worked and was probably the most efficient place to apply the curation effort when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing through simply printing less. But it created new costs, and invisible opportunity costs where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of costs of scholarly communication the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put in the curation effort is in enhancing discoverability, in providing me the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print paradigm publication filtering to the web alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, the details of the methodology then I don’t need to read it, but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter the higher the notional quality. But being a pure filter play doesn’t add value because the costs of publication are now low. The value lies in presenting, enhancing, curating the material that is published. If publishers instead vied to identify, markup, and make it easy for the right people to find the right information they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don’t need more filters or better filters in scholarly communications – we don’t need to block publication at all. Ever. What we need are tools for curation and annotation and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


In defence of author-pays business models

Latest journal ranking in the biological sciences
Image by cameronneylon via Flickr

There has been an awful lot recently written and said about author-pays business models for scholarly publishing and a lot of it has focussed on PLoS ONE.  Most recently Kent Anderson has written a piece on Scholarly Kitchen that contains a number of fairly serious misconceptions about the processes of PLoS ONE. This is a shame because I feel this has muddled the much more interesting question that was intended to be the focus of his piece. Nonetheless here I want to give a robust defence of author pays models and of PLoS ONE in particular. Hopefully I can deal with the more interesting question, how radical should or could PLoS be, in a later post.

A common charge leveled at author-payment funded journals is that they are pushed in the direction of being non-selective. The figure that PLoS ONE publishes around 70% of the papers it receives is often given as a demonstration of this. There are a range of reasons why this is nonsense. The first and simplest is that the evidence we have suggests that of papers rejected from journals between 50% and 95% of them are ultimately published elsewhere [1, 2 (pdf), 3, 4]. The cost of this trickle down, a result of the use of subjective selection criteria of “importance”, is enormous in authors’ and referees’ time and represents a significant potential opportunity cost in terms of lost time. PLoS ONE seeks to remove this cost by simply asking “should this be published?” In the light of the figures above it seems that 70% is a reasonable proportion of papers that are probably “basically ok but might need some work”.

The second presumption is that the peer review process is somehow “light touch”. This is perhaps the result of some mis-messaging that went on early in the history of PLoS ONE but it is absolute nonsense. As both an academic editor and an author I would argue that the peer review process is as rigorous as I have experienced at any other journal (and I do mean any other journal).

As an author I have two papers published in PLoS ONE, both went through at least one round of revision, and one was initially rejected. As an editor I have seen two papers withdrawn after the initial round of peer review, presumably not because the authors felt that the required changes represented a “light touch”. I have rejected one and have never accepted a paper without revision. Every paper I have edited has had at least one external peer reviewer and I try to get at least two. Several papers have gone through more than one cycle of revision with one going through four. Figures provided by Pete Binfield (comment from Pete about 20 comments in) suggest that this kind of proportion is about average for PLoS ONE Academic Editors. The difference between PLoS ONE and other journals is that I look for what is publishable in a submission and work with the authors to bring that out rather than taking delight in rejecting some arbitrary proportion of submissions and imagining that this equates to a quality filter. I see my role as providing a service.

The more insidious claim made is that there is a link between this supposed light touch review and the author pays models; that there is pressure on those who make the publication decision to publish as much as possible. Let me put this as simply as possible. The decision whether to publish is mine as an Academic Editor and mine alone. I have never so much as discussed my decision on a paper with the professional staff at PLoS and I have never received any payment whatsoever from PLoS (with the possible exception of two lunches and one night’s accommodation for a PLoS meeting I attended – and I missed the drinks reception…). If I ever perceived pressure to accept or was offered inducements to accept papers I would resign immediately and publicly as an AE.

That an author pays model has the potential to create a conflict of interest is clear. That is why, within reputable publishers, structures are put in place to reduce that risk as far as is possible, divorcing the financial side from editorial decision making, creating Chinese walls between editorial and financial staff within the publisher.  The suggestion that my editorial decisions are influenced by the fact the authors will pay is, to be frank, offensive, calling into serious question my professional integrity and that of the other AEs. It is also a slightly strange suggestion. I have no financial stake in PLoS. If it were to go under tomorrow it would make no difference to my take home pay and no difference to my finances. I would be disappointed, but not poorer.

Another point that is rarely raised is that the author pays model is much more widely used than people generally admit. Page charges and colour charges for many disciplines are of the same order as Open Access publication charges. The Journal of Biological Chemistry has been charging page rates for years while increasing publication volume. Author fees of one sort or another are very common right across the biological and medical sciences literature. And it is not new. Bill Hooker’s analysis (here and here) of these hidden charges bears reading.

But the core of the argument for author payments is that the market for scholarly publishing is badly broken. Until the pain of the costs of publication is directly felt by those making the choice of where to (try to) publish we will never change the system. The market is also the right place to have this out. It is value for money that we should be optimising. Let me illustrate with an example. I have heard figures of around £25,000 given as the level of author charge that would be required to sustain Cell, Nature, or Science as Open Access APC supported journals. This is usually followed by a statement to the effect “so they can’t possibly go OA because authors would never pay that much”.

Let’s unpack that statement.

If authors were forced to make a choice between the cost of publishing in these top journals versus putting that money back into their research they would choose the latter. If the customer actually had to make the choice to pay the true costs of publishing in these journals, they wouldn’t…if journals believed that authors would see the real cost as good value for money, many of them would have made that switch years ago. Subscription charges as a business model have allowed an appallingly wasteful situation to continue unchecked because authors can pretend that there is no difference in cost to where they publish, they accept that premium offerings are value for money because they don’t have to pay for them. Make them make the choice between publishing in a “top” journal vs a “quality” journal and getting another few months of postdoc time and the equation changes radically. Maybe £25k is good value for money. But it would be interesting to find out how many people think that.

We need a market where the true costs are a factor in the choices of where, or indeed whether, to formally publish scholarly work. Today, we do not have that market and there is little to no pressure to bring down publisher costs. That is why we need to move towards an author pays system.


Peer review: What is it good for?


It hasn’t been a real good week for peer review. In the same week that the Lancet fully retract the original Wakefield MMR article (while keeping the retraction behind a login screen – way to go there on public understanding of science), the main stream media went to town on the report of 14 stem cell scientists writing an open letter making the claim that peer review in that area was being dominated by a small group of people blocking the publication of innovative work. I don’t have the information to actually comment on the substance of either issue but I do want to reflect on what this tells us about the state of peer review.

There remains much reverence of the traditional process of peer review. I may be over interpreting the tenor of Andrew Morrison’s editorial in BioEssays but it seems to me that he is saying, as many others have over the years “if we could just have the rigour of traditional peer review with the ease of publication of the web then all our problems would be solved”.  Scientists worship at the altar of peer review, and I use that metaphor deliberately because it is rarely if ever questioned. Somehow the process of peer review is supposed to sprinkle some sort of magical dust over a text which makes it “scientific” or “worthy”, yet while we quibble over details of managing the process, or complain that we don’t get paid for it, rarely is the fundamental basis on which we decide whether science is formally published examined in detail.

There is a good reason for this. THE EMPEROR HAS NO CLOTHES! [sorry, had to get that off my chest]. The evidence that peer review as traditionally practiced is of any value at all is equivocal at best (Science 214, 881; 1981, J Clinical Epidemiology 50, 1189; 1998, Brain 123, 1954; 2000, Learned Publishing 22, 117; 2009). It’s not even really negative. That would at least be useful. There are a few studies that suggest peer review is somewhat better than throwing a dice and a bunch that say it is much the same. It is at its best at dealing with narrow technical questions, and at its worst at determining “importance” is perhaps the best we might say. Which for anyone who has tried to get published in a top journal or written a grant proposal ought to be deeply troubling. Professional editorial decisions may in fact be more reliable, something that Philip Campbell hints at in his response to questions about the open letter [BBC article]:

Our editors […] have always used their own judgement in what we publish. We have not infrequently overruled two or even three sceptical referees and published a paper.

But there is perhaps an even more important procedural issue around peer review. Whatever value it might have we largely throw away. Few journals make referees’ reports available, and virtually none track the changes made in response to referees’ comments, which would enable a reader to make their own judgement as to whether a paper was improved or made worse. Referees get no public credit for good work, and no public opprobrium for poor or even malicious work. And in most cases a paper rejected from one journal starts completely afresh when submitted to a new journal, the work of the previous referees simply thrown out of the window.

Much of the commentary around the open letter has suggested that the peer review process should be made public. But only for published papers. This goes nowhere near far enough. One of the key points where we lose value is in the transfer from one journal to another. The authors lose out because they’ve lost their priority date (in the worst case giving the malicious referees the chance to get their paper in first). The referees miss out because their work is rendered worthless. Even the journals are losing an opportunity to demonstrate the high standards they apply in terms of quality and rigor – and indeed the high expectations they have of their referees.

We never ask what the cost of not publishing a paper is or what the cost of delaying publication could be. Eric Weinstein provides the most sophisticated view of this that I have come across and I recommend watching his talk at Science in the 21st Century from a few years back. There is a direct cost to rejecting papers, both in the time of referees and the time of editors, as well as the time required for authors to reformat and resubmit. But the bigger problem is the opportunity cost – how much that might have been useful, or even important, is never published? And how much is research held back by delays in publication? How many follow up studies not done, how many leads not followed up, and perhaps most importantly how many projects not re-funded, or only funded once the carefully built up expertise in the form of research workers is lost?

Rejecting a paper is like gambling in a game where you can only win. There are no real downside risks for either editors or referees in rejecting papers. There are downsides, as described above, and those carry real costs, but those are never borne by the people who make or contribute to the decision. It’s as though it were a futures market where you can only lose if you go long, never if you go short on a stock. In Eric’s terminology those costs need to be carried: we need to require that referees and editors who “go short” on a paper or grant unwind their position if they get it wrong. This is the only way we can price the downside risks into the process. If we want open peer review, indeed if we want peer review in its traditional form, along with the caveats, costs and problems, then the most important advance would be to have it for unpublished papers.

Journals need to acknowledge the papers they’ve rejected, along with dates of submission. Ideally all referees’ reports should be made public, or at least re-usable by the authors. If full publication, of either the submitted form of the paper or the referees’ reports, is not acceptable then journals could publish a hash of the submitted document and reports against a local key, enabling the authors to demonstrate submission date and the provenance of referees’ reports as they take them to another journal.
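
To make the hashing idea above a little more concrete, here is a minimal Python sketch. It assumes that “a hash against a local key” means a keyed hash (HMAC-SHA256) computed with a secret that only the journal holds. The function names, the decision to bind the submission date into the digest, and the example key are all hypothetical illustrations, not a description of anything a journal actually does.

```python
import hashlib
import hmac
from datetime import date


def submission_receipt(journal_key: bytes, manuscript: bytes,
                       reports: list[bytes], submitted: date) -> str:
    """Return a hex digest binding the manuscript, reports and submission date.

    Hypothetical helper: the journal would publish this digest instead of
    the documents themselves.
    """
    mac = hmac.new(journal_key, digestmod=hashlib.sha256)
    # ISO dates (YYYY-MM-DD) have a fixed length, so this prefix is unambiguous.
    mac.update(submitted.isoformat().encode("utf-8"))
    for item in [manuscript, *reports]:
        # Hash each document first so that item boundaries cannot be shifted.
        mac.update(hashlib.sha256(item).digest())
    return mac.hexdigest()


def verify_receipt(journal_key: bytes, manuscript: bytes, reports: list[bytes],
                   submitted: date, published_digest: str) -> bool:
    """Recompute the digest and compare it with the published one."""
    expected = submission_receipt(journal_key, manuscript, reports, submitted)
    return hmac.compare_digest(expected, published_digest)


if __name__ == "__main__":
    key = b"hypothetical-journal-secret"  # assumption: known only to the journal
    receipt = submission_receipt(key, b"manuscript text",
                                 [b"referee report 1"], date(2010, 2, 5))
    print(receipt)
    print(verify_receipt(key, b"manuscript text",
                         [b"referee report 1"], date(2010, 2, 5), receipt))
```

One consequence of using a secret key is that a second journal could not check the digest on its own; it would have to ask the first journal to confirm it. Publishing a plain, unkeyed SHA-256 digest alongside a dated record would instead let anyone holding the documents verify them, which may or may not be what the journal wants.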

In my view referees need to be held accountable for the quality of their work. If we value this work we should also value and publicly laud good examples. And conversely poor work should be criticised. Any scientist has received reviews that are, if not malicious, then incompetent. And even if we struggle to admit it to others we can usually tell the difference between critical, but constructive (if sometimes brutal), and nonsense. Most of us would even admit that we don’t always do as good a job as we would like. After all, why should we work hard at it? No credit, no consequences, why would you bother? It might be argued that if you put poor work in you can’t expect good work back out when your own papers and grants get refereed. This again may be true, but only in the long run, and only if there are active and public pressures to raise quality. None of which I have seen.

Traditional peer review is hideously expensive. And currently there is little or no pressure on its contributors or managers to provide good value for money. It is also unsustainable at its current level. My solution to this is to radically cut the number of peer reviewed papers probably by 90-95% leaving the rest to be published as either pure data or pre-prints. But the whole industry is addicted to traditional peer reviewed publications, from the funders who can’t quite figure out how else to measure research outputs, to the researchers and their institutions who need them for promotion, to the publishers (both OA and toll access) and metrics providers who both feed the addiction and feed off it.

So that leaves those who hold the purse strings, the funders, with a responsibility to pursue a value for money agenda. A good place to start would be a serious critical analysis of the costs and benefits of peer review.

Addition after the fact: Pointed out in the comments that there are other posts/papers I should have referred to where people have raised similar ideas and issues. In particular Martin Fenner’s post at Nature Network. The comments are particularly good as an expert analysis of the usefulness of the kind of “value for money” critique I have made. Also a paper in the Arxiv from Stefano Allesina. Feel free to mention others and I will add them here.

A question of trust

I have long been sceptical of the costs and value delivered by our traditional methods of peer review. This is really on two fronts. Firstly, the costs, where they have been estimated, are extremely high, representing a multi-billion dollar subsidy by governments of the scholarly publishing industry. Secondly, the value that is delivered through peer review, the critical analysis of claims and informed opinion on the quality of the experiments, is largely lost. At best it is wrapped up in the final version of the paper. At worst it is completely lost to the end user. A part of this, which the more I think about the more I find bizarre, is that the whole process is carried on under a shroud of secrecy. This means that, as an end user, I do not know who the peer reviewers are, and do not necessarily know what process has been followed or even the basis of the editorial decision to publish. As a result I have no means of assessing the quality of peer review for any given journal, let alone any specific paper.

Those of us who see this as a problem have a responsibility to provide credible and workable alternatives to traditional peer review. So far, despite many ideas, we haven’t, to be honest, had very much success. Post-publication commenting, open peer review, and Digg-like voting mechanisms have been explored but have yet to have any large success in scholarly publishing. PLoS is leading the charge on presenting article level metrics for all of its papers, but these remain papers that have also been through a traditional peer review process. Very little that is both radical with respect to the decision and means of publishing and successful in getting traction amongst scientists has been seen as yet.

Out on the real web it has taken non-academics to demonstrate the truly radical when it comes to publication. Whatever you may think of the accuracy of Wikipedia in your specific area, and I know it has some weaknesses in several of mine, it is the first location that most people find, and the first location that most people look for, when searching for factual information on the web. Roderic Page put up some interesting statistics when he looked this week at the top hits for over 5000 mammal names in Google. Wikipedia took the top spot 48% of the time and was in the top 10 in virtually every case (97%). If you want to place factual information on the web, Wikipedia should be your first port of call. Anything else is largely a waste of your time and effort. This doesn’t incidentally mean that other sources are not worthwhile or have no place, but that people need to work with the assumption that people’s first landing point will be Wikipedia.

“But”, I hear you say, “how do we know whether we can trust a given Wikipedia article, or specific statements in it?”

The traditional answer has been to say you need to look in the logs, check the discussion page, and click back through the profiles of the people who made specific edits. However this is inaccessible to many people, simply because they do not know how to process the information. Very few universities have an “Effective Use of Wikipedia 101” course. Mostly because very few people would be able to teach it.

So I was very interested in an article on Mashable about marking up and colouring Wikipedia text according to its “trustworthiness”. Andrew Su kindly pointed me in the direction of the group doing the work and their papers and presentations. The system they are using, which can be added to any MediaWiki installation, measures two things: how long a specific piece of text has stayed in situ, and who either edited it or left it in place. People who write long lasting edits get higher status, and this in turn promotes the text that they have “approved” by editing around but not changing.
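To make the idea concrete, here is a deliberately toy sketch of a longevity-based trust score. It is not the group’s actual algorithm, which I have not reproduced; the scoring rules, the `score` function and the sample history are all assumptions invented for illustration. In this toy model an author gains a point each time their sentence survives someone else’s edit, and a sentence’s trust is its author’s reputation plus the reputation of the editors who left it in place.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Revision = Tuple[str, Set[str]]  # (editor, sentences present after the edit)


def score(history: List[Revision]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (editor reputation, trust of each sentence in the latest revision)."""
    author: Dict[str, str] = {}
    endorsers: Dict[str, Set[str]] = defaultdict(set)
    reputation: Dict[str, int] = defaultdict(int)

    for editor, sentences in history:
        for s in sentences:
            if s not in author:
                author[s] = editor            # first appearance: this editor wrote it
            elif author[s] != editor:
                reputation[author[s]] += 1    # text survived someone else's edit
                endorsers[s].add(editor)      # that editor implicitly approved it

    latest = history[-1][1]
    trust = {
        s: reputation[author[s]] + sum(reputation[e] for e in endorsers[s])
        for s in latest
    }
    return dict(reputation), trust


# Hypothetical edit history: three editors, three sentences.
history = [
    ("alice", {"s1", "s2"}),
    ("bob", {"s1", "s2", "s3"}),
    ("carol", {"s1", "s3"}),
]
reputation, trust = score(history)
```

In this example the sentence that has survived two later edits ends up with a higher trust score than the one that has survived only one, and its author ends up with the highest reputation, which is the qualitative behaviour described above.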

This to me is very exciting because it provides extra value and information for both users and editors without requiring anyone to do any more work than install a plugin. The editors and writers simply continue working as they have. The user can access an immediate view of the trustworthiness of the article with a high level of granularity, essentially at the level of single statements. And most importantly the editor gets a metric, a number that is consistently calculated across all editors, that they can put on a CV. Editors are peer reviewers, they are doing review, on a constantly evolving and dynamic article that can both change in response to the outside world and also be continuously improved. Not only does the Wikipedia process capture most of the valuable aspects of traditional peer review, it jettisons many of the problems. But without some sort of reward it was always going to be difficult to get professional scientists to be active editors. Trust metrics could provide that reward.

Now there are many questions to ask about the calculation of this “karma” metric. Should it be subject biased so we know that highly ranked editors have relevant expertise, or should it be general so as to discourage highly ranked editors from modifying text that is outside of their expertise? What should the mathematics behind it be? It will clearly take time for such metrics to be respected as a scholarly contribution, but equally I can see the ground shifting very rapidly towards a situation where a lack of engagement, a lack of interest in contributing to the publicly accessible store of knowledge, is seen as a serious negative on a CV. However this particular initiative pans out, it is to me one of the first and most natural replacements for peer review that could be effective within dynamic documents, solving most of the central problems without requiring significant additional work.

I look forward to the day when I see CVs with a Wikipedia Karma Rank on them. If you happen to be applying for a job with me in the future, consider it a worthwhile thing to include.

Pub-sub/syndication patterns and post publication peer review

I think it is fair to say that even those of us most enamored of post-publication peer review would agree that its effectiveness remains to be demonstrated in a convincing fashion. Broadly speaking there are two reasons for this; the first is the problem of social norms for commenting. As in there aren’t any. I think it was Michael Nielsen who referred to the “Kabuki Dance of scientific discourse”. It is entirely allowed to stab another member of the research community in the back, or indeed the front, but there are specific ways and forums in which it is acceptable to do so. No-one quite knows what the appropriate rules are for commenting on online fora, as best described most recently by Steve Koch.

My feeling is that this is a problem that will gradually go away as we evolve norms of behaviour in specific research communities. The current “rules” took decades to build up. It should not be surprising if it takes a few years or more to sort out an adapted set for online interactions. The bigger problem is the one that is usually surfaced as “I don’t have any time for this kind of thing”. This in turn can be translated as, “I don’t get any reward for this”. Whether that reward is a token for putting on your CV, actual cash, useful information coming back to you, or just the warm feeling that someone else found your comments useful, rewards are important for motivating people (and researchers).

One of the things that links these two together is a sense of loss of control over the comment. Commenting on journal web-sites is just that, commenting on the journal’s website. The comment author has “given up” their piece of value, which is often not even citeable, but also lost control over what happens to their piece of content. If you change your mind, even if the site allows you to delete it, you have no way of checking whether it is still in the system somewhere.

In a sense, when the Web 2.0 world was built it got things almost precisely wrong for personal content. For me Jon Udell has written most clearly about this when he talks about the publish-subscribe pattern for successful frameworks. In essence I publish my content and you choose to subscribe to it. This works well for me, the blogger, at this site, but it is not so great for the commenter who has to leave their comment to my tender mercies on my site. It would be better if the commenter could publish their comment and I could syndicate it back to my blog. The current arrangement creates all sorts of problems: it is challenging for you to aggregate your own comments together and you have to rely on the functionality of specific sites to help you follow responses to your comments. Jon wrote about this better than I can in his blog post.

So a big part of the problem could be solved if people streamed their own content. This isn’t going to happen quickly in the general sense of everyone having a web server of their own, as it still remains too difficult for even moderately skilled people to be bothered doing this. Services will no doubt appear in the future but current broadcast services like twitter offer a partial solution (it’s “my” twitter account, I can at least pretend to myself that I can delete all of it). The idea of using something like the twitter service at microrevie.ws as suggested by Daniel Mietchen this week can go a long way towards solving the problem. This takes a structured tweet of the form @hreview {Object};{your review} followed optionally by a number of asterisks for a star rating. This doesn’t work brilliantly for papers because of problems with the length of references for the paper, even with shortened DOIs, the need for sometimes lengthy reviews and the shortness of tweets. Additionally the twitter account is not automatically associated with a unique research contributor ID. However the principle of the author of the review controlling their own content, while at the same time making links between themselves and that content in a linked open data kind of way, is extremely powerful.
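The structured tweet format is simple enough to parse in a few lines. The sketch below assumes only the form given above (@hreview, an object, a semicolon, the review, optional trailing asterisks); it is not the microrevie.ws implementation, and the `MicroReview` type, the `parse_microreview` function and the example DOI are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MicroReview(NamedTuple):
    subject: str
    review: str
    stars: Optional[int]


# "@hreview", the reviewed object, a semicolon, the review text,
# then an optional run of trailing asterisks for the star rating.
PATTERN = re.compile(
    r"^@hreview\s+(?P<subject>[^;]+);(?P<review>.*?)\s*(?P<stars>\*+)?\s*$",
    re.DOTALL,
)


def parse_microreview(tweet: str) -> Optional[MicroReview]:
    """Parse a structured review tweet, or return None if it does not match."""
    match = PATTERN.match(tweet.strip())
    if match is None:
        return None
    stars = match.group("stars")
    return MicroReview(
        subject=match.group("subject").strip(),
        review=match.group("review").strip(),
        stars=len(stars) if stars else None,
    )


# Hypothetical example tweet.
parsed = parse_microreview("@hreview doi:10.1000/example;Sound data, speculative conclusions ***")
```

Because the reviewer’s own account is the source, anyone (a journal, an aggregator, the paper’s authors) could run a parser like this over the public stream without the reviewer giving up control of the original.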

Imagine a world in which your email outbox or local document store is also a webserver (via any one of an emerging set of tools like Wave, DropBox, or Opera Unite). You can choose who to share your review with and change that over time. If you choose to make it public the journal, or the authors, can give you some form of credit. It is interesting to think that author-side charges could perhaps be reduced for valuable reviews. This wouldn’t work in a naive way, with $10 per review, because people would churn out large amounts of rubbish reviews, but if those reviews are out on the linked data web then their impact can be measured by their page rank and the authors rewarded accordingly.

Rewards and control linked together might provide a way of solving the problem – or at least of solving it faster than we are at the moment.

Fantasy Science Funding: How do we get peer review of grant proposals to scale?

This post is both a follow up to last week’s post on the costs of peer review and a response to Duncan Hull’s post of nine or so months ago proposing a game of “Fantasy Science Funding”. The game requires you to describe how you would distribute the funding of the BBSRC if you were a benign (or not so benign) dictator. The post and the discussion should be read bearing in mind my standard disclaimer.

Peer review is in crisis. Anyone who tells you otherwise either has their head in the sand or is trying to sell you something. Volumes are increasing, quality of review is decreasing. The willingness of scientists to take on refereeing is increasingly the major problem for those who commission it. This is a problem for peer reviewed publication but the problems for the reviewing of funding applications are far worse.

For grant review, the problems that are already evident in scholarly publishing, fundamentally the increasing volume, are exacerbated by the fact that success rates for grants are falling and that successful grants are increasingly in the hands of a smaller number of people in a smaller number of places. Regardless of whether you agree with this process of concentrating grant funding, this creates a very significant perception problem. If the perception of your referees is that they have no chance of getting funding, why on earth should they referee?

Is this really happening? Well in the UK chemistry community last year there was an outcry when two EPSRC grant rounds in a row had success rates of 10% or lower. Bear in mind this was the success rate of grants that made it to panel, i.e. it is an upper bound, assuming there weren’t any grants removed at an earlier stage. As you can imagine there was significant hand wringing and a lot of jumping up and down but what struck me were two statements I heard made. The first, from someone who had sat on one of the panels, was that it “raises the question of whether it is worth our time to attend panel meetings”. The second was the suggestion that the chemistry community could threaten to unilaterally withdraw from EPSRC peer review. These sentiments are now being repeated on UK mailing lists in response to EPSRC’s most recent changes to grant submission guidelines. Whether serious or not, credible or not, this shows that the compact of community contribution to the review process is perilously close to breaking down.

The research council response to this is to attempt to reduce the number of grant proposals, generally by threatening to block those who have a record of serial rejection. This will fail. With success rates as low as they are, and with successful grants concentrated in the hands of the few, most academics are serial failures. The only way departments can increase income is by increasing the volume and quality of grant applications. With little effective control over quality the focus will necessarily be on increasing volume. The only way research councils will control this is either by making applications a direct cost to departments, or by reducing the need of academics to apply.

The cost of refereeing is enormous and largely hidden. But it pales into insignificance compared to the cost of applying for grants. Low success rates make the application process an immense waste of departmental resources. The approximate average cost of running a UK academic for a year is £100,000. If you assume that each academic writes one grant per year and that this takes around two weeks of full-time work, that amounts to ~£4k per academic per year. If there are 100,000 academics in the UK this is £400M, which with a 20% success rate means that £320M is lost in the UK each year. Let’s say that £100M is a reasonable ballpark figure.
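The back-of-envelope calculation above can be written out explicitly. This is only a sketch using the figures quoted in the text; the one added assumption is that the annual cost is spread evenly over 52 weeks, and the variable names are mine.

```python
ANNUAL_COST_PER_ACADEMIC = 100_000  # pounds per year, figure from the text
WEEKS_PER_GRANT = 2                 # full-time weeks per application, from the text
WEEKS_PER_YEAR = 52                 # assumption: cost spread evenly over the year
ACADEMICS = 100_000                 # from the text
SUCCESS_RATE = 0.20                 # from the text

# Cost of the time spent writing one application.
cost_per_grant = ANNUAL_COST_PER_ACADEMIC * WEEKS_PER_GRANT / WEEKS_PER_YEAR
# One application per academic per year.
total_cost = cost_per_grant * ACADEMICS
# Share of that effort spent on applications that are not funded.
wasted = total_cost * (1 - SUCCESS_RATE)

print(f"Cost per application: GBP {cost_per_grant:,.0f}")
print(f"Total spent applying: GBP {total_cost / 1e6:,.0f}M")
print(f"Spent on unsuccessful applications: GBP {wasted / 1e6:,.0f}M")
```

Without rounding the per-application cost up to £4k, the totals come out slightly lower than the £400M and £320M quoted, but the order of magnitude is the same.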

In more direct terms this means that academics who are selected for their ability to do research, are being taken away from what they are good at to play a game which they will on average lose four times out of five. It would be a much more effective use of government funding to have those people actually doing research.

So this is a game of Fantasy Funding, how would I spend the money? Well, rather than discuss my biases about what science is important, which are probably not very interesting, it is perhaps more useful to think about how the system might be changed to reduce these wastages. And there is a simple, if somewhat radical, way of doing this.

Cut the budget in two and distribute half of it directly to academics on a pro-rata basis.

By letting researchers focus on getting on with research you will reduce their need for funding and reduce the burden. By setting the bar naturally higher for funding research you still maintain the perception that everyone is in with a chance and reduce the risk of referee drop out due to disenchantment with the process. More importantly you enable innovative research by allowing it to keep ticking over and in particular you enable a new type of peer review.

If you look at the amounts of money involved, say a few hundred million pounds for BBSRC, and divide that up amongst all bioscience academics, you end up with figures of £5-20K per academic per year. Not enough to hire a postdoc, just about enough to run a PhD student (at least at UK rates). But what if you put that together with the money from a few other academics? If you can convince your peers that you have an interesting and fun idea then they can pool funds together. Or perhaps share a technician between two groups so that you don’t lose the entire group memory every time a student leaves? Effective collaboration will lead to a win on all sides.

If these arguments sound familiar it is because they are not so different to the notion of 20% time, best known as a Google policy of having all staff spend some time on personal projects. By supporting low level innovation and enabling small scale judging of ideas and pooling of resources it is possible to enable bottom up innovation of precisely the kind that is stifled by top down peer review.

No doubt there would be many unintended consequences, and probably a lot of wastage, but in amongst that I wouldn’t bet against the occasional brilliant innovation which is virtually impossible in the current climate.

What is clear is that doing nothing is not an option. Look at that EPSRC statement again. People with a long term success rate below 25% will be blocked…I just checked my success rate over the past ten years (about 15% by numbers of grants, 70% by value but that is dominated by one large grant). The current success rate at chemistry panel is around 15%. And that is skewed towards a limited number of people and places.

The system of peer review relies absolutely on the community’s agreement to contribute and some level of faith in the outcome. It relies absolutely on trust. That trust is perilously close to a breakdown.

What is the cost of peer review? Can we afford (not to have) high impact journals?

Late last year the Research Information Network held a workshop in London to launch a report, and in many ways more importantly, a detailed economic model of the scholarly publishing industry. The model aims to capture the diversity of the scholarly publishing industry and to isolate costs and approaches to enable the user to ask questions such as “what is the consequence of moving to a 95% author pays model” as well as to simply ask how much money is going in and where it ends up. I’ve been meaning to write about this for ages but a couple of things in the last week have prompted me to get on and do it.

The first of these was an announcement by email [can’t find a copy online at the moment] by the EPSRC, the UK’s main funder of physical sciences and engineering. While the requirement for a two page economic impact statement for each grant proposal got more headlines, what struck me as much more important were two other policy changes. The first was that, unless specifically invited, rejected proposals cannot be resubmitted. This may seem strange, particularly to US researchers, where a process of refinement and resubmission, perhaps multiple times, is standard, but the BBSRC (UK biological sciences funder) has had a similar policy for some years. The second, frankly somewhat scary change, is that some proportion of researchers who have a history of rejection will be barred from applying altogether. What is the reason for these changes? Fundamentally the burden of carrying out peer review on all of the submitted proposals is becoming too great.

The second thing was that, for the first time, I have been involved in refereeing a paper for a Nature Publishing Group journal. Now I like to think, like I guess everyone else does, that I do a reasonable job of paper refereeing. I wrote perhaps one and a half sides of A4 describing what I thought was important about the paper and making some specific criticisms and suggestions for changes. The paper went around the loop and on the second revision I saw what the other referees had written; pages upon pages of closely argued and detailed points. Now the other referees were much more critical of the paper but nonetheless this supported a suspicion that I have had for some time, that refereeing at some high impact journals is qualitatively different to what the majority of us receive, and probably deliver; an often form driven exercise with a couple of lines of comments and complaints. This level of quality peer review takes an awful lot of time and it costs money; money that is coming from somewhere. Nonetheless it provides better feedback for authors and no doubt means the end product is better than it would otherwise have been.

The final factor was a blog post from Molecular Philosophy discussing why the author felt Open Access Publishers are, if not doomed to failure, then face a very challenging road ahead. The centre of the argument as I understand it focused around the costs of high impact journals, particularly the costs of selection, refinement, and preparation for print. Broadly speaking I think it is generally accepted that a volume model of OA publication, such as that practiced by PLoS ONE and BMC can be profitable. I think it is also generally accepted that a profitable business model for high impact OA publication has yet to be convincingly demonstrated. The question I would like to ask though is different. The Molecular Philosophy post skips the zeroth order questions. Can we afford high impact publications?

Returning to the RIN funded study and model of scholarly publishing some very interesting points came out [see Daniel Hull’s presentation for most of the data here]. The first of these, which in retrospect is obvious but important, is that the vast majority of the costs of producing a paper are incurred in doing the research it describes (£116G worldwide). The second biggest contributor? Researchers reading the papers (£34G worldwide). Only about 14% of the costs of the total life cycle are actually taken up with costs directly attributable to publication. But that is the 14% we are interested in, so how does it divide up?

The “Scholarly Communication Process” as everything in the middle is termed in the model is divided up into actual publication/distribution costs (£6.4G), access provision costs (providing libraries and internet access, £2.1G) and the costs of researchers looking for articles (£16.4G). Yes, the biggest cost is the time you spend trying to find those papers. Arguably that is a sunk cost in as much as once you’ve decided to do research searching for information is a given, but it does make the point that more efficient searching has the potential to save a lot of money. In any case it is a non-cash cost in terms of journal subscriptions or author charges.

So to find the real costs of publication per se we need to look inside that £6.4G. Of the costs of actually publishing the articles, the biggest single cost is peer review, weighing in at around £1.9G globally, just ahead of fixed “first copy” publication costs of £1.8G. So 29% of the total costs incurred in publication and distribution of scholarly articles arise from the cost of peer review.
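For anyone who wants to check the proportions, the figures quoted above can be combined as follows. This is just arithmetic on the rounded numbers given in the text, not output from the RIN model itself; the variable names are mine.

```python
# All figures in billions of pounds (the "£G" used in the text).
research = 116.0
reading = 34.0
publication_distribution = 6.4
access_provision = 2.1
searching = 16.4
peer_review = 1.9

# Everything between doing the research and reading the result.
communication = publication_distribution + access_provision + searching
life_cycle = research + reading + communication

communication_share = communication / life_cycle
peer_review_share = peer_review / publication_distribution

print(f"Scholarly communication: {communication:.1f}G of {life_cycle:.1f}G ({communication_share:.0%})")
print(f"Peer review share of publication and distribution: {peer_review_share:.1%}")
```

The rounded inputs reproduce the 14% figure and give a peer review share just under 30%, close to the 29% quoted, which presumably comes from the unrounded values in the model.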

There are lots of other interesting points in the reports and models (the UK is a net exporter of peer review, but the UK publishes more articles than would be expected based on its subscription expenditure) but the most interesting aspect of the model is its ability to model changes in the publishing landscape. The first scenario presented is one in which publication moves to being 90% electronic. This actually leads to a fairly modest decrease in costs overall with a total overall saving of a little under £1G (less than 1%). Modeling a move to a 90% author pays model (assuming 90% electronic only) leads to very little change overall, but interestingly that depends significantly on the cost of systems put in place to make author payments. If these are expensive and bureaucratic then the costs can rise as many small payments are more expensive than few big ones. But overall the costs shouldn’t need to change much, meaning if mechanisms can be put in place to move the money around, the business models should ultimately be able to make sense. None of this however helps in figuring out how to manage a transition from one system to another, when for all useful purposes costs are likely to double in the short term as systems are duplicated.

The most interesting scenario, though, was the third: what happens as research expands? A 2.5% real increase year on year for ten years was modeled. This may seem profligate in today’s economic situation but with many countries explicitly spending stimulus money on research, or already engaged in large scale increases of structural research funding, it may not be far off. This results in 28% more articles, 11% more journals, a 12% increase in subscription costs (assuming of course that only the real cost increases are passed on) and a 25% increase in the costs of peer review (£531M on a base of £1.8G).

I started this post talking about proposal refereeing. The increased cost in refereeing proposals as the volume of science increases would be added on top of that for journals. I think it is safe to say that the increase in cost would be of the same order. The refereeing system is already struggling under the burden. Funding bodies are creating new, and arguably totally unfair, rules to try and reduce the burden, while journals are struggling to find referees for papers. Increases in the volume of science, whether they come from increased funding in the western world or from growing, increasingly technology driven, economies could easily increase that burden by 20-30% in the next ten years. I am sceptical that the system, as it currently exists, can cope and I am sceptical that peer review, in its current form, is affordable in the medium to long term.

So, bearing in mind Paulo’s admonishment that I need to offer solutions as well as problems, what can we do about this? We need to find a way of doing peer review effectively, but it needs to be more efficient. Equally if there are areas where we can save money we should be doing that. Remember that £16.4G just to find the papers to read? I believe in post-publication peer review because it reduces the costs and time wasted in bringing work to community view and because it makes the filtering and quality assurance of that published work continuous and ongoing. But in the current context it offers significant cost savings. A significant proportion of published papers are never cited. To me it follows from this that there is no point in peer reviewing them. Indeed citation is an act of post-publication peer review in its own right and it has recently been shown that Google PageRank type algorithms do a pretty good job of identifying important papers without any human involvement at all (beyond the act of citation). Of course for PageRank mechanisms to work well the citation and its full context are needed making OA a pre-requisite.
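To illustrate what a PageRank type calculation over citations involves, here is a minimal power-iteration sketch. It is the textbook form of the algorithm rather than the method used in the work referred to above, and the damping factor of 0.85, the `pagerank` function and the toy citation graph are assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(citations: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Rank papers by power iteration; citations maps each paper to the papers it cites."""
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            if cited:
                # A paper passes its weight on to the papers it cites.
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its weight evenly over all papers.
                for target in papers:
                    new_rank[target] += damping * rank[paper] / n
        rank = new_rank
    return rank


# Hypothetical toy citation graph: each key cites the papers in its list.
toy = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
scores = pagerank(toy)
```

In the toy graph the uncited papers B and E end up at the bottom, while D, which is cited only once but by the most heavily cited paper, comes out on top. That is the sense in which the context of a citation, and not just the count, carries information.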

If refereeing can be restricted to those papers that are worth the effort then it should be possible to reduce the burden significantly. But what does this mean for high impact journals? The whole point of high impact journals is that they are hard to get into. This is why both the editorial staff and peer review costs are so high for them. Many people make the case that they are crucial for helping to filter out the important papers (remember that £16.4G again). In turn I would argue that they reduce value by making the process of deciding what is “important” a closed shop, taking that decision away, to a certain extent, from the community where I feel it belongs. But at the end of the day it is a purely economic argument. What is the overall cost of running, supporting through peer review, and paying for, either by subscription or via author charges, a journal at the very top level? What are the benefits gained in terms of filtering and how do they compare to other filtering systems? Do the benefits justify the costs?

If we believe that there are better filtering systems possible, then they need to be built, and the cost benefit analysis done. The opportunity is coming soon to offer different, and more efficient, approaches as the burden becomes too much to handle. We either have to bear the cost or find better solutions.

[This has got far too long already – and I don’t have any simple answers in terms of refereeing grant proposals but will try to put some ideas in another post which is long overdue in response to a promise to Duncan Hull]