August 27, 2008 | December 30, 2014 by Cameron Neylon

Can post publication peer review work? The PLoS ONE report card

This post is an opinion piece and not a rigorous objective analysis. It is fair to say that I am on the record as an advocate of the principles behind PLoS ONE and am also in favour of post publication peer review, and this should be read in that light. [ed I’ve also modified this slightly from the original version because I got myself mixed up in an Excel spreadsheet]

To me, anonymous peer review is, and always has been, broken. The central principle of the scientific method is that claims and data to support those claims are placed, publicly, in the view of expert peers. They are examined, and re-examined on the basis of new data, considered and modified as necessary, and ultimately discarded in favour of an improved, or more sophisticated model. The strength of this process is that it is open, allowing for extended discussion on the validity of claims, theories, models, and data. It is a bearpit, but one in which actions are expected to take place in public (or at least community) view. To have as the first hurdle to placing new science in the view of the community a process which is confidential, anonymous, arbitrary, and closed, is an anachronism.

It is, to be fair, an anachronism that was necessary to cope with rising volumes of scientific material in the years after the Second World War as the community increased radically in size. A limited number of referees was required to make the system manageable and anonymity was seen as necessary to protect the integrity of this limited number of referees. This was a good solution given the technology of the day. Today, it is neither a good system, nor an efficient system, and we have in principle the ability to do peer review differently, more effectively, and more efficiently. However, thus far most of the evidence suggests that the scientific community doesn’t want to change. There is, reasonably enough, a general attitude that if it isn’t broken it doesn’t need fixing. Nonetheless there is a constant stream of suggestions, complaints, and experimental projects looking at alternatives.

The last 12-24 months have seen some radical experiments in peer review. Nature Publishing Group trialled an open peer review process. PLoS ONE proposed a qualitatively different form of peer review, rejecting the idea of ‘importance’ as a criterion for publication. Frontiers have developed a tiered approach where a paper is submitted into the ‘system’ and will gradually rise to its level of importance based on multiple rounds of community review. Nature Precedings has expanded the role and discipline boundaries of pre-print archives, and a white paper has been presented to EMBO Council suggesting that the majority of EMBO journals be scrapped in favour of retaining one flagship journal for which papers would be handpicked from a generic repository where authors would submit, along with referees’ reports and author’s response, on payment of a submission charge. Of all of these experiments, none could be said to be a runaway success so far with the possible exception of PLoS ONE. PLoS ONE, as I have written before, succeeded precisely because it managed to reposition the definition of ‘peer review’. The community have accepted this definition, primarily because it is indexed in PubMed. It will be interesting to see how this develops.

PLoS has also been aiming to develop ratings and comment systems for their papers as a way of moving towards some element of post publication peer review. I, along with some others (see full disclosure below) have been granted access to the full set of comments and some analytical data on these comments and ratings. This should be seen in the context of Euan Adie’s discussion of commenting frequency and practice in BioMedCentral journals which broadly speaking showed that around 2% of papers had comments and that these comments were mostly substantive and dealt with the science. How does PLoS ONE compare and what does this tell us about the merits or demerits of post publication peer review?

PLoS ONE has a range of commenting features, including a simple rating system (on a scale of 1-5), the ability to leave freetext notes, comments, and questions, and, in keeping with a general Web 2.0 feel, the ability to add trackbacks, a mechanism for linking up citations from blogs. Broadly speaking a little more than 13% (380 of 2773) of all papers have ratings and around 23% have comments, notes, or replies to either (647 of 2773, not including any from PLoS ONE staff). Probably unsurprisingly most papers that have ratings also have comments. There is a very weak positive correlation between the number of citations a paper has received (as determined from Google Scholar) and the number of comments (R^2 = 0.02, which is probably dominated by papers with both no citations and no comments, which are mostly recent; none of this is controlled for publication date).
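The headline figures above are simple enough to re-derive. What follows is only a minimal sketch: the counts are the ones quoted in the paragraph above, the variable and function names are illustrative, and the conversion from R^2 to a correlation coefficient assumes the quoted R^2 comes from a simple linear fit of comments against citations (the raw data are not reproduced here).

```python
import math

# Counts quoted in the text; everything else here is illustrative.
TOTAL_PAPERS = 2773
PAPERS_WITH_RATINGS = 380
PAPERS_WITH_COMMENTS = 647  # comments, notes, or replies, excluding PLoS ONE staff


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


rated_pct = share(PAPERS_WITH_RATINGS, TOTAL_PAPERS)
commented_pct = share(PAPERS_WITH_COMMENTS, TOTAL_PAPERS)

# Assumption: R^2 is from a simple (one-predictor) linear fit, in which case
# the magnitude of the correlation coefficient is its square root.
r_squared = 0.02
r_magnitude = math.sqrt(r_squared)

print(f"papers with ratings:  {rated_pct:.1f}%")
print(f"papers with comments: {commented_pct:.1f}%")
print(f"|r| implied by R^2 = {r_squared}: {r_magnitude:.2f}")
```

Under that assumption an R^2 of 0.02 corresponds to a correlation coefficient of roughly 0.14, which is consistent with describing the relationship as very weak.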

Overall this is consistent with what we’d expect. The majority of papers don’t have either comments or ratings but a significant minority do. What is slightly surprising is that where there is arguably a higher barrier to adding something (click a button to rate versus write a text comment) there is actually more activity. This suggests to me that people are actively uncomfortable with rating papers versus leaving substantive comments. These numbers compare very favourably to those reported by Euan on comments in BioMedCentral but they are not yet moving into the realms of the majority. It should also be noted that there has been a consistent programme at PLoS ONE with the aim of increasing the involvement of the community. Broadly speaking I would say that the data we have suggest that that programme has been a success in raising involvement.

So are these numbers ‘good’? In reality I don’t know. They seem to be an improvement on the BMC numbers, suggesting that as systems improve and evolve there is more involvement. However, one graph I received seems to indicate that there isn’t an increase in the frequency of comments within PLoS ONE over the past year or so, which one would hope to see. Has this been a radical revision of how peer review works? Well, certainly not yet, not until the vast majority of papers have ratings, but more importantly not until we have evidence that people are using those ratings. We are not yet in a position where we are about to see a stampede towards radically changed methods of peer review and this is not surprising. Tradition changes slowly. We are still only just becoming used to the idea of the ‘paper’ being something that goes beyond a PDF; embedding that within a wider culture of online rating, and the use of those ratings, will take some years yet.

So I have spent a number of posts recently discussing the details of how to make web services better for scientists. Have I got anything useful to offer to PLoS ONE? Well I think some of the criteria I suggested last week might be usefully considered. The problem with rating is that it lies outside the existing workflow for most people. I would guess that many users don’t even see the rating panel on the way into the paper. Why would people log into the system to look at a paper? What about making the rating implicit when people bookmark a paper in external services? Why not actually use that as the rating mechanism?

I emphasised the need for a service to be useful to the user before there are any ‘social effects’ present. What can be offered to make the process of rating a paper useful to the single user in isolation? I can’t really see why anyone would find this useful unless they are dealing with a huge number of papers and can’t remember which one is which from day to day. It may be useful within groups or journal clubs but all of these require a group to sign up. It seems to me that if we can’t frame it as a useful activity for a single person then it will be difficult to get the numbers required to make this work effectively on a community scale.

In that context, I think getting the numbers to around the 10-20% level for either comments or ratings has to be seen as an immense success. I think it shows how difficult it is to get scientists to change their workflows and adopt new services. I also think there will be a lot to learn about how to improve these tools and get more community involvement. I believe strongly that we need to develop better mechanisms for handling peer review and that it will be a very difficult process getting there. But the results will be seen in more efficient dissemination of information and more effective communication of the details of the scientific process. For this, PLoS, the PLoS ONE team, and other publishers, including BioMedCentral, Nature Publishing Group, and others, that are working on developing new means of communication and improving the ones we have deserve applause. They may not hit on the right answer first off, but the current process of exploring the options is an important one, and not without its risks for any organisation.

Full disclosure: I was approached along with a number of other bloggers to look at the data provided by PLoS ONE and to coordinate the release of blog posts discussing that data. At the time of writing I am not aware of who the other bloggers are, nor have I read what they have written. The data that was provided included a list of all PLoS ONE papers up until 30 July 2008, the number of citations, citeulike bookmarks, trackbacks, comments, and ratings for each paper. I also received a table of all comments and a timeline with number of comments per month. I have been asked not to release the raw data and will honour that request as it is not my data to release. If you would like to see the underlying data please get in contact with Bora Zivkovic.

12 Replies to “Can post publication peer review work? The PLoS ONE report card”

enro says:

August 27, 2008 at 1:31 pm

I guess it should read “Where there is a _lower_ barrier (click a button to rate versus write a text comment) there is more activity.”
cameronneylon says:

August 27, 2008 at 3:01 pm

Enro, you are right but I have now modified the text somewhat as I actually had the numbers the wrong way around. Turns out there are more user comments than ratings. Teach me to do these kind of things in a hurry…
Peter Binfield says:

August 27, 2008 at 3:41 pm

Also Cameron, now that you corrected your numbers to show that in fact “23% have comments, notes, or replies to either” then your statement “I think getting the numbers to around the 10% level has to be seen as an immense success.” is out of date.

If around 10% was “an immense success” then presumably the actual figure (of 23%) is a (insert superlative of your choice) success?

Pete Binfield (Managing Editor of PLoS ONE)
cameronneylon says:

August 27, 2008 at 5:19 pm

Oh dear. Now corrected that as well. How about humungous?
Wobbler says:

August 27, 2008 at 9:51 pm

Thought-provoking post, as always. There is so much I want to say/ask, but not sure where to start. Oh, and I may come off as overly negative, but it is done out of a sincere interest in understanding how the PLoS ONE publishing model will overcome its issues and succeed. I personally have a great interest in new peer review/scholarly communication models, so that may be why I am asking these questions.

‘we have in principle the ability to do peer review differently, more effectively, and more efficiently’ is a bold statement. Being able to improve peer review that distinguishably would truly be the holy grail of scholarly communication (and likely that of science as well). I am not sure we currently have any examples of such a new revolutionary system, though.

I agree that PLoS ONE’s publication policy (the scientific/methodological soundness of the paper being the deciding factor for publication) is certainly a more efficient utilization of peer reviewers to get sound papers out there. However, is it also more effective quality wise? Since the peer review system itself does not provide a quality/significance filter, the question is whether their “post publication comments/ratings to determine significance/originality of a paper” can at least provide that same level of effectivity in filtering quality, while maintaining that advantage in efficiency. I have some serious doubts about that. Mostly it has, again, to do with scaling. Concerning both the time and the (initial) amount of people required to (thoroughly) read the papers to be able to rate/comment on it. And the efficiency and effectivity of validating that rating/commenting process itself.

But let me start in a more logical order. First of all, is it just me, or is it not possible to search/browse PLoS ONE’s publications based on their ratings/comments? If not, would that not be a good function to provide for their readers? Considering that is PLoS ONE’s system to help readers find original/significant papers? That actually replaces the traditional “significance/originality” check by peer reviewers of other publishers? Otherwise, I do not really see the point in having ratings in the first place. Of course, when you do have such a search/browse filter system in place, you may also need to analyze the (potentially negative) effect that will have on having scholars reach and rate/comment on the unrated papers. But either way, you cannot expect them to just search/browse their fields and randomly read and rate/comment papers they find (whenever they arrive on result pages with papers absent of any ratings/comments), do you? Because in that case I seriously doubt it is a more efficient system. It certainly is not more effective to find papers that are perceived as more significant by peers.

So I wanted to find some papers with comments/ratings to see what they are like. Which brings me to another point: in the traditional peer review system, you have “qualified peers” and the editors (also somewhat qualified as a “peer” I guess) of the journals to assess the significance of the papers. I am now at the registration page of PLoS ONE and I am wondering if there is anything that prevents me from signing up and voting articles up/down? Since I am definitely not qualified to assess most, if not all, of PLoS ONE’s papers on quality/significance? In other words: how is PLoS ONE planning to verify the quality of the quality assessments of the papers expressed through the ratings? In fact, how many ratings per publication would you find accurate or at least as accurate as the traditional means of (initially) determining its significance? Excluding biased ratings from family, colleagues, students etc.?

In the traditional sense, the editor, and thus the journal publisher, is answerable to both the soundness of the science and the significance of it. Which is there for all the readers to witness and judge for themselves. Not doing their job right means an end to their income/continuity. Since PLoS ONE’s peer reviewers (and editors?) are only accountable for its scientific soundness, who will be responsible for the assessment of its significance? And what will be the consequences of that? And, to not ignore the all important scaling factor, remember that we are now talking about ALL the published papers (of PLoS ONE), like they are in the traditional publishing models.

Anyway, having expressed some of my concerns concerning this model and with it in my mind, onto your data analysis.

‘a little more than 13% (380 of 2773) of all papers have ratings’

How many of them have more than 1 rating? Consider 2 peer reviewers and an editor for the traditional peer review to judge this, would you say at least 3 objective assessments by (qualified) peers are at least required to be on a somewhat even level?

‘around 23% have comments, notes, or replies to either (647 of 2773, not including any from PLoS ONE staff)’

Idem ditto for this statistic. And again, how well will this scale? When the time comes that all the papers need this metric early after their publications??

Personally, I think PLoS’ success is largely based on it being a high profile name, with no small help from funding institutions, and being Open Access. I do not think they necessarily stand out as a qualitative journal publisher per se because, as they have stated themselves: they are not explicitly looking for significant/original papers. And it is somewhat demonstrated that Open Access journals do get more cites than non OA. In that sense, I think it is not entirely fair to compare PLoS’ “success” with commercial publishers. Based on OA’s citation advantage, one could state that PLoS journals’ impact ratings (i.e. a metric of its “success”) would have been lower (and its “success” less significant) if they had gone commercial.

And that is all fine and well, but what happens if we ever reach 100% Open Access? Then PLoS’ OA advantage is gone, and then they have to do something else to stand out and maintain (or increase) their impact factor (and the reasons to keep on receiving the funding that they do, compared to the other OA publishers/journals). And when that time comes, I think one of the key factors is how well their filter for quality/significance of papers works in comparison to other OA journals/publishers. And to me, that is still the biggest issue with PLoS ONE’s different publishing model. End rant.

PS. open peer review is not radical per se. The British Medical Journal (pretty high profile in the medical fields) has been toying with the concept of open peer review since 1999 (http://www.bmj.com/cgi/content/full/318/7175/4) and now it has an open peer review process (http://resources.bmj.com/bmj/authors). And the idea itself has been discussed a lot, and they usually run into the same type of for/against arguments. Nature’s experiment did not receive a whole lot of comments and was not seen as a success by them.
Wobbler says:

August 30, 2008 at 1:29 pm

Actually, I would like to retract my ‘I do not think they necessarily stand out as a qualitative journal publisher per se because, as they have stated themselves: they are not explicitly looking for significant/original papers.’ statement. I was erroneously lumping PLoS ONE together with all the other PLoS journals (and their peer review/publishing policies), which is incorrect. PLoS ONE is the only PLoS journal that applies this peer review model, while the other PLoS journals also make an assessment of a research’s significance.

But considering PLoS’ success is mostly, if not almost entirely, based on the other journals, I would question the success/attention that PLoS ONE has received even more.
