What is the cost of peer review? Can we afford (not to have) high impact journals?

3 March 2009

Late last year the Research Information Network held a workshop in London to launch a report, and in many ways more importantly, a detailed economic model of the scholarly publishing industry. The model aims to capture the diversity of the scholarly publishing industry and to isolate costs and approaches to enable the user to ask questions such as “what is the consequence of moving to a 95% author pays model” as well as to simply ask how much money is going in and where it ends up. I’ve been meaning to write about this for ages but a couple of things in the last week have prompted me to get on and do it.

The first of these was an announcement by email [can’t find a copy online at the moment] by the EPSRC, the UK’s main funder of physical sciences and engineering. While the requirement for a two-page economic impact statement for each grant proposal got more headlines, what struck me as much more important were two other policy changes. The first was that, unless specifically invited, rejected proposals cannot be resubmitted. This may seem strange, particularly to US researchers, where a process of refinement and resubmission, perhaps multiple times, is standard, but the BBSRC (UK biological sciences funder) has had a similar policy for some years. The second, frankly somewhat scary, change is that some proportion of researchers who have a history of rejection will be barred from applying altogether. What is the reason for these changes? Fundamentally, the burden of carrying out peer review on all of the submitted proposals is becoming too great.

The second thing was that, for the first time, I have been involved in refereeing a paper for a Nature Publishing Group journal. Now I like to think, like I guess everyone else does, that I do a reasonable job of paper refereeing. I wrote perhaps one and a half sides of A4 describing what I thought was important about the paper and making some specific criticisms and suggestions for changes. The paper went around the loop and on the second revision I saw what the other referees had written: pages upon pages of closely argued and detailed points. Now the other referees were much more critical of the paper, but nonetheless this supported a suspicion that I have had for some time, that refereeing at some high impact journals is qualitatively different from what the majority of us receive, and probably deliver: an often form-driven exercise with a couple of lines of comments and complaints. This level of quality peer review takes an awful lot of time and it costs money; money that is coming from somewhere. Nonetheless it provides better feedback for authors and no doubt means the end product is better than it would otherwise have been.

The final factor was a blog post from Molecular Philosophy discussing why the author felt Open Access publishers, if not doomed to failure, at least face a very challenging road ahead. The centre of the argument, as I understand it, was the costs of high impact journals, particularly the costs of selection, refinement, and preparation for print. Broadly speaking I think it is generally accepted that a volume model of OA publication, such as that practiced by PLoS ONE and BMC, can be profitable. I think it is also generally accepted that a profitable business model for high impact OA publication has yet to be convincingly demonstrated. The question I would like to ask though is different. The Molecular Philosophy post skips the zeroth-order question. Can we afford high impact publications?

Returning to the RIN funded study and model of scholarly publishing some very interesting points came out [see Daniel Hull’s presentation for most of the data here]. The first of these, which in retrospect is obvious but important, is that the vast majority of the costs of producing a paper are incurred in doing the research it describes (£116G worldwide). The second biggest contributor? Researchers reading the papers (£34G worldwide). Only about 14% of the costs of the total life cycle are actually taken up with costs directly attributable to publication. But that is the 14% we are interested in, so how does it divide up?

The “Scholarly Communication Process”, as everything in the middle is termed in the model, is divided up into actual publication/distribution costs (£6.4G), access provision costs (providing libraries and internet access, £2.1G) and the costs of researchers looking for articles (£16.4G). Yes, the biggest cost is the time you spend trying to find those papers. Arguably that is a sunk cost, inasmuch as once you’ve decided to do research, searching for information is a given, but it does make the point that more efficient searching has the potential to save a lot of money. In any case it is a non-cash cost in terms of journal subscriptions or author charges.

So to find the real costs of publication per se we need to look inside that £6.4G. Of the costs of actually publishing the articles the biggest single cost is peer review, weighing in at around £1.9G globally, just ahead of fixed “first copy” publication costs of £1.8G. So 29% of the total costs incurred in publication and distribution of scholarly articles arises from the cost of peer review.
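As a quick sanity check, the shares quoted above can be reproduced from the headline figures. This is a back-of-the-envelope sketch using only the numbers cited in this post, not the RIN model itself:

```python
# Back-of-the-envelope check of the cost shares quoted above.
# All figures in £ billions (G), as cited in the post from the RIN model.
research = 116.0       # doing the research itself
reading = 34.0         # researchers reading papers
publication = 6.4      # publication and distribution
access = 2.1           # access provision (libraries, internet access)
searching = 16.4       # researchers looking for articles

peer_review = 1.9      # peer review, inside the £6.4G
first_copy = 1.8       # fixed "first copy" costs, inside the £6.4G

total = research + reading + publication + access + searching
comm_share = (publication + access + searching) / total
pr_share = peer_review / publication

print(f"communication share of total life cycle: {comm_share:.0%}")
print(f"peer review share of publication costs: {pr_share:.0%}")
```

The first share comes out at roughly the 14% quoted; the second at just under 30%, consistent with the 29% figure in the text (the difference is rounding).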

There are lots of other interesting points in the reports and models (the UK is a net exporter of peer review, but the UK publishes more articles than would be expected based on its subscription expenditure) but the most interesting aspect of the model is its ability to model changes in the publishing landscape. The first scenario presented is one in which publication moves to being 90% electronic. This actually leads to a fairly modest decrease in costs, with a total saving of a little under £1G (less than 1%). Modelling a move to a 90% author pays model (assuming 90% electronic only) leads to very little change overall, but interestingly that depends significantly on the cost of the systems put in place to make author payments. If these are expensive and bureaucratic then the costs can rise, as many small payments are more expensive than a few big ones. But overall the costs shouldn’t need to change much, meaning that if mechanisms can be put in place to move the money around, the business models should ultimately be able to make sense. None of this however helps in figuring out how to manage a transition from one system to another, when for all useful purposes costs are likely to double in the short term as systems are duplicated.

The most interesting scenario, though, was the third: what happens as research expands? A 2.5% real increase year on year for ten years was modelled. This may seem profligate in today’s economic situation but with many countries explicitly spending stimulus money on research, or already engaged in large scale increases of structural research funding, it may not be far off. This results in 28% more articles, 11% more journals, a 12% increase in subscription costs (assuming of course that only the real cost increases are passed on) and a 25% increase in the costs of peer review (£531M on a base of £1.8G).
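The arithmetic behind that article figure is just compound growth: if article output tracks funding roughly one-to-one (an assumption on my part, not something stated in the model), ten years of 2.5% real growth compounds to about 28%:

```python
# Ten years of 2.5% year-on-year real growth, compounded.
growth, years = 0.025, 10
factor = (1 + growth) ** years
print(f"cumulative increase after {years} years: {factor - 1:.0%}")  # 28%
```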

I started this post talking about proposal refereeing. The increased cost of refereeing proposals as the volume of science increases would be added on top of that for journals. I think it is safe to say that the increase in cost would be of the same order. The refereeing system is already struggling under the burden. Funding bodies are creating new, and arguably totally unfair, rules to try and reduce the burden, and journals are struggling to find referees for papers. Increases in the volume of science, whether they come from increased funding in the western world or from growing, increasingly technology driven, economies could easily increase that burden by 20-30% in the next ten years. I am sceptical that the system, as it currently exists, can cope, and I am sceptical that peer review, in its current form, is affordable in the medium to long term.

So, bearing in mind Paulo’s admonishment that I need to offer solutions as well as problems, what can we do about this? We need to find a way of doing peer review effectively, but it needs to be more efficient. Equally, if there are areas where we can save money we should be doing that. Remember that £16.4G just to find the papers to read? I believe in post-publication peer review because it reduces the costs and time wasted in bringing work to community view and because it makes the filtering and quality assurance of that published work continuous and ongoing. But in the current context it offers significant cost savings. A significant proportion of published papers are never cited. To me it follows from this that there is no point in peer reviewing them. Indeed citation is an act of post-publication peer review in its own right, and it has recently been shown that Google PageRank-type algorithms do a pretty good job of identifying important papers without any human involvement at all (beyond the act of citation). Of course for PageRank mechanisms to work well the citation and its full context are needed, making OA a prerequisite.
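To make the PageRank point concrete, here is a minimal power-iteration sketch over a toy citation graph. The papers and links are invented for illustration; real implementations run over millions of nodes, but the principle is the same: rank flows along citations with no human judgement involved.

```python
# Minimal PageRank by power iteration over a toy citation graph.
def pagerank(links, damping=0.85, iters=100):
    """links maps each paper to the list of papers it cites."""
    papers = list(links)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in papers}
        for p, cited in links.items():
            if cited:
                # A paper passes its rank, damped, to the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                # A paper citing nothing spreads its rank evenly (dangling node).
                for q in papers:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Toy graph: paper A is cited by everyone, so it should come out on top.
citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))  # A
```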

If refereeing can be restricted to those papers that are worth the effort then it should be possible to reduce the burden significantly. But what does this mean for high impact journals? The whole point of high impact journals is that they are hard to get into. This is why both the editorial staff and peer review costs are so high for them. Many people make the case that they are crucial for helping to filter out the important papers (remember that £16.4G again). In turn I would argue that they reduce value by making the process of deciding what is “important” a closed shop, taking that decision away, to a certain extent, from the community where I feel it belongs. But at the end of the day it is a purely economic argument. What is the overall cost of running, supporting through peer review, and paying for, either by subscription or via author charges, a journal at the very top level? What are the benefits gained in terms of filtering, and how do they compare to other filtering systems? Do the benefits justify the costs?

If we believe that there are better filtering systems possible, then they need to be built, and the cost benefit analysis done. The opportunity is coming soon to offer different, and more efficient, approaches as the burden becomes too much to handle. We either have to bear the cost or find better solutions.

[This has got far too long already – and I don’t have any simple answers in terms of refereeing grant proposals but will try to put some ideas in another post which is long overdue in response to a promise to Duncan Hull]


  • Anna

Cameron – if you can find a link to the new policies (I too tried to search the EPSRC website to no avail) I would be most interested, since a restriction on submissions based on failure rate, especially when the failure rate is so high, would be a disaster for academics working outside the “big 3”. But then BBSRC (at least) made it quite clear to us that they don’t actually want to fund anyone else (as did the EPSRC head honcho in his appearance before a parliamentary committee). I would like to check though, in case this is just grants that get rejected by all referees (rather than the funding committee) …

  • I’ve only seen the email (and it wasn’t sent to me as I’m not a current grant holder) but I can see if I can raise a copy. I haven’t been able to find the actual policy statement anywhere online though. From what I remember the statement was that they expected “a few hundred” UK academics to be excluded. But yes, the details are important. If anyone has a pointer to it I’d appreciate a link.

  • Hi Cameron,
    Thanks for linking to my blog. The figures you are presenting are very interesting, but going back to my obsession with high impact, you must admit that the peer review costs for high to very high impact journals are disproportionately higher compared to the rest of the publishing landscape. Can we afford to take these costs on ourselves in the form of subscription fees? We have no choice, really. Either we pay up, or the whole system of motivations promoting and rewarding the best science and the best review falls to pieces, as I have been trying to illustrate in my blog posts (3 of them by now).

  • Hi Niewiap, thanks for dropping by.

    I think we may have to agree to disagree on your last point. My personal belief is that the current reward system has to be gently pulled apart and re-worked. I only just got to read your third post in the series but you sum up where we disagree in the middle when you say:

    “The argument goes that tenure/recruitment/funding commitees don’t bother reading the candidates papers and evaluating their value themselves and rather rely on IF as the ultimate measure of a paper’s worth. And in most cases THEY ARE RIGHT.”

This simply isn’t true. The correlation of the IF of a journal with subsequent (and still debatable) measures of the quality of individual papers is extremely poor. And there are much better measures of the quality of papers. Or if I may suggest a different scenario. We may as well cut out all the editorial and paper writing altogether and only give people who went to the top schools tenure. The correlation with quality is no doubt similar (and for similar reasons) – and we’ll save a lot of cash.

Or another scenario. You have a limited quantity of a life-saving drug and you need to choose who you are going to give it to. You have mortality statistics for the disease for five populations. Under some circumstances it might be acceptable to use the statistics to make the decision about which populations get the drug. If you have a cheap diagnostic test then using the averages is reprehensible. Yet that is exactly what we do in science when using figures like the IF. Actually it is slightly worse – it is like using statistics that are known to be irreproducible.

    My claims are two-fold. If we want to measure and reward quality science, then looking at the Impact Factor of the journals where people have published is actually one of the worst ways of doing it and there are better ways available today. Secondly we need to start saving money somewhere, and there are ways of developing improvements to that measurement process that have the potential to save money. We need to examine those alternatives closely, do the cost benefit analysis, and also see the extent to which they are practical. You might think adoption is a problem but whatever system the funding agencies use is the one that will drive hiring and promotion so if funding councils adopt a system then everyone else will follow.

    Essentially if the volume of science keeps increasing we will spend all our time refereeing and none of our time doing any work. Then the system will come crashing down around our ears.

  • Cameron,
    I do read a lot of papers these days, and in a huge majority of cases a paper in Cell presents more innovative and more comprehensive science than a paper in, say, JCB or International Journal of Applied Pediatric Psychopharmacology. There are exceptions to the rule and I have heard many a rant about how the system is unfair, because somebody didn’t get their superhawtawesome data into Cell or Nature, but I beg to differ that we have a better system of judging which paper is better than the IF of the journal. The IF of the journal has one enormous advantage over all other measures – it is immediate. Biomedical science, at least in the US, is extremely fast-paced, with most young PIs having just 5 years to prove their worth to get tenure, labs barely making it with their next R01s before the previous one runs out, all of these being dependent on how well they publish. All “alternative” measures of the paper’s worth (individual citation records, for example) take time to become reliable, and they will take even more time in the absence of the “first approximation” IF measure. If we were to use them, we would have to postpone employment, tenure, and funding decisions by at least a couple of years, and that would bring everything to a standstill. I agree that we might as well use someone’s high school GPA instead of the IF of their papers, and probably we would be correct in >50% of the cases, but I think the accuracy of prediction rises as follows: high school GPA < college GPA < grad school GPA < grad school publication record (IF) < postdoc publication record (IF) < first author postdoc publication record (IF) < other alternative measures which take time to become accurate. I agree that we should constantly be on the lookout for better ways to measure good science, but in that pursuit we should never lose those annoying practical issues out of sight.
    Finally, I cannot agree with your last statement, because if the volume of science keeps increasing, the number of scientists increases proportionately, and so the refereeing burden per scientist will remain constant.
    I hope you will come visit my blog in the next few days, because I am planning to summarize the whole multi-faceted discussion soon and I will be looking forward to hearing your comments.

Niewiap, have you read Bjoern Brembs’ piece on the flaws of the impact factor? Bjoern has strong views on this but look closely at the data. Also bear in mind that he has some serious papers in serious journals and has just got one of the most prestigious fellowships available in Europe, so while he doesn’t agree with the system, he looks pretty good when measured by it.

    In particular look at the graph of the correlation of paper quality (as measured by citations, also a flawed measure but at least more precisely related to each paper) and impact factor of journal. I suspect that there would actually be a higher correlation with the school the paper comes from but I will see if I can get data to support that. It doesn’t really matter, they are appalling measures. There are better ones available.

What really bothers me about this is that as scientists we are supposed to be serious about measuring things properly. Funding and appointment decisions are amongst the most important decisions that we make, and they involve putting in a lot of public money. If that type of attitude were presented in an undergraduate laboratory we would fail the student (“I couldn’t be bothered measuring the melting point of my compound, so I took the average of everyone else’s measurement”). Why is it acceptable in science policy? Because we can’t be bothered doing it properly? You say that “you reckon” papers in Cell are better than those in other journals. On what basis? Have you measured it? Have you checked whether there is reinforcement bias and run proper control experiments? Would you take that kind of assertion seriously if someone made it about a question in your scientific area? I don’t mean that as a personal criticism, a lot of scientists do exactly the same thing, but shouldn’t we apply the same rigour to how we do science as to the actual doing?

I agree that timeliness of measures is important. It is one argument for avoiding pre-publication peer review though. Get the papers out there and under discussion so that you get some measure quicker. I don’t think it solves the whole problem because it will take two to five years (or ten to twenty in some disciplines) for the “true” value of a paper to become clear. But if that’s the case then why are we using what is close to a random number (again, check out that graph) as a proxy for something that is difficult to measure? Remember “not everything that is important can be measured”. But in this case there are plenty of other, better, faster measures, including release of referees’ reports, an editorial assessment of value relative to other papers published that year, number of image downloads, number of times the paper is bookmarked, number of times the paper is blogged about; any number of things. All of these are pretty immediate and most of these correlate better with total citations than the IF.

Your point re: burden isn’t quite right. An increase in funding leads to an increase in research workers and an increase in papers, but not necessarily an increase, certainly not of the same proportion, in the number of reviewers (effectively PIs). So the personal burden does increase. I’d be surprised if you can find an average PI (i.e. not a journal editor) who hasn’t seen the refereeing burden increase in the last 5 – 10 years. It also misses the fact that peer review is unevenly distributed, with the UK and the US (and to a lesser extent the industrialised West generally) taking a disproportionate amount of the burden. But perhaps that is as it should be. More importantly, the cost of refereeing as a proportion of total science funding will rise – if that continues, eventually all science funding will be taken up by refereeing – or, more likely, the quality of refereeing will continue to decrease, to the point where a significant proportion of our time is taken up generating a product which is of such low quality that no-one can make use of it.

    My argument isn’t that the system is unfair – all systems are unfair, that is just life. My argument is that, as people who invest public money in the production of science, we need to take a rigorous approach to maximising the value of that investment. If we as a community aren’t prepared to take that responsibility then frankly I’d rather see the money going into better high schools because I think the social and economic impact would be much greater.

  • Cameron, I agree that the graphs look pretty dismal, but I am going to have to read the paper in BMJ for more detail of the methods used. Thanks for the link. All the measures of impact you mentioned, however, have their own serious flaws, some of which completely preclude their use. Let me address the ones you have mentioned:
    – I have addressed the flaws of post-publication peer review in Part 3 of my “Socialism in Science” series – the tone is evidently ranty, but if you can strip away the emotional veil, the gist is clear.
    – editorial assessment – you are simply adding a fourth reviewer to the three that have already given the paper the go-ahead. It doesn’t really change anything and pushes the decision about impact back by a year. In addition, for non-niche high impact science, the editor will not be a good judge, because they will not be knowledgeable enough in the field of each paper. Editors will also obviously (and perhaps even unconsciously) favor papers from fields more familiar to them.
    – number of downloads – can be easily exploited by downloading the paper multiple times from several computers by the author and his buddies. Measures the sexiness of the title and the abstract rather than scientific value and quality.
    – bookmarking and blogging – I never bookmark any papers and blog about very few, and I don’t think many people do. It may be a measure of the paper’s “sexiness” and “media presentability”, but not of the soundness of the science or its usefulness to the community. Most papers would not get a single bookmark or blog entry.
    IF is really bad, but at this point, you must admit, we have nothing truly better, or at least nothing that has been PROVEN better. If you want to be scientifically rigorous, show me the data on the other measures you propose. The good thing about IF is that we know its weaknesses and can compensate for them as far as possible. For all these other measures, I hope you see that, although they seem perfect at first glance, they are almost impossible to implement. I would advise further studies of any kind of “alternative” measures before we go ahead and start using them on a larger scale.

  • Cameron,

    I just read a review of a book called How Professors Think: Inside the Curious World of Academic Judgment by Michèle Lamont. Looks like a well-informed critique of problems with peer review as currently practiced in the U.S.

  • niewiap, I have to say I think your arguments against post publication peer review are strawmen. I don’t get teenage abuse here, we don’t see it on openwetware, and there hasn’t been any on PLoS or BMC comment pages as far as I’ve seen. The “we’ll get swamped by the great unwashed masses” argument has two fatal flaws as far as I can see. One is that most people simply couldn’t be bothered; only those with an interest will look. But secondly, it smacks of elitism of the worst kind – you know, maybe you should be interacting with those teenagers and telling them why your science is cool. Maybe if people saw that process out in the open they’d appreciate more about how science is actually carried out and be less impressed by nonsense about MMR and Facebook causing cancer?

    Also bear in mind that PPPR can also speed up the process of getting this feedback you want. The paper is published quicker, and peer review need take no longer than the 6–12 months it currently takes to get a paper published. Get your science out there and have an impact – rather than wait till it’s 12 months out of date in the hope that the journal title will do it for you.

    Editorial assessment: why can’t the editor (who is not just a fourth reviewer, they’ve actually seen all the other submissions for that year) simply make their written assessment public on publication? No time lag, no nothing? Just more information than we have at the moment. Don’t see how it can be bad.

    Bookmarking and blogging: Pedro Beltrao has looked at the correlations, and they are actually demonstrably slightly better than the impact factor at picking important papers (though there are probably confounding variables, so that analysis should be treated with care). Again, it’s no worse at picking good papers, and it is a lot faster. What’s not to like? You want something provably better? No worse, and much cheaper. Isn’t that enough?

    At the end of the day though, remember that none of this is going to change overnight. We can still have traditional journals alongside pre-print archives and post publication review. We can still have high impact journals, based on cherry picking from the full range of published papers on the basis of PPPR even.

    But at core what I just don’t get is that you keep telling me that we need to spend public money on what we know to be a broken system because otherwise we won’t be able to decide who gets a job or a grant. What happened to appointment committees and grant review panels actually making an informed assessment of the science?

    Because I have to say I don’t want to work anywhere where decisions are made on the basis of bad metrics. If that is the level of rigour people bring to important decisions (“oh we know the machine isn’t calibrated, but we can’t be bothered doing anything about it; we’ve got to get on and make more measurements”) then how can I trust the rest of their science?

  • I should say that I did not directly compare IFs with other measures of impact. I don’t think I had enough numbers to check this. I only wanted to show that social measures of attention appear to be informative of the impact of a given paper.

    Getting back to this discussion, given the evidence that there is no correlation between the number of citations of an article and the IF of the journal the article is published in, I think it is clear that one cannot evaluate an article based on where it is published. It is just wrong.
    In an ideal world, scientists and their work would be evaluated on a case-by-case basis, but if there is not enough time for this, then I don’t see why grant/application reviewers should not have different indicators to look at. Why should they look only at the IF if it is easy for publishers to gather and make available other possible indicators of impact?

    Regarding the current tiered system of journals and the price it costs: I also don’t think we can do away with it until we find a system that effectively replaces it. We should be clear that the important function it serves is as a filtering system, and this should be the aim of any system we end up implementing.

    niewiap, you might have only recently stumbled upon this debate, but we are certainly not the only people thinking about this. I suggest you have a look, for example, at the Frontiers in Neuroscience journals and the way they are planning to gather usage information to measure impact (http://frontiersin.org/evaluationsystem/).

  • Cameron and Pedro,
    Thanks for replying to my concerns. I am aware of the stirring debate on measurements of impact, but only now have I been able to look into it in more depth. As for the IF issue and the BMJ paper, most problems that are mentioned in the paper are purely technical and can easily be eliminated without disrupting the principal premise, i.e. the correlation between the impact of any given publication and the impact of the journal it was published in. As for social measures (blogs/bookmarking), I am worried that only the very top impact papers will get enough attention to even show on the radar, while all the “middle ground” will remain uncharted. Journal IF has one more important advantage over other measures, i.e. it is self-propagating, in that only people with really good data in hand will submit to Cell or Nature, while all the rest will aim lower. The reviewers and editors for Cell or Nature will also be more picky than those for other lower impact journals.
    The argument about “not being bothered” to review is a double-edged sword. On one hand you say that teenagers will not want to do it, but then, why would super-busy experts in the field want to? Besides, I am still sticking to the argument that post-print review will either be really prone to abuse or no less costly than the current editorial process. The “quickness” of getting the data out there in the post-print review system comes at the cost of publishing a paper that has not undergone any revisions and could contain serious methodological and reasoning flaws, which the current system of pre-pub peer review takes care of (mostly).
    All that said, I am not saying we should keep wasting money on a broken system, but that we should make do with what we have until we find a better solution. I would love to see a system that would show better correlation between the quality and impact of a paper and the number we assign to it. It is possible that we will have to employ a hybrid system where pre-pub review and an enhanced journal IF (with all the easy-to-fix technical flaws of the current IF eliminated) are combined with social measures and post-pub assessment by the community. Further studies are definitely warranted. I will be sure to check out the Frontiers in Neuroscience website.

  • Pedro,
    How long has the Frontiers system been in use? Is it working out? It seems like it might be costly and a bit time-consuming, but perhaps not as time-consuming as traditional peer review (they claim the average time from submission to acceptance is 80 days, which is really fast). Are they noticing any problems at this point? I hope it works out and is more widely adopted, because it looks very interesting.

  • It’s quite sad, but maybe not unexpected, that even scientists seem to have a hard time accepting that some things are just very difficult, if not impossible, to measure objectively. Scientific quality is just such a thing. Einstein once said that not everything that can be counted counts, and not everything that counts can be counted. Scientific quality cannot be counted because it means different things to different people. Quite obviously, if you really like papers in high-IF journals, then your “quality” is very different from my “quality”! We may agree on some papers, but on most we would disagree.
    However, there of course are some variables besides quality which may correlate more or less with what you desire in a paper or a candidate. Citations, bookmarks, downloads, media presence, invitations to talks and conferences, awards, prizes, honorary degrees, teaching, whatever. Some of these variables can be counted and you yourself decide if they count. To confuse any of these variables with quality, of course, is childish. To be satisfied with a measurable but abysmally low correlation with any other equally flawed measure is below even the lowest scientific standards and frankly, an embarrassment for the entire scientific community. Without a moment’s hesitation I would fail any undergraduate who comes with a project using statistics only half as bad as those of the IF. But it’s good enough to determine who gets promoted and who doesn’t? Are you kidding me? I personally find the IF more dangerous for the credibility of science as a whole than ten Hwangs. Hwang was an individual, the IF is being used systemically. I personally know individuals who brag about inviting GlamMag editors for dinner to explain to them why their paper needs to be published there. The IF issue has slam-dunk potential for any religious nutcase organization with some funding (think DI).
    What is required is serious publishing reform. Period. The technicalities are solved; it just needs to be implemented. We don’t need the current 24,000 scholarly journals. We don’t even need two. For people who like the current system, there wouldn’t even be much change. On the surface, it would still look pretty much the same, with some people being paid to highlight whatever it is they like for whatever reason. Only the back-end would actually work according to rational standards and not some makeshift, hand-waving, pre-school arithmetic.
