A question of trust

I have long been sceptical of the costs and value delivered by our traditional methods of peer review. This is really on two fronts. Firstly, the costs, where they have been estimated, are extremely high, representing a multi-billion dollar subsidy by governments of the scholarly publishing industry. Secondly, the value that is delivered through peer review, the critical analysis of claims and the informed opinion on the quality of the experiments, is largely lost. At best it is wrapped up in the final version of the paper. At worst it is completely lost to the end user. A part of this, which the more I think about the more I find bizarre, is that the whole process is carried on under a shroud of secrecy. This means that as an end user I do not know who the peer reviewers are, and do not necessarily know what process has been followed or even the basis of the editorial decision to publish. As a result I have no means of assessing the quality of peer review for any given journal, let alone any specific paper.

Those of us who see this as a problem have a responsibility to provide credible and workable alternatives to traditional peer review. So far, despite many ideas, we haven’t, to be honest, had very much success. Post-publication commenting, open peer review, and Digg-like voting mechanisms have been explored but have yet to have any large success in scholarly publishing. PLoS is leading the charge on presenting article level metrics for all of its papers, but these remain papers that have also been through a traditional peer review process. Very little has yet been seen that is both radical with respect to the decision and means of publishing, and successful in gaining traction amongst scientists.

Out on the real web it has taken non-academics to demonstrate the truly radical when it comes to publication. Whatever you may think of the accuracy of Wikipedia in your specific area, and I know it has some weaknesses in several of mine, it is the first location that most people find, and the first location that most people look for, when searching for factual information on the web. Roderic Page put up some interesting statistics when he looked this week at the top hits for over 5000 mammal names in Google. Wikipedia took the top spot 48% of the time and was in the top 10 in virtually every case (97%). If you want to place factual information on the web, Wikipedia should be your first port of call. Anything else is largely a waste of your time and effort. This doesn’t, incidentally, mean that other sources are not worthwhile or don’t have a place, but that we need to work with the assumption that people’s first landing point will be Wikipedia.

“But”, I hear you say, “how do we know whether we can trust a given Wikipedia article, or specific statements in it?”

The traditional answer has been to say you need to look in the logs, check the discussion page, and click back through the profiles of the people who made specific edits. However, this is inaccessible to many people, simply because they do not know how to process the information. Very few universities have an “Effective Use of Wikipedia 101” course, mostly because very few people would be able to teach it.

So I was very interested in an article on Mashable about marking up and colouring Wikipedia text according to its “trustworthiness”. Andrew Su kindly pointed me in the direction of the group doing the work and their papers and presentations. The system they are using, which can be added to any MediaWiki installation, measures two things: how long a specific piece of text has stayed in situ, and who either edited it or left it in place. People who write long lasting edits get higher status, and this in turn promotes the text that they have “approved” by editing around but not changing.
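
To make the idea concrete, here is a toy sketch in Python of how such a scheme might work. It is emphatically not the group’s actual algorithm, which I have not reproduced: the names (Token, apply_edit), the word-level diff, and the scoring rules (one point of reputation per edit survived, new text starting at zero trust) are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Token:
    word: str
    author: str
    trust: float  # hypothetical score: grows each time another editor leaves the word in place


def apply_edit(tokens, new_words, editor, reputation):
    """Return the token list after `editor` saves `new_words` (toy model, not the real system).

    Assumed rules: text left in place by a *different* editor gains trust, and gains
    more if that editor already has a high reputation; each author whose text
    survived the edit gains one point of reputation; new text starts at zero.
    """
    editor_rep = reputation.setdefault(editor, 0.0)
    matcher = SequenceMatcher(a=[t.word for t in tokens], b=new_words, autojunk=False)
    result = []
    rewarded = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for tok in tokens[i1:i2]:
                if tok.author != editor:
                    # The editor read around this text and did not change it.
                    rewarded.add(tok.author)
                    result.append(Token(tok.word, tok.author, tok.trust + 1.0 + editor_rep))
                else:
                    # Leaving your own text alone earns nothing.
                    result.append(tok)
        elif tag in ("replace", "insert"):
            result.extend(Token(w, editor, 0.0) for w in new_words[j1:j2])
        # "delete": the removed text simply disappears from the article.
    for author in rewarded:
        reputation[author] = reputation.get(author, 0.0) + 1.0
    return result


reputation = {}
v1 = apply_edit([], "water boils at 100 C".split(), "alice", reputation)
v2 = apply_edit(v1, "water boils at 100 C at sea level".split(), "bob", reputation)
v3 = apply_edit(v2, "pure water boils at 100 C at sea level".split(), "alice", reputation)

for tok in v3:
    print(f"{tok.word:6} {tok.author:6} {tok.trust:.1f}")
print(reputation)
```

In this run Bob’s addition ends up scoring higher than Alice’s original sentence, because it was left in place by an editor who had already earned some reputation. Whether that is the behaviour you would want is exactly the kind of question the real mathematics has to answer.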

This to me is very exciting because it provides extra value and information for both users and editors without requiring anyone to do any more work than install a plugin. The editors and writers simply continue working as they have. The user can access an immediate view of the trustworthiness of the article with a high level of granularity, essentially at the level of single statements. And most importantly the editor gets a metric, a number that is consistently calculated across all editors, that they can put on a CV. Editors are peer reviewers: they are doing review on a constantly evolving and dynamic article that can both change in response to the outside world and be continuously improved. Not only does the Wikipedia process capture most of the valuable aspects of traditional peer review, it jettisons many of the problems. But without some sort of reward it was always going to be difficult to get professional scientists to be active editors. Trust metrics could provide that reward.

Now there are many questions to ask about the calculation of this “karma” metric. Should it be subject biased, so we know that highly ranked editors have relevant expertise, or should it be general, so as to discourage highly ranked editors from modifying text that is outside of their expertise? What should the mathematics behind it be? It will clearly take time for such metrics to be respected as a scholarly contribution, but equally I can see the ground shifting very rapidly towards a situation where a lack of engagement, a lack of interest in contributing to the publicly accessible store of knowledge, is seen as a serious negative on a CV. However this particular initiative pans out, it is to me one of the first and most natural replacements for peer review that could be effective within dynamic documents, solving most of the central problems without requiring significant additional work.

I look forward to the day when I see CVs with a Wikipedia Karma Rank on them. If you happen to be applying for a job with me in the future, consider it a worthwhile thing to include.

24 Replies to “A question of trust”

  1. I think there’s an additional way that wikipedia could nurture different forms of community-led peer review, which is to allow user-specific alternate renderings of articles, separate to the main article. Assuming a neutral way could be found to allow users to choose from amongst the alternatives (not an easy proposition), users could vote with their mouses as to which they liked.

    For example, when a wikipedia article reaches a mature level, from a factual content point of view, there is always scope for editors with a good turn of phrase to come along and rewrite it in their own style. Indeed, if there’s a particular subject I’m interested in I personally prefer to read several sources (usually including wikipedia) to find different takes on it. If it’s a complex subject, different authors can emphasise aspects which others have poorly elucidated. You could imagine some gathering a following in the same way some bloggers are avidly read.

    I think the idea of Flagged Revisions (http://en.wikipedia.org/wiki/Wikipedia:Flagged_revisions) is a move in this direction but as yet doesn’t go far enough.

  3. Dan, interesting points. I can imagine in the future actually filtering something like Wikipedia with its rich and annotated editing history through your own social network to get it in the form you want. A dynamic document doesn’t need to be only dynamic in time but could also be in presentation and style…this requires some more thought I think.

  5. A problem I see with this is that, like any statistic, it can be gamed. If a Wikipedia Karma Rank becomes that important, anyone wanting to get a high rank will only edit non-controversial topics, and could just apply stylistic changes rather than anything of substance. The effect might be to drive experienced editors away from controversial topics, but that is exactly where we want them.

  7. Cameron, very elegantly argued :-) OA to peer review, yes, there is no way around it any more, really, for any journal that claims a high rank and others are free to follow suit.
    as regards the mathematics behind it, I suggest not to simply follow click counts, Dan, we know that quantity does not necessarily mean quality ;-)
    and I agree with Bob that the opportunity of gaming on any kind of statistics attracts attention – and the ISI’s IF is of course a case in point, for the gaming and its effects, for the attention this gets by those who join the game, and also for the critical attention it gets that also keeps a lot of people busy arguing about what it is that is being measured and for whose benefit,
    I think Bob’s last point is very interesting, also, even in general: what do experienced editors do best and where would we like to see them show their expertise? which of course links to Cameron’s excellent point: “I can see the ground shifting very rapidly towards a situation where a lack of engagement, a lack of interest in contributing to the publicly accessible store of knowledge, is seen as a serious negative on a CV” – wish this was true for many cultures

  9. The question is whether the psychophysical (what we believe to be true) is equivalent to the physical (what is true). It is demonstrably not.

    Wikipedia/karma-based rating systems may be able to show the usefulness of information (though I doubt it); if mysterian (see McGinn) points of view are correct, however, then those systems will never be able to get us beyond a giant political beauty contest.

    If you indeed think ALL is politics, then go for it.

    And, further, if you really want to be a stranger in a strange land, try advancing some sort of anti-karma thought in your future utopia of Wikipedia-land.

    You never have to worry about defining the “freedom of” your “speech,” if and when “speech” itself is bound and tied and defined solely by those who choose to listen to it.

  11. You write that “editors are peer reviewers” on Wikipedia, but unfortunately that’s not the case. Anyone can edit any article on Wikipedia. That’s not peer review as I understand it. Peer review means that the article is reviewed by people who know the topic well. On Wikipedia, a group of like-minded people, who are ignorant about a particular topic but have strong feelings about it, can write an article that’s full of false or misleading statements, and it can stay on Wikipedia for a long time unless someone knowledgeable happens to come along and volunteer their time to fix it. That knowledgeable person may then be in for a pitched battle to get their text accepted by the authors of the original article, who have an interest in defending their poor-quality work. (This happened to me in my academic field.) Wikipedia seems to work pretty well for uncontroversial topics, but until it introduces real peer review, I don’t see how it can possibly be trustworthy on any controversial topic.

  13. Benjamin, the model I proposed would, I believe, be closer to the peer review model because in that circumstance you would have “readers as peer reviewers.”

    A practical way to implement my suggestion on wikipedia could be to include the standard trunk/tags/branches idiom popular in version control systems such as Subversion. The trunk corresponds to the current wikipedia model (i.e. one logical page per article), the tags would correspond to the “Flagged revisions” proposal and the branches would implement the “alternative renderings” I talked about. This model works exceptionally well in the open source software world, strongly suggesting it could be successful for writing collaborative documents.

    The peer-review aspect of this would come in at the point where the community decides which of the competing branches could be merged (perhaps wholesale) into the main trunk. The benefit is that no individual contributor can feel their ideas are constantly getting sidelined since they can spawn a branch of their own to try out (and show off) their ideas. This can happily coexist with branches made by others and also fosters diversity in styles of exposition. The hope is that you’ll no longer get the sorts of turf wars that sometimes happen at present where the “early settlers” on a page get the high ground whether they deserve it or not.

    Picking up on Cameron’s comments, I would agree that re-synthesising a collection of related facts into a document more relevant to the reader’s tastes and interests is the way things should go. However, it’s difficult for an automated system to provide the narrative that connects these facts in a coherent way and/or pitches them at the appropriate level. In the case of wikipedia it’s the human editor that fulfills this role.

  15. looking back on academic journal review processes, so, what’s your take on this example?

    Atmospheric Chemistry and Physics
    “It is a two-stage process involving the scientific discussion forum, and it has been designed to use the full potential of the internet to foster scientific discussion and enable rapid publication of scientific papers.” http://www.atmospheric-chemistry-and-physics.net/review/index.html

    and, same url:
    “Upon internet publication the paper is opened for Interactive Public Discussion during which Referee Comments (anonymous or attributed), Author Comments (on behalf of all co-authors), and Short Comments by any registered member of the scientific community (attributed) are published alongside the discussion paper”
    Chart 1:
    http://www.atmospheric-chemistry-and-physics.net/Copernicus_Publications_Two_Stage_Concept.pdf

  17. uwrite – we’re scientists here so the only concern is whether something is useful (i.e. matches the model and allows prediction of future events). Is science useful? Well, you are presumably using a computer…

    Ben – it’s a question of who you think are peers. I could turn every one of your criticisms (which are all valid) around and apply them to traditional journal peer review. With one addition – you don’t even know who these “peers” are. You get many of the same problems in peer reviewed science, just that these problems are hidden.

    I agree there are issues, particularly with much on Wikipedia not necessarily being written by the people we “experts” would choose, but again I would turn that around – the more we engage fully with the process and don’t just expect our position to be automatically respected, the more the content will improve. The people who have done the initial work are invested and have put in an effort and worked within a system – we need to learn and work within that system – and if necessary argue for change.

    But for me the key thing is to recognise that we don’t choose our peers for our own convenience. This is precisely what is wrong with the way we conduct science at the moment in my view.

    Claudia – do they actually have many submissions? There has been talk of taking a similar approach in the EMBO journals, a free-for-all database with promotion to higher stature in the EMBO journal itself, and the Frontiers journals are taking a similar tack. My impression is that mainstream scientists are very wary of “different” peer review processes though. You only need to look at PLoS ONE, which has a very traditional process in practice, to see that.
