Metrics of use: How to align researcher incentives with outcomes

9 June 2010

It has become reflexive in the Open Communities to talk about a need for “cultural change”. The obvious next step becomes to find strong and widely respected advocates of change, to evangelise to young researchers, and to hope for change to follow. Inevitably this process is slow, perhaps so slow as to be ineffective. So beyond the grassroots evangelism we move towards policy change as a top down mechanism for driving improved behaviour. If funders demand that data be open, that papers be accessible to the wider community, as a condition of funding then this will happen. The NIH mandate and the work of the Wellcome Trust on Open Access show that this can work, and indeed that mandates in some form are necessary to raise levels of compliance to acceptable levels.

But policy is a blunt instrument, and researchers, being who they are, don’t like to be pushed around. Passive aggressive responses from researchers are relatively ineffectual in the peer reviewed articles space. A paper is a paper. If it’s under the right licence then things will probably be ok, and a specific licence is easy to mandate. Data, though, is a different kettle of fish. It is very easy to comply with a data availability mandate but provide that data in a form which is totally useless. Indeed it is rather hard work to provide it in a form that is useful. Data, software, reagents and materials are incredibly diverse, and it is difficult to make policy that is specific enough to be effective while remaining general enough to be useful. So beyond the policy mandate stick, which will only ever provide a minimum level of compliance, how do we motivate researchers to put the effort into making their outputs available in a useful form? How do we encourage them to want to do the right thing? After all, what we want to enable is re-use.

We need more sophisticated motivators than blunt policy instruments, so we arrive at metrics: measuring the outputs of researchers. There has been a wonderful animation illustrating a Daniel Pink talk doing the rounds in the past week. It is well worth a look and important stuff, but I think a naive application of it to researchers’ motivations would miss two important aspects. Firstly, money is never “off the table” in research. We are always to some extent limited by resources. Secondly, the intrinsic motivators, the internal metrics that matter to researchers, are tightly tied to the metrics that are valued by their communities. In turn those metrics are tightly tied to resource allocation. Most researchers value their papers, the places they are published and the citations received, as measures of their value, because that’s what their community values. The system is highly leveraged towards rapid change, if and only if a research community starts to value a different set of metrics.

What might the metrics we would like to see look like? I would suggest that they should focus on what we want to see happen. We want return on the public investment, we want value for money, but above all we want to maximise the opportunity for research outputs to be used and to be useful. We want to optimise the usability and re-usability of research outputs and we want to encourage researchers to do that optimisation. Thus if our metrics are metrics of use we can drive behaviour in the right direction.

If we optimise for re-use then we automatically value access, and we automatically value the right licensing arrangements (or lack thereof). If we value and measure use then we optimise for the release of data in useful forms and for the release of open source research software. If we optimise for re-use, for discoverability, and for value add, then we can automatically weigh the loss of access inherent in publishing in Nature or Science against the enhanced discoverability and editorial contribution, and put a real value on these aspects. We would stop arguing about whether tenure committees should value blogging and start asking how much those blogs were used by others to provide outreach, education, and research outcomes.

For this to work there would need to be mechanisms that automatically credit the use of a much wider range of outputs. We would need to cite software and data, would need to acknowledge the providers of metadata that enabled our search terms to find the right thing, and we would need to aggregate this information in a credible and transparent way. This is technically challenging, and technically interesting, but do-able. Many of the pieces are in place, and many of the community norms around giving credit and appropriate citation are in place; we’re just not too sure how to do it in many cases.

Equally this is a step back towards what the mother of all metrics, the Impact Factor, was originally about. The IF was intended as a way of measuring the use of journals through counting citations, as a means of helping librarians to choose which journals to subscribe to. Article Level Metrics are in many ways the obvious return to this where we want to measure the outputs of specific researchers. The H-factor, for all its weaknesses, is a measure of re-use of outputs through formal citations. Influence and impact are already an important motivator at the policy level. Measuring use is actually a quite natural way to proceed. If we can get it right it might also provide the motivation we want to align researcher interests with the wider community and optimise access to research for both researchers and the public.
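
To make the idea of a citation-based measure of re-use concrete, here is a minimal sketch of the standard h-index calculation (the largest h such that h outputs each have at least h citations). The function name and the example counts are hypothetical, made up purely for illustration. Feeding in counts of other kinds of use, such as data or software citations, is an assumption about how a metric of use might work, not an established practice.

```python
def h_index(use_counts):
    """Return the largest h such that h outputs each have at least h recorded uses."""
    # Rank outputs from most used to least used.
    ranked = sorted(use_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` outputs all have at least `rank` uses
        else:
            break      # counts only fall from here, so h cannot grow further
    return h


# Hypothetical counts of recorded uses (citations) for one researcher's outputs.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

The calculation itself does not care whether an output is a paper, a dataset or a piece of software. What it needs is a credible count of uses for each one, and that is the hard part.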

  • Mk630

    Very nice post. Thought you might be interested in this, in case you hadn't seen it yet: http://www.biomedcentral.com/1756-0500/2/113

  • Cameron Neylon

    Nat, I think that the two big success stories for open data, astronomy and biological sequence data, are special cases. Both depend (or at least depended) on very high cost infrastructure at the hundreds of millions level. Public funders have an absolute stranglehold on policy in these situations, at least where the data is of a highly consistent and shareable form natively.

    So in the DNA sequencing case a small set of funders, Wellcome and NIH mainly, along with a few strong community members (Sulston, Collins) were able to almost unilaterally say the data will be open because a) it isn’t so hard to make it available and b) if you don’t you won’t get your sequencing centre renewal. In the astronomy case you have image data and the need for new telescope instrumentation playing a similar role.

    Now in another area where the infrastructure is expensive but the experiments are very different, X-ray synchrotrons and neutron sources, despite the fact that the infrastructure is expensive the data are very diverse so sharing hasn’t had the same impetus. In the case where data are consistent, protein X-ray crystallography, there has been a push for sharing, first through the PDB, and later for raw(er) data.

    So I think astronomy (and DNA) are a special case. The lesson we learn IMO is that where funders have real centralised power and researchers need ongoing investment in infrastructure, and the data is easy (enough) to share, then much faster progress is made. Where the data are very diverse and there is a wider diversity of funders it is harder to get coordinated action. Hence my argument that we need to motivate action towards coordination, rather than mandate it from the top.

    This comment was originally posted on O’Reilly Radar Insight, analysis, and research about emergin…

  • http://friendfeed.com/cameronneylon Cameron Neylon

    Nat Torkington asks an interesting question about this at http://radar.oreilly.com/2010/06/four-short-links-11-june-2010.html I’ve got a response in the queue which I will try to remember to post here when it goes live.

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cavlec D0r0th34

    My answer is that scientific practices are not a monolith. There are VERY different incentives in astronomy, where data is expensive and rare, than in chemistry, where data is plentiful.

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cameronneylon Cameron Neylon

    My comment at Radar: "Nat, I think that the two big success stories for open data, astronomy and biological sequence data, are special cases. Both depend (or at least depended) on very high cost infrastructure at the hundreds of millions level. Public funders have an absolute stranglehold on policy in these situations, at least where the data is of a highly consistent and shareable form natively. So in the DNA sequencing case a small set of funders, Wellcome and NIH mainly, along with a few strong community members (Sulston, Collins) were able to almost unilaterally say the data will be open because a) it isn’t so hard to make it available and b) if you don’t you won’t get your sequencing centre renewal. In the astronomy case you have image data and the need for new telescope instrumentation playing a similar role. Now in another area where the infrastructure is expensive but the experiments are very different, X-ray synchrotrons and neutron sources, despite the fact that the infrastructure is expensive the data are very…

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cavlec D0r0th34

    bravo.

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/webmaven Michael R. Bernstein

    Very well put.

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/researchremix Heather Piwowar

    fyi some data from gene expression microarray data, which I think is less of a special case. There aren’t specific funder requirements around it. Many journals require sharing, but not all. In that environment, I estimate that about 25% of published papers that create data share it in centralized repositories. Haven’t broken it down into voluntary/mandate-compliant estimates (though that would be a good idea now that I think of it!), but the high-level attributes most associated with sharing frequency were a) whether the authors had shared/reused before (increased likelihood), and b) was the data about humans or cancer (decreased likelihood). Of course these attributes are correlated with others (cancer papers more likely to be published in medical journals, medical journals less likely to mandate data sharing, etc….)

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/researchremix Heather Piwowar

    Anyone from Stanford? I have a finding that I think could be informative in determining best-practices that work. Papers out of Stanford share their gene expression microarray data MUCH more often than others. Is this because Stanford has the SMD repository and thus it is easier for them to share data? Or because they are institutionally mandated to do so? Or because there is a culture in that dept to do so? Or because there are incentives of some type? Lessons to be learned and applied elsewhere….

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cavlec D0r0th34

    I wonder about competition-by-equipment sometimes. If your lab can buy a Dooziwhatsis Data Producer 5000 and most labs can’t, you have a competitive advantage, and you’ll hug the produced data to your chest and not let anybody else get hold of it. Once everybody has a Dooziwhatsis, attention shifts from producing the data to what can be *learned* from the data, and openness becomes somewhat more likely. (Is sequencing perhaps an example of this?)

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cameronneylon Cameron Neylon

    Possibly but there is also the contrary example, once equipment becomes stupendously expensive then your granting agency will probably demand data availability as a condition of letting you have one. Also early on you will be able to publish "just the data" much more easily. See for example early genome papers and how they evolve over time to become much more about what the data reveals (at least from the titles). My guess would be that genome sequencing is a very good example of all of these phases – starting from the horrendously expensive to the fairly affordable. I guess the availability of commercial services also plays a role (I paid to get this data so I’m going to milk it…?)

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cavlec D0r0th34

    thank you for clarifying my off-the-cuff thoughts!

    This comment was originally posted on FriendFeed

  • http://friendfeed.com/cameronneylon Cameron Neylon

    Hah, your off the cuff thoughts have driven more interesting thinking than most of my carefully thought out nonsense…

    This comment was originally posted on FriendFeed

  • Marius Kempe

    Thought you might be interested in this short article from evolutionary biology, along very similar lines to what you've written.

    http://www.nature.com/nature/journal/v441/n7093

  • http://blog.dannynavarro.net Danny Navarro

    Granting agencies have to reward more data generators. It seems all the glory goes to the people who use the data to publish (who in many cases don't credit the data generators properly).

    In my field, proteomics, sharing data is not only more work for the researcher but it also gives advantage to a competitor who might get it out before you and will get the next grant in the field instead of you.

    I don't think evangelization will work. Granting agencies should start looking into something else that is not publications per se.

    For example, nobody cares about making programs maintainable or fast because you get the same kind of publication with a crappy program as with a good one, as long as both have the same function.

  • http://cameronneylon.net Cameron Neylon

    Danny, you won't get any argument from me on that. I think evangelization has a role because it helps to raise the profile of the issues but on its own it is not enough. Granting agencies have to start actually caring about optimising output in the general sense – and my feeling is that that will require as you say looking at a much wider range of outputs.