Submission to the European Commission Expert Group on Altmetrics

As part of the broader Open Science agenda of the European Commission, an expert group on “altmetrics” has been formed. The group’s remit is to consider how indicators of research performance can be used effectively to support the strategic goals of the Commission, and to examine the risks and opportunities that new forms of data pose to the research enterprise. This is my personal submission.

Next Generation Altmetrics

Submission by Cameron Neylon, Professor of Research Communications, Curtin University

1. Introduction

The European Commission has an ambitious program for Open Science as part of its three aspirations: Open Innovation, Open Science, and Open to the World. Key to defining the role of evaluation, and within that the role of metrics, in achieving all of these aspirations is a clear understanding of the opportunities and limitations that our new, data-rich environment creates. I therefore welcome the Commission’s call for evidence and the formation of the expert group.

My expertise in this area is based on a long-term interest in the intersection between research evaluation and policy implementation, specifically the role that new indicators can play in helping to drive (or hinder) cultural change. I was an author of the Altmetrics Manifesto[1] as well as of the first major paper on Article-Level Metrics[2]. More recently (in my previous role as Advocacy Director at PLOS) I have been closely engaged in technology and policy development, and wrote the PLOS submission to the HEFCE Metrics enquiry[3]. Since leaving PLOS I have been developing a research program looking at how technology and incentives combine to affect the culture of research communities. Recent work in this context includes the preprint (currently under review) Excellence R Us[4], which has gained significant attention, and two reports for Jisc[5,6] that address related issues of evaluation and culture.

2. Next generation Metrics for open science

A. How do metrics, altmetrics & ‘responsible metrics’ fit within the broader EC vision & agenda for open science?

Delivering on the Commission’s agenda across the research policy platform requires a substantial culture change across a range of stakeholders. The cultures of research communities, and the practices that they support, are diverse and often contradictory. It is important to separate the questions of how evaluation is conducted and how indicators support it, how evaluation contributes to the overall incentives that individuals and organisations experience, and what effect changes in incentives have on culture. Thoughtful evaluation, including the application of new and improved indicators, can contribute to change but will not, on its own, drive it.

B. What are the key policy opportunities and tensions in this area? What leadership role can the EU play in wider international debates?

There are two opportunities that the current environment offers. First, the Commission can take a progressive leadership position on research evaluation. As the HEFCE Metrics enquiry and many others have concluded, in much research evaluation the tail wags the dog: available indicators drive targets and therefore behaviour. It is necessary to reframe evaluation around what public research investment is for and how different stakeholder goals can be tensioned and prioritised. The Commission can take a leadership role here. The second opportunity is in using new indicators to articulate the values that underpin the Commission’s policy agenda. Using indicators that act as proxies for the qualities aligned with the Open Science agenda can send a strong signal to research communities, researchers and RPOs that these aspects (collaboration, open access, data accessibility, evidence of re-use) are important to the Commission.

3. Altmetrics: The emerging state of the art

A. How can we best categorise the current landscape for metrics and altmetrics? How is that landscape changing? How robust are various leading altmetrics, and how does their robustness compare to more ‘traditional’ bibliometrics?

The landscape of available indicators is diverse and growing, both in the range of indicators available and in the quality of the data underpinning them. That said, this growth is from a low base. The current quality and completeness of the data underlying indicators, both new and traditional, do not meet basic standards of transparency, completeness or equity. These indicators are neither robust, stable nor reliable. Auditing and critical analysis are largely impossible because the data are generally proprietary. On top of this, the analysis of these data to generate indicators is in most cases highly naïve and under-theorized. This can be seen in a literature providing conflicting results on even basic questions of how different indicators correlate with each other. Bibliometrics, while more established, suffer from many of the same problems. There is greater methodological rigour within the bibliometrics research community, but much of the use of these data is by users without that experience and expertise.

B. What new problems and pitfalls might arise from their usage?

The primary risk in the use of poorly applied indicators and metrics is that individuals and organizations refocus their efforts on performing against the metrics instead of delivering on the qualities of research that the policy agenda envisions. Lack of disciplinary and output-type coverage is a serious issue for representation, particularly across the arts and humanities, as noted in the HEFCE Metrics report.

C. What are some key conclusions and unanswered questions from the fast-growing literature in this area?

With some outstanding exceptions, the literature on new indicators is methodologically weak and under-theorized. In particular, there is virtually no work looking at the evolution of indicator signals over time. There is a fundamental failure to understand these indicators as signals of underlying processes. As a result there is a tendency to seek indicators that match particular qualities (e.g. “influence”) rather than to understand how a particular process (e.g. effective communication to industry) leads to specific signals. Core to this failure is the lack of a framework for defining how differing indicators can contribute to answering a strategic evaluative question, and a tendency to build facile mathematical constructs from the available data and define them as a notionally desired quality.

4. Data infrastructure and standards

I refer the expert group to the conclusions of the HEFCE Metrics report, the PLOS submission to that enquiry[3] and my report to Jisc[6], particularly on the issue of access to open citations data. Robust, trusted and responsible metrics require an open and transparent data infrastructure, with rigorous data quality processes and open methodologies that are subject to full scholarly critical analysis.

The Commission has the capacity and resources to lead infrastructure development, in data and technology as well as in social infrastructures such as standards. My broad recommendation is that the Commission treat administrative and process data with the same expectations of openness, quality assurance, re-usability and critical analysis as the research data that it funds. The principles of Open Access, transparency, and accountability all apply. As with research data, privacy and other issues arise, and I commend the Commission’s position that data should be “as open as possible, as closed as necessary”.

5. Cultures of counting: metrics, ethics and research

A. How are new metrics changing research cultures (in both positive and negative ways)? What are the implications of different metrics and indicators for equality and diversity?

The question of diversity has been covered in the PLOS submission to the HEFCE Enquiry[3]. Indicators and robust analysis can be used to test for issues of diversity, but they can also create them. These issues are also covered in detail in our recent preprint[4]. Research culture has been shifting towards a more rigid, homogeneous and performative stance. This is dangerous and counter to the policy goals of the Commission. It will only be addressed by developing a strong culture of critical evaluation supported by indicators.

B. What new dynamics of gaming and strategic response are being incentivized?

Gaming is a complex issue. On one side there is “cheating”, on the other an adjustment of practice towards policy goals (e.g. wider engagement with users of research through social media). New indicators are arguably more robust to trivial gaming than traditional single data-source metrics. Nonetheless we need to develop institutional design approaches that promote “strategic responses” in the desired direction, not facile performance against quantitative targets.

6. Next generation metrics: The way forward

A. Can we identify emerging best practices in this area? What recommendations might we as a group make, and to which actors in EU research systems?

There are structural reasons why it is difficult to identify specific examples of best practice. I take the thoughtful use of data and derived indicators to support strategic decision making against clearly defined goals and values as the ideal. The transparency and audit requirements of large-scale evaluations make this difficult, and smaller-scale evaluation that is not subject to external pressures is most likely to follow this path. Amongst large-scale efforts, the one that best exemplifies progress towards these goals is the UK REF, where the question of what form of “excellence” is to be assessed is addressed with some specificity, and where data were used in Impact Narratives to support a narrative claim against defined evaluation criteria.

Overall we need to develop a strong culture of evaluation.

  • The Commission can support this directly through actions that provide public and open data sources for administrative and activity data and through adopting critical evaluative processes internally. The Commission can also act to lead and encourage adoption of similar practice across European Funding Organisations, including through work with Science Europe.
  • Institutions and funders can support the development of stronger critical evaluation processes (including the evaluation of those processes themselves) by implementing emerging best practice as it is identified and by supporting the development of expertise, including new research, within their communities.
  • Scholarly Societies can play a strong role in articulating the distinctive nature of their communities’ work and what classes of indicators may or may not be appropriate in assessing it. They are also valuable potential drivers of the narratives that can support culture change.
  • Researchers can play a greater role by being supported to consider evaluation as part of the design of research programs. Developing a critical capacity for determining how to assess a program (as opposed to developing the skills required to defend it at all costs) would be valuable.
  • Publics can be engaged to define some of the aspects of what matters to them in the conduct and outcomes of research and how performance against those measures might be demonstrated and critically assessed to their satisfaction.

References

  1. Priem et al. (2010), altmetrics: a manifesto, http://altmetrics.org
  2. Neylon and Wu (2009), Article-Level Metrics and the Evolution of Scientific Impact, PLOS Biology, http://dx.doi.org/10.1371/journal.pbio.1000242
  3. PLOS (2013), PLOS Submission to the HEFCE RFI on Metrics in Research Assessment, http://dx.doi.org/10.6084/m9.figshare.1089555
  4. Moore et al. (2016), Excellence R Us: University Research and the Fetishisation of Excellence, https://dx.doi.org/10.6084/m9.figshare.3413821.v1
  5. Neylon, Cameron (2016), Jisc Briefing Document on Data Citations, http://repository.jisc.ac.uk/id/eprint/6399
  6. Neylon, Cameron (2016), Open Citations and Responsible Metrics, http://repository.jisc.ac.uk/id/eprint/6377

Tracking research into practice: Are nurses on Twitter a good case study?

The holy grail of research assessment is a means of automatically tracking the way research changes how practitioners act in the real world. How does new research influence policy? Where has research been applied by start-ups? And have new findings changed the way medical practitioners treat patients? Tracking this kind of research impact is hard for a variety of reasons: practitioners don’t (generally) write new research papers citing the work they’ve used; even if they did, their work is often several steps removed from the original research, making the links harder to identify; and researchers themselves are often too removed from the application of their research to be aware of it. Where studies of downstream impact have been done, they are generally carefully selected case studies that generate a narrative description. These case studies can be incredibly expensive, and by their nature they are unlikely to uncover unexpected applications of research.

In recent talks I have used a specific example of a research article reaching a practitioner community. This is a paper that I discovered while searching through the output of the University of Cape Town on Euan Adie’s Altmetric.com service. The paper deals with domestic violence, HIV status and rape. These are critical social issues, and new insights have a real potential to improve people’s lives, particularly in the area of the study. The paper was tweeted by a number of accounts, but in particular by @Shukumisa and @SonkeTogether, two support and advocacy organisations in South Africa. Shukumisa tweeted in response to another account: “@lizieloots a really important study, we have linked to it on our site”. This is a single example, but it illustrates that it is possible to at least identify where research is being discussed within practitioner and community spaces.

But can we go further? More recently I’ve shown some other examples of heavily tweeted papers that relate to work funded by cancer charities. In one of those talks I made the throwaway comment, “You’ve always struggled to see whether practitioners actually use your research…and there are a lot of nurses on Twitter”. I hadn’t really followed that up until yesterday, when I asked on Twitter about research into the use of social media by nurses and was rapidly put in touch with a range of experts on the subject (remind me, how did we ask speculative research questions before Twitter?). So the question I’m interested in probing is whether the application of research by nurses is something that can be tracked using links shared on Twitter as a proxy.

This is interesting from a range of perspectives. To what extent do practicing nurses who use social media share links to web content that informs their professional practice? How does this mirror the parallel link sharing activity of academic researchers? Are nurses referring to primary research content, or is this information mediated through other sources? Do such other sources link back to the primary research? Can those links be traced automatically? And there is a host of other questions around how professional practice is changing with the greater availability of these primary and secondary resources.

My hypothesis is as follows: links shared by nurse practitioners and their online community are a viable proxy for (some portion of) the impact that research has in clinical practice. The extent to which links are shared by nurses on Twitter, perhaps combined with sentiment analysis, could serve as a measure of the impact of research targeted at the professional practice of nurses.
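
As a very rough sketch of what automated tracing might look like, the Python below filters a set of already-collected tweets (for example from an altmetrics data provider) down to accounts whose bios suggest a nursing role, and counts how often each DOI-bearing link is shared. The record structure, the bio keywords and the DOI-matching step are all illustrative assumptions on my part, not a validated method.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified tweet record: assumes the tweets have already been
# collected from some data source along with the author's profile bio.
@dataclass
class Tweet:
    author_bio: str
    urls: list

# Illustrative (unvalidated) keywords for spotting self-described nurses.
NURSE_KEYWORDS = ("nurse", "nursing", "midwife")

# Matches a DOI embedded in a link, e.g. http://dx.doi.org/10.1371/journal.pbio.1000242
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def is_probable_nurse(bio: str) -> bool:
    """Crude filter: does the account bio mention a nursing role?"""
    bio = bio.lower()
    return any(keyword in bio for keyword in NURSE_KEYWORDS)


def research_links_shared_by_nurses(tweets):
    """Count how often each DOI-bearing link is shared by probable nurses."""
    counts = {}
    for tweet in tweets:
        if not is_probable_nurse(tweet.author_bio):
            continue
        for url in tweet.urls:
            match = DOI_PATTERN.search(url)
            if match:
                doi = match.group(0).rstrip(".,)")
                counts[doi] = counts.get(doi, 0) + 1
    return counts


# Toy example
tweets = [
    Tweet("ICU nurse and educator", ["http://dx.doi.org/10.1371/journal.pbio.1000242"]),
    Tweet("Football fan", ["http://example.com/news"]),
]
print(research_links_shared_by_nurses(tweets))
```

The hard parts, of course, are exactly the open questions above: identifying practitioner accounts reliably, and following links that pass through secondary sources back to the primary research. Any such filter would need validating against a known community.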

Thoughts? Criticisms?

(S)low impact research and the importance of open in maximising re-use

This is an edited version of the text that I spoke from at the Altmetrics Workshop in Koblenz in June. There is also an audio recording of the talk available, as well as the submitted abstract for the workshop.

I developed an interest in research evaluation as an advocate of open research process. It is clear that researchers are not going to change themselves, so someone is going to have to change them, and it is funders who wield the biggest stick. The only question, I thought, was how to persuade them to use it.

Of course it’s not that simple. It turns out that funders are highly constrained as well. They can lead from the front but not too far out in front if they want to retain the confidence of their community. And the actual decision making processes remain dominated by senior researchers. Successful senior researchers with little interest in rocking the boat too much.

The thing you realize as you dig deeper into this is that the key lies in finding motivations that work across the interests of different stakeholders. The challenge lies in finding the shared objectives: what is it that unites researchers and funders, as well as government and the wider community? So what can we find that is shared?

I’d like to suggest that one answer to that is Impact. The research community as a whole has a stake in convincing government that research funding is well invested. Government also has a stake in understanding how to maximize the return on its investment. Researchers do want to make a difference, even if that difference is a long way off. You need a scattergun approach to get the big results, but that means supporting a diverse range of research in the knowledge that some of it will go nowhere but some of it will pay off.

Impact has a bad name, but if we step aside from the gut reactions and look at what we actually want out of research, then we start to see a need to raise some challenging questions. What is research for? What is its role in our society, really? What outcomes would we like to see from it, and over what timeframes? What would we want to evaluate those outcomes against? Economic impact, yes, as well as social, health, policy, and environmental impact. This is called the ‘triple bottom line’ in Australia. But alongside these there is also research impact.

All these have something in common. Re-use. What we mean by impact is re-use. Re-use in industry, re-use in public health and education, re-use in policy development and enactment, and re-use in research.

And this frame brings some interesting possibilities. We can measure some types of re-use: citations, retweets, re-use of data, materials, methods or software. We can think about gathering evidence of other types of re-use, and about improving the systems that acknowledge re-use. If we can expand the culture of citation and linking to new objects and new forms of re-use, particularly for objects on the web, where there is some good low-hanging fruit, then we can gather a much stronger and more comprehensive evidence base to support all sorts of decision making.
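
As a minimal sketch of what that evidence base might look like in practice, the snippet below groups hypothetical re-use events into a per-output profile. The event types and data layout are assumptions for illustration; the point is that keeping the categories separate preserves them as evidence against specific questions, rather than collapsing everything into a single composite score.

```python
from collections import defaultdict

# Hypothetical re-use events; the categories mirror those in the text
# (citation, retweet, re-use of data, materials, methods or software).
reuse_events = [
    {"output": "doi:10.1371/journal.pbio.1000242", "type": "citation"},
    {"output": "doi:10.1371/journal.pbio.1000242", "type": "retweet"},
    {"output": "dataset:example-survey-2011", "type": "data_reuse"},
]


def reuse_profile(events):
    """Group raw re-use events into a per-output evidence profile."""
    profile = defaultdict(lambda: defaultdict(int))
    for event in events:
        profile[event["output"]][event["type"]] += 1
    return {output: dict(kinds) for output, kinds in profile.items()}


print(reuse_profile(reuse_events))
# {'doi:10.1371/journal.pbio.1000242': {'citation': 1, 'retweet': 1},
#  'dataset:example-survey-2011': {'data_reuse': 1}}
```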

There are also problems and challenges. The same ones that any social metrics bring. Concentration and community effects, the Matthew effect of the rich getting richer. We need to understand these feedback effects much better and I am very glad there are significant projects addressing this.

But there is also something more compelling for me in this view. It lets us reframe the debate around basic research. The argument goes that we need basic research to support future breakthroughs; we know neither what we will need nor where it will come from. But we know that it is very hard to predict, and that is why we support curiosity-driven research as an important part of the portfolio of projects. Yet the dissemination of this investment in the future is amongst the weakest in our research portfolio. At best a few papers are released and then hidden in journals that most of the world has no access to, in many cases without the data or other products being indexed or even made available. And this lack of effective dissemination is often because the work is perceived as low (or, perhaps better, slow) impact.

We may not be able to demonstrate or to measure significant re-use of the outputs of this research for many years. But what we can do is focus on optimizing the capacity, the potential, for future exploitation. Where we can’t demonstrate re-use and impact we should demand that researchers demonstrate that they have optimized their outputs to enable future re-use and impact.

And this brings me full circle. My belief is that the way to ensure the best opportunities for downstream re-use, over all timeframes, is for the research outputs to be open, in the Budapest Declaration sense. But we don’t have to take my word for it; we can gather evidence. Making everything naively open will not always be the best answer, but we need to understand where that is the case and how best to deal with it. We need to gather evidence of re-use over time to understand how to optimize our outputs to maximize their impact.

But if we choose to value re-use, to value the downstream impact that our research has, or could have, then we can make this debate not about politics or ideology but about how best to take the public investment in research and invest it for the outcomes that we need as a society.

Beyond the Impact Factor: Building a community for more diverse measurement of research

I know I’ve been a bit quiet for a few weeks. Mainly I’ve been away for work and having a brief holiday so it is good to be plunging back into things with some good news. I am very happy to report that the Open Society Institute has agreed to fund the proposal that was built up in response to my initial suggestion a month or so ago.

OSI, which many will know as one of the major players in bringing the Open Access movement to its current position, will fund a workshop that will identify both potential areas where the measurement and aggregation of research outputs can be improved and barriers to achieving these improvements. This will be immediately followed by a concentrated development workshop (or hackfest) that will aim to deliver prototype examples of what is possible. The funding also includes further development effort to take one or two of these prototypes to proof-of-principle stage, ideally with the aim of deploying them into real working environments where they might be useful.

The workshop structure will be developed by the participants over the 6 weeks leading up to the date itself. I aim to set that date in the next week or so, but the likelihood is early to mid-March. The workshop will be in southern England, with the venue to be again worked out over the next week or so.

There is a lot to pull together here and I will be aiming to contact everyone who has expressed an interest over the next few weeks to start talking about the details. In the meantime I’d like to thank everyone who has contributed to the effort thus far. In particular I’d like to thank Melissa Hagemann and Janet Haven at OSI and Gunner from Aspiration who have been a great help in focusing and optimizing the proposal. Too many people contributed to the proposal itself to name them all (and you can check out the GoogleDoc history if you want to pull apart their precise contributions) but I do want to thank Heather Piwowar and David Shotton in particular for their contributions.

Finally, the success of the proposal, and in particular the community response around it, has made me much more confident that some of the dreams we have for using the web to support research are becoming a reality. The details I will leave for another post, but what I found fascinating is how far the network of people who could be contacted spread, essentially from a single blog post. I’ve contacted a few people directly, but most have become involved through the network of contacts that spread from the original post. The network, and the tools, are effective enough that a community can be built up rapidly around an idea from a much larger and more diffuse collection of people. The challenge of this workshop and the wider project is to see how we can turn that aggregated community into a self-sustaining conversation that produces useful outputs over the longer term.

It’s a complete coincidence that Michael Nielsen posted a piece in the past few hours that forms a great document for framing the discussion. I’ll be aiming to write something in response soon, but in the meantime follow the top link below.

Metrics and Money

David Crotty, over at Scholarly Kitchen, has an interesting piece on metrics, arguing that many of these have not been thought through because they don’t provide concrete motivation for researchers to care about them. Really he’s focused mainly on exchange mechanisms, means of persuading people that doing high-quality review is worth their while by giving them something in exchange, but the argument extends to all sorts of metrics. Why would you care about any given measure if performing well against it doesn’t translate into more resources, time, or glory?

You might expect me to disagree with a lot of this but for the most part I don’t. Any credible metric has to be real, it has to mean something. It has to matter. This is why connecting funders, technologists, data holders, and yes, even publishers, is at the core of the proposal I’m working with at the moment. We need funders to want to have access to data and to want to reward performance on those measures. If there’s money involved then researchers will follow.

Any time someone talks about a “system” using the language of currency there is a key question you have to ask: can this “value” be translated into real money? If it can’t, then it is unlikely people will take it seriously. Currency has to be credible; it has to be taken seriously or it doesn’t work. How much is the cash in your pocket actually worth? Cash has to embody transferable value, and many of these schemes don’t provide anything more than basic barter.

But equally the measures of value, or of cost have to be real. Confidence in the reality of community measures is crucial, and this is where I part company with David, because at the centre of his argument is what seems to me a massive hole.

“The Impact Factor, flawed though it may be, at least tries to measure something that directly affects career advancement–the quality and impact of one’s research results.  It’s relevant because it has direct meaning toward determining the two keys to building a scientific career, jobs and funding.”

The second half of this I agree with (but resent). But it depends absolutely on the first part being widely believed, and the first part simply isn’t true. The Thomson Reuters Journal Impact Factor does not try to measure the quality and impact of individual research results. TR are shouting this from the treetops at the moment. We know that it is at best an extremely poor measure of individual performance in practice. In economic terms, our dependence on the JIF is a bubble. And bubbles burst.

The reason people are working on metrics is that they figure replacing one rotten measure at the heart of the system with ones that are self-evidently technically superior should be easy. Of course it isn’t that easy. Changing culture, particularly reward culture, is very difficult. You have to tackle the self-reinforcement that these measures thrive on, and you need to work very carefully to allow the bubble to deflate in a controlled fashion.

There is one further point where I disagree with David. He asks a rhetorical question:

“Should a university deny tenure to a researcher who is a poor peer reviewer, even if he brings in millions of dollars in grants each year and does groundbreaking research?  Should the NIH offer to fund poorly designed research proposals simply because the applicant is well-liked and does a good job interpreting the work of others?”

It’s interesting that David even asks these questions, because the answers seem obvious, self-evident even. The answer, at least to the underlying question, is in both cases yes. The ultimate funders should fund people who excel at review even if they are poor at other parts of the enterprise. The work of review must be valued or it simply won’t be done. I have heard heads of university departments tell researchers to do less reviewing and write more grants, and I can tell you that in the current UK research funding climate review is well off the top of my priority list. If there is no support for reviewing then in the current economic climate we will see less of it done; and if there is no space in our community for people who excel at reviewing, then who will teach it? Or do we continue the pernicious myth that the successful scientist is a master of all of their trades? Aside from anything else, basic economics tells us that specialisation leads to efficiency gains, even when one of the specialists is the superior practitioner in both areas. Shouldn’t we be seeking those efficiency gains?

Because the real question is not whether reviewers should be funded, by someone in some form, but what the relative size of that investment should be in a balanced portfolio that covers all the contributions needed across the scientific enterprise. The question is how we balance these activities. How do we tension them? And the answer to that is that we need a market. To have a functioning market we need a functioning currency. That currency may just be money, but reputation can be converted, albeit not directly, into funds. Successfully hacking research reputation will make a big difference to more effective tensioning between the different important and valid roles in scientific research, and that’s why people are so interested in trying to do it.
