A league table by any means will smell just as rank

Ladder (Wikipedia)

The University Rankings season is upon us, with the QS league table released a few weeks back to much hand-wringing here in the UK as many science-focussed institutions tumbled downwards. The fact that this was due to a changed emphasis in counting humanities and social sciences, rather than any change at the universities themselves, was at least noted, although how much this was to excuse the drop rather than engage with the issue is unclear.

At around the same time particle physicists and other “big science” communities were up in arms as the Times Higher ranking, being released this week, announced that it would not count articles with huge numbers of authors. As with the change in the QS rankings, this would tend to disadvantage institutions heavily invested in big science projects, although here the effect would probably be more in the signals being sent to communities than in any substantial effect on scores or rankings. In the context of these shifts, the decision of the Japanese government to apparently shut a large proportion of Humanities and Social Sciences departments so as to focus on “areas for which society has strong needs” is…interesting.

Also interesting was the response of Phil Baty, the editor of the THES Rankings, to John Butterworth’s objections on Twitter.

The response is interesting because it suggests there is a “right way” to manage the “problem”. The issue, of course, is rather the other way around: there can be no right way to solve the problem independent of an assessment of what it is you are trying to assess. Is it the contribution of the university to the work? Is it some sense of the influence that accrues to the institution for being associated with the article? Is it the degree to which being involved will assist in gaining additional funding?

This, alongside the shifts up and down the QS rankings, illustrates the fundamental problem of rankings. They assume that what is being ranked is obvious, when it is anything but. No linear ranking can ever capture the multifaceted qualities of thousands of institutions, but worse than that, the very idea of a ranking is built on the assumption that we know what we’re measuring.

Now you might ask why this matters. Surely these are just indicators, mere inputs into decision making, even just a bit of amusing fun that allows Cambridge to tweak the folks at Imperial this year? But there is a problem. And that problem is that these rankings really show a vacuum at the centre of our planning and decision-making processes.

What is clear from the discussion above, and the hand-wringing over how the rankings shift, is that the question of what matters is not being addressed. Rather, it is swept under the carpet by assuming there is some conception of “quality” or “excellence” that is universally shared. I’ve often said that, for me, the word “quality” is a red flag that means someone wants to avoid talking about values.

What matters in the production of High Energy Physics papers? What do we care about? Is HEP something that all institutions should do, or something that should be focussed on a small number of places? And not just HEP, but genomics, history, sociology…or perhaps chemistry. To ask the question “how do we count physics the same as history?” is to make a basic category error. Just as it is to assume that one authorship is the same as another.

If the question were which articles in a year have the most influence, and which institutions contributed to them, the answer would be very different from the answer to the question of which institutions made the greatest aggregate contribution to global research outputs. Rankings ignore these questions and try to muddle through with forced compromises like the ones we’re seeing in the THES case.
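The dependence of a league table on an arbitrary choice of weights is easy to make concrete. The following toy sketch uses entirely invented institutions, scores, and weightings (none of them from the QS or THES methodologies) to show the same underlying data producing two different orderings:

```python
# Toy illustration: the same invented scores produce different league
# tables under different (arbitrary) weightings. All names and numbers
# are made up for the example.
scores = {
    "Uni A": {"science": 90, "humanities": 40},
    "Uni B": {"science": 60, "humanities": 85},
    "Uni C": {"science": 75, "humanities": 70},
}

def rank(weights):
    # Weighted sum of each institution's per-discipline scores, best first.
    def total(uni):
        return sum(weights[field] * value for field, value in scores[uni].items())
    return sorted(scores, key=total, reverse=True)

science_heavy = rank({"science": 0.8, "humanities": 0.2})
humanities_heavy = rank({"science": 0.3, "humanities": 0.7})

print(science_heavy)     # ['Uni A', 'Uni C', 'Uni B']
print(humanities_heavy)  # ['Uni B', 'Uni C', 'Uni A']
```

Neither ordering is more “correct” than the other; the choice of weights simply encodes a choice of values that the published table then hides.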

All that these rankings show is that the way you rank things depends on how you (arbitrarily) choose to value them. Far more interesting is the question of what the rankings tell us about what we really value, and how hard that is, in fact, to measure.

Costly signalling in scholarly communications

Male Blue Peacock in Melbourne Zoo, Australia. (Photo credit: Wikipedia)

For a long time it was difficult for evolutionary biology to make sense of a (male) peacock’s tail. Clearly it is involved in courtship, but the investment in growing it, and the burden of carrying it around, would seem to be a net disadvantage overall. The cost of the tail might be worth it for a single male if female preferences for large tails are fixed.

Fisher found a solution to this problem by noting that the genes for large tails in male peacocks would tend to be carried along with the genes, expressed in females, for a preference for males with large tails. In combination these two traits can cause a runaway selection process, which could explain the extravagant displays of many animals.
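Fisher’s coupling of trait and preference can be caricatured in a few lines of code. This is a deliberately crude sketch of my own, not a model from Fisher or from this post: each individual carries a tail gene and a preference gene, offspring inherit both from their parents, and choosier “mothers” mate with larger-tailed “fathers”, dragging the mean tail size upward generation after generation:

```python
import random

random.seed(42)

def runaway(n=300, generations=40, suitors=5):
    # Each individual is a (tail, preference) pair of gene values,
    # both starting around 1.0 with a little variation.
    pop = [(random.gauss(1.0, 0.2), random.gauss(1.0, 0.2)) for _ in range(n)]
    initial_tail = sum(t for t, _ in pop) / n
    for _ in range(generations):
        children = []
        for _ in range(n):
            mt, mp = random.choice(pop)               # a "mother"
            candidates = random.sample(pop, suitors)  # available "fathers"
            # The stronger her preference gene, the more often she
            # picks the largest-tailed suitor rather than a random one.
            if random.random() < max(0.0, min(1.0, mp)):
                ft, fp = max(candidates, key=lambda ind: ind[0])
            else:
                ft, fp = random.choice(candidates)
            # Offspring inherit the average of both genes, plus noise.
            children.append((
                0.5 * (mt + ft) + random.gauss(0, 0.05),
                0.5 * (mp + fp) + random.gauss(0, 0.05),
            ))
        pop = children
    final_tail = sum(t for t, _ in pop) / n
    return initial_tail, final_tail

init_tail, final_tail = runaway()
print(f"mean tail gene: {init_tail:.2f} -> {final_tail:.2f}")
```

The preference gene hitchhikes: choosy mothers’ offspring carry both the large-tail and the strong-preference genes, which is exactly the correlation Fisher pointed to.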

Zahavi proposed another solution in which the display is a form of “costly signalling”. The ability to invest in the production of a large tail demonstrates the health or fitness of the animal. For this to work the signal must be costly and it must be difficult to fake. Coloured plumage in the presence of stalking predators implies speed and agility; large horns (or simply size) imply a sufficient ability to obtain food.

Hartley and Potts in their book Cultural Science (chapter 3) apply the idea of costly signalling to the question of cultural evolution. They suggest that cultures will adopt forms of costly signalling to create within-group trust and cohesion. In turn, cultural norms of truth-telling and even traditions of narrative (the assumption of sympathy for the ‘white hat’, the presentation of compromises as ‘necessary’, the idea that even bad acts reveal the underlying goodness of the hero) build community and, in extremis, send members of that community out to die for it in battle. This is not a facile claim about “group evolution” or how genetic evolution might drive culture, but part of a program to understand how culture itself evolves.

One of the challenges of understanding peer review in the scientific community is why we do it at all. It is a part of our culture, but it is very hard to demonstrate how and where it contributes value. The humanistic response to this empirical challenge is that peer review is a cultural norm that defines the scholarly community. Even if peer review achieved nothing it would have value as a means of defining a community, the community that has a cultural dedication to peer review. The “we”, the culture that values and engages with peer review, is defined in terms of its difference from the “they” who do not. This form of identification reinforces the analogy with both Fisher (we select those who share our culture) and Zahavi (the costly signalling of engaging in peer review is part of the creation of our scholarly culture).

So perhaps another way to look at engaging with peer review is as costly signalling. The purpose of submitting work to peer review is to signal that the underlying content is “honest” in some sense. In the mating dance between researchers and funders, or researchers and institutions, the peer review process is intended to make the signal of publication costly and to make it harder to fake. Taking Fisher’s view of mutual selection, authors on one side, funders and institutions on the other, we can see, at least as analogy, a reason for the runaway selection for publishing in prestigious journals. A runaway process in which the signal bears a tenuous relationship with the underlying qualities being sought, in the same way as the size of the peacock’s tail has a tenuous link with its health and fitness.

But as Martin Eve has argued (Open Access in the Humanities, Chapter 2), we need such signals. The labour of detailed assessment of all research for the full range of desirable qualities is unaffordable. Summaries and signals are needed. The question, perhaps, is whether this costly signalling is as honest as it could be. Is it creating a sustainable culture and community with a solid base? The apparent rise in fraud and retractions, particularly amongst high prestige publications, suggests that this is a question that should be seriously addressed. To stretch the biological analogy, has a gene for faked tails emerged? Such fake display is not uncommon in biology.

Addressing that question means asking questions about what the underlying qualities we desire are. That’s an important question which I’ve raised elsewhere but I don’t want to go down that route here. I want to explore a different possibility. One that arises from asking whether a different form of signalling might be possible.

Communicating research in a reproducible (or replicable, or generalizable; the semantics are also an issue for another time) fashion is hard work. Many of us have argued that to enable greater reproducibility we need to provide better tools to reduce that cost. But what if the opposite were true? What if the value actually lies precisely in the fact that communicating reproducibly is costly, but is also a potentially more honest representation of what a community values than publication in a high profile journal?

If you buy that argument then we have a problem. The runaway of sexual selection is hard to break out of, at least in the case of biological evolution. At some point survivability prevents tails or horns growing so big that they overbalance the animal, but by that stage a huge and unnecessary investment has been made. However, in the case made by Potts and Hartley, the thing that is evolving is more malleable. Perhaps, by creating a story of how the needs of funders and institutions are better served by focussing on a different form of signalling, it will be possible to shift.

Of course this does happen in nature as well. When a sub-population develops a different form of display and co-selection kicks off, populations diverge, sometimes to occupy different niches, sometimes to compete with, and ultimately displace, the original population. It’s one way that new species form.


Evidence to the European Commission Hearing on Access to Scientific Information

European Commission (Image by tiseb via Flickr)

On Monday 30 May I gave evidence at a European Commission hearing on Access to Scientific Information. This is the text that I spoke from. Just to reinforce my usual disclaimer: I was not speaking on behalf of my employer but as an independent researcher.

We live in a world where there is more information available at the tips of our fingers than even existed 10 or 20 years ago. Much of what we use to evaluate research today was built in a world where the underlying data was difficult and expensive to collect. Companies were built, massive data sets collected and curated, and our whole edifice of reputation building and assessment grew up based on what was available. As the systems became more sophisticated, new measures were incorporated, but the fundamental basis of our systems was never questioned. Somewhere along the line we forgot that we had never actually been measuring what mattered, just what we could.

Today we can track, measure, and aggregate much more, and much more detailed, information. It’s not just that we can ask how much a dataset is being downloaded, but that we can ask who is downloading it, academics or school children, and, more than that, who wrote the blog post or posted it to Facebook that led to a spike in downloads.

This is technically feasible today. And make no mistake it will happen. And this provides enormous potential benefits. But in my view it should also give us pause. It gives us a real opportunity to ask why it is that we are measuring these things. The richness of the answers available to us means we should spend some time working out what the right questions are.

There are many reasons for evaluating research and researchers. I want to touch on just three. The first is researchers evaluating themselves against their peers. While this is informed by data, it will always be highly subjective and will vary discipline by discipline. It is worthy of study but not, I think, something that is subject to policy interventions.

The second area is in attempting to make objective decisions about the distribution of research resources. This is clearly a contentious issue. Formulaic approaches can be made more transparent and less open to legal attack, but are relatively easy to game. A deeper challenge is that by their nature all metrics are backwards looking. They can only report on things that have happened. Indicators are generally lagging (true of most of the measures in wide current use) but what we need are leading indicators. It is likely that human opinion will continue to beat naive metrics in this area for some time.

Finally there is the question of using evidence to design the optimal architecture for the whole research enterprise. Evidence based policy making in research policy has historically been sadly lacking. We have an opportunity to change that through building a strong, transparent, and useful evidence base, but only if we simultaneously work to understand the social context of that evidence. How does collecting information change researcher behavior? How are these measures gamed? What outcomes are important? How does all of this differ across national and disciplinary boundaries, or amongst age groups?

It is my belief, shared with many that will speak today, that open approaches will lead to faster, more efficient, and more cost effective research. Other groups and organizations have concerns around business models, quality assurance, and sustainability of these newer approaches. We don’t need to argue about this in a vacuum. We can collect evidence, debate what the most important measures are, and come to an informed and nuanced conclusion based on real data and real understanding.

To do this we need to take action in a number of areas:

1. We need data on evaluation and we need to be able to share it.

Research organizations must be encouraged to maintain records of the downstream usage of their published artifacts. Where there is a mandate for data availability this should include mandated public access to data on usage.

The commission and national funders should clearly articulate that the provision of usage data is a key service for publishers of articles, data, and software to provide, and that where a direct payment is made for publication, provision of such data should be included. Such data must be technically and legally reusable.

The commission and national funders should support work towards standardizing vocabularies and formats for this data, as well as critiquing its quality and usefulness. This work will necessarily be diverse, with disciplinary, national, and object-type differences, but there is value in coordinating actions. At a recent workshop where funders, service providers, developers, and researchers convened, we made significant progress towards agreeing routes towards standardization of the vocabularies to describe research outputs.

2. We need to integrate our systems of recognition and attribution into the way the web works through identifying research objects and linking them together in standard ways.

The effectiveness of the web lies in its framework of addressable items connected by links. Researchers have a strong culture of making links and recognizing contributions through attribution and citation of scholarly articles and books, but this has only recently been surfaced in a way that consumer web tools can view and use. And practice is patchy and inconsistent for new forms of scholarly output such as data, software, and online writing.

The commission should support efforts to open up scholarly bibliography to the mechanics of the web through policy and technical actions. The recent Hargreaves report explicitly notes limitations on text mining and information retrieval as an area where the EU should act to modernize copyright law.

The commission should act to support efforts to develop and gain wide community support for unique identifiers for research outputs, and for researchers. Again these efforts are diverse and it will be community adoption which determines their usefulness but coordination and communication actions will be useful here. Where there is critical mass, such as may be the case for ORCID and DataCite, this crucial cultural infrastructure should merit direct support.

Similarly the commission should support actions to develop standardized expressions of links, through developing citation and linking standards for scholarly material. Again the work of DataCite, CoData, Dryad and other initiatives as well as technical standards development is crucial here.

3. Finally we must closely study the context in which our data collection and indicator assessment develops. Social systems cannot be measured without perturbing them and we can do no good with data or evidence if we do not understand and respect both the systems being measured and the effects of implementing any policy decision.

We need to understand the measures we might develop, what forms of evaluation they are useful for, and how change can be effected where appropriate. This will require significant work as well as an appreciation of the close coupling of the whole system.

We have a generational opportunity to make our research infrastructure better through effective evaluation and evidence-based policy making and architecture development. But we will squander this opportunity if we either take a utopian view of what might be technically feasible, or fail to act for fear of a dystopian future. The way to approach this is through a careful, timely, transparent, and thoughtful approach to understanding ourselves and the system we work within.

The commission should act to ensure that current nascent efforts work efficiently towards delivering the technical, cultural, and legal infrastructure that will support an informed debate through a combination of communication, coordination, and policy actions.


Michael Nielsen, the credit economy, and open science


Michael Nielsen is a good friend as well as being an inspiration to many of us in the Open Science community. I’ve been privileged to watch, and in a small way to contribute to, the development of his arguments over the years, and I found the distillation of these years of effort into the talk that he recently gave at TEDxWaterloo entirely successful. Here is a widely accessible and entertaining talk that really pins down the arguments, the history, the successes, and the failures of recent efforts to open up science practice.

Professional scientific credit is the central issue

I’ve been involved in many discussions around why the potential of opening up research practice hasn’t led to wider adoption of these approaches. The answer is simple, and as Michael says very clearly in the opening section of the talk, the problem is that innovative approaches to doing science are not going to be adopted while those that use them don’t get conventional scientific credit. I therefore have to admit to being somewhat nonplussed by GrrlScientist’s assessment of the talk that “Dr Nielsen has missed — he certainly has not emphasised — the most obvious reason why the Open Science movement will not work: credit.”

For me, the entire talk is about credit. He frames the discussion of why the Qwiki wasn’t a huge success, compared to the Polymath project, in terms of the production of conventional papers, and he discusses the transition from Galileo’s anagrams to the development of the scientific journal in terms of ensuring priority and credit. Finally he explicitly asks the non-scientist members of the audience to do something that even more closely speaks to the issue of credit: to ask their scientist friends and family what they are doing to make their results more widely available. Remember this talk is aimed at a wider audience, the TEDxWaterloo attendees and the larger audience for the video online (nearly 6,000 when I wrote this post). What happens when taxpayers start asking their friends, their family, and their legislative representatives how scientific results are being made available? You’d better believe that this has an effect on the credit economy.

Do we just need the celebrities to back us?

Grrl suggests that the answer to pushing the agenda forward is to enlist Nobelists to drive projects in the same way that Tim Gowers pushed the Polymath project. While I can see the logic, and there is certainly value in moral support from successful scientists, we already have a lot of this. Sulston, Varmus, Michael and Jon Eisen, and indeed Michael himself, just to name a few, are already pushing this agenda. But moral support and single projects are not enough. What we need to do is hack the underlying credit economy: provide proper citations for data and software, exploit the obsession with impact factors.

The key to success, in my view, is a pincer movement. First, showing that more (if not always completely) open approaches can outcompete closed approaches on traditional assessment measures, something demonstrated successfully by Galaxy Zoo, the Alzheimer’s Disease Neuroimaging Initiative, and the Polymath projects. Secondly, changing assessment policy and culture itself, both explicitly, by changing the measures by which researchers are ranked, and implicitly, by raising the public expectation that research should be open.

The pendulum is swinging and we’re pushing it just about every which-way we can

I guess what really gets my back up is that Grrl sets off with the statement that “Open Science will never work” but then goes on to put her finger on exactly the point where we can push to make it work. Professional and public credit is absolutely at the centre of the challenge. Michael’s talk is part of a concerted, even quite carefully coordinated, campaign to tackle this issue at a wide range of levels. Michael’s tour of his talk, funded by the Open Society Institute, seeks to raise awareness. My recent focus on research assessment (and a project also funded by OSI) is tackling the same problem from another angle. It is not entirely a coincidence that I’m writing this in a hotel room in Washington DC, and it is not at all accidental that I’m very interested in progress towards widely accepted researcher identifiers. The development of Open Research Computation is a deliberate attempt to build a journal that exploits the nature of journal rankings to make software development more highly valued.

All of these are part of a push to hack, reconfigure, and re-assess the outputs and outcomes that researchers get credit for, and the outputs and outcomes that are valued by tenure committees and grant panels. And from where I stand we’re making enough progress that Grrl’s argument seems a bit tired and outdated. I’m seeing enough examples of people getting credit and reward for being open, and simply doing and enabling better science as a result, that I’m confident the pendulum is shifting. Would I advise a young scientist that being open will lead to certain glory? No, it’s far from certain, but you need to distinguish yourself from the crowd one way or another and this is one way to do it. It’s still high risk, but show me something in a research career that is low risk and I’ll show you something that isn’t worth doing.

What can you do?

If you believe that a move towards more open research practice is a good thing then what can you do to make this happen? Well follow what Michael says, give credit to those who share, explicitly acknowledge the support and ideas you get from others. Ask researchers how they go about ensuring that their research is widely available and above all used. The thing is, in the end changing the credit economy itself isn’t enough, we actually have to change the culture that underlies that economy. This is hard but it is done by embedding the issues and assumptions in the everyday discourse about research. “How useable are your research outputs really?” is the question that gets to the heart of the problem. “How easily can people access, re-use, and improve on your research? And how open are you to getting the benefit of other people’s contribution?” are the questions that I hope will become embedded in the assumptions around how we do research. You can make that happen by asking them.
