The more things change, the more they need to stay the same: The challenges and opportunities of monitoring research cultures and environments

An antique telescope in brass and wood with its cap lying beside it

With the world in its current state, there is wide interest both in how we preserve what is good in research systems and in how we make necessary improvements. Ensuring sustainability and quality of support in the organisations and communities that make up the research landscape is crucial. But there are also deep problems in our communities and organisations, including structural issues that lead to poor practice and fraud, deep and ongoing inequities and exclusions, and failures in the mentoring and support systems that are needed by the next generations of researchers.

It was against this background that we started a workstream to look at indicators of research culture and environment within the RoRI AGORRA project, culminating in a new Working Paper being published today. AGORRA (standing for A Global Observatory of Responsible Research Assessment) is a project focused on understanding and cataloguing national evaluation systems in different parts of the world. By working closely with the funder representatives on the AGORRA team, we were able to ground our research in a diverse set of national contexts, which shaped the ultimate focus of the paper.

At the beginning I was keen to ensure we took a wide perspective on what was meant by “research culture” and “research environment”, one that was inclusive of the differing contexts represented in AGORRA and RoRI more generally. These terms were being used in the UK as part of the REF (which has included aspects of “research environment” previously) with a pilot on “People, Culture and Environment” running in parallel with our workstream. I was also keen to get a perspective on where (and whether) the terms were being used in other settings.

Culture and environment are used to focus on different issues in different contexts

What was clear was that — within contexts of research evaluation or monitoring — the terms “research culture” and “research environment” were used to focus attention on different (but often overlapping) issues in different contexts. The demographic diversity of research staff was a common, but not universal, issue. In some contexts, research practice and integrity were the most significant issues linked with “culture”, but in others these were one of many. “Environment” could refer to the qualities of career structures, the availability of financial or digital resources, or the safety of buildings, alongside the presence of policy instruments. 

We don’t always mean the same things by “environment” and “culture”. They are used in different ways in different contexts and often without a clear definition. Environment tends to refer to structures and organisations, and culture to communities and people, but there were variations. This makes sense; one person’s culture (say that of a senior researcher in a specific organisation) might be experienced by another as part of the environment in which they have little say (e.g. a junior researcher, or a community stakeholder like an interested patient).

What we want to preserve and what we want to change are often in common but not universal

There were many “areas of concern” that were common across contexts, but none were a universal priority. Research practice, demographic diversity and inclusion, resourcing, career structures and opportunities, researcher mobility, wider engagement and open communication were all areas that appeared in multiple contexts. 

When it comes to examining these there are many existing indicators that might be applicable in a monitoring or evaluation context for specific areas of concern. Most of these come from two places. The first is existing reporting requirements for organisations, either imposed by government, or as part of programs such as ATHENA Swan focusing on gender diversity and inclusion. The second source is the monitoring of research outputs. Both sources provide a foundation for developing improved indicators.

The concentration of these existing and potential indicators in two areas, and the fact that they are relevant to specific areas, leads to a question: is there a way to put indicators in context and define gaps? We need a framework that will help us relate them to one another. For example, there are many existing indicators which might be useful for examining organisations, but very few of these could currently be applied to communities. The data simply isn’t reported that way.

Ostrom’s IAD framework as a tool to understand relationships and gaps

This aspect of communities and organisations, plus the presence of many output-based indicators, led us to Ostrom’s Institutional Analysis and Development (IAD) framework. IAD is used to understand how characteristics of a setting (made up of organisations, communities and the formal and informal rules they use) come together in specific interactions (called an “action arena”) to generate outcomes which, through a process of feedback, change the organisations, communities and rules that make up the setting.

The IAD framework reminds us that what matters here is how the feedback from monitoring and evaluation processes is received and leads to change, or leads to stabilisation. While we have many indicators of specific areas of concern, we have few if any that help us to understand whether organisations and communities are ready to process feedback productively. Do they have the capacity to engage with it, and if so, make good use of it?

The project both defined many indicators that might be useful in specific contexts and, as a result, showed that no single indicator (or even a “basket” of them) can capture the complexity of research cultures and environments. Whether the policy goal is change or stabilisation, the IAD framework shows we need to consider how any indicators we do implement are received, and be ready to adapt, change or abandon them if they lead to adverse responses.

What measurement does to us…

A thermometer showing −17°C. (Photo credit: Wikipedia)

Over the past week this tweet was doing the rounds. I’m not sure where it comes from or precisely what its original context was, but it appeared in my feed from folks in various student analytics and big data crowds. The message I took was “measurement looks complicated until you pin it down”.

But what I took from this was something a bit different. Once upon a time the idea of temperature was a complex thing. It was subjective; people could reasonably disagree on whether today was hotter or colder than yesterday. Note those differences between types of “cold” and “hot”: damp, dank, frosty, scalding, humid, prickly. This looks funny to us today because we can look at a digital readout and get a number. But what really happened is that our internal conception of what temperature is changed. A much richer and more nuanced concept has been collapsed onto a single linear scale. To re-create that richness weather forecasters invent things like “wind chill” and “feels like” to capture different nuances, but we have in fact lost something: the idea that different people respond differently to the same conditions.

https://twitter.com/SilverVVulpes/status/850439061273804801

Last year I did some work where I looked at the theoretical underpinnings for the meaning we give to referencing and citation indicators in the academy. What I found was something rather similar. Up until the 70s the idea of what made “quality” or “excellence” in research was much more contextual. The development of much better data sources, basically more reliable thermometers, for citation counting led to intense debates about whether this data had any connection to the qualities of research at all, let alone whether anything could be based on those numbers. The conception of research “quality” was much richer, including the idea that different people might have different responses.

In the 1970s and 80s something peculiar happened. This questioning of whether citations can represent the qualities of research disappeared, to be replaced by the assumption that they do. A rear-guard action continued to question this, but it was based on the idea that people are doing many different things when they reference, not the idea that counting such things is fundamentally a questionable activity in and of itself. Suddenly citations became a “gold standard”, the linear scale against which everything was measured, and our ideas about the qualities of research became consequently impoverished.

At the same time it is hard to argue that a simple linear scale of defined temperature has not created massive advances. We can track global weather against agreed standards, including how it is changing, and quantify the effects of climate change. We can calibrate instruments against each other and control conditions in ways that allow everything from the safekeeping of drugs and vaccines to ensuring that our food is cooked to precisely the right degree. Of course, on top of that we have to acknowledge that temperature actually isn’t as simple a concept as it’s made out to be. Definitions always break down somewhere.

https://twitter.com/StephenSerjeant/status/851016277992894464

It seems to me that it’s important to note that these changes in meaning can affect the way we think and talk about things. Quantitative indicators can help us to share findings and analysis, to argue more effectively and, most importantly, to share claims and evidence in a way which is much more reliably useful. At the same time, if we aren’t careful those indicators can change the very things that we think are important. They can change the underlying concept of what we are talking about.

Ludwik Fleck, in Genesis and Development of a Scientific Fact, explains this very effectively in terms of the history of the concept of “syphilis”. He explains how our modern conception (a disease with specific symptoms caused by an infection with a specific transmissible agent) would be totally incomprehensible to those who thought of diseases in terms of how they were to be treated (in this case being classified as a disease treated with mercury). The concept itself being discussed changes when the words change.

None of this is of course news to people in Science and Technology Studies, history of science, or indeed much of the humanities. But for scientists it often seems to undermine our conception of what we’re doing. It doesn’t need to. But you need to be aware of the problem.

This ramble brought to you in part by a conversation with @dalcashdvinksy and @StephenSerjeant