Costly signalling in scholarly communications

Male Blue Peacock in Melbourne Zoo, Australia (photo credit: Wikipedia)

For a long time it was difficult for evolutionary biology to make sense of a (male) peacock’s tail. Clearly it is involved in courtship, but the investment in growing it, and the burden of carrying it around, would seem to be a net disadvantage. The burden of the tail might be worth it for an individual male if female preferences are fixed, but that leaves the question of where those preferences come from.

Fisher found a solution to this problem by noting that the genes for large tails in male peacocks would tend to be carried along with the genes, expressed in females, for a preference for males with large tails. In combination these two traits can drive a runaway selection process, which could explain the extravagant displays of many animals.
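Fisher’s mechanism is easier to see in a toy model than in prose. The sketch below is my own illustration, not anything from Fisher or from the literature, and every parameter in it is arbitrary: a heritable display trait and a heritable preference trait become correlated simply because choosy mothers pass both genes to their offspring, and the mean display ratchets upward despite a survival cost.

```python
# Toy Fisherian runaway: display ("tail") and preference genes co-evolve.
# Illustrative only -- the parameters and mating rule are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
N = 2000
tail = rng.normal(0.0, 1.0, N)   # heritable display trait
pref = rng.normal(0.5, 1.0, N)   # heritable preference trait

for generation in range(40):
    # Viability selection: very large tails reduce survival.
    alive = rng.random(len(tail)) < np.exp(-0.01 * np.clip(tail, 0, None) ** 2)
    tail, pref = tail[alive], pref[alive]

    # Sexual selection: each mother scores a few candidate fathers by
    # (her preference) x (his tail) and mates with the top scorer.
    moms = rng.integers(0, len(tail), N)
    candidates = rng.integers(0, len(tail), size=(N, 5))
    scores = pref[moms][:, None] * tail[candidates]
    dads = candidates[np.arange(N), scores.argmax(axis=1)]

    # Offspring inherit midparent values plus noise; this joint
    # inheritance is what links the two genes across generations.
    tail = 0.5 * (tail[moms] + tail[dads]) + rng.normal(0, 0.3, N)
    pref = 0.5 * (pref[moms] + pref[dads]) + rng.normal(0, 0.3, N)

print(f"mean tail after 40 generations: {tail.mean():.2f}")
print(f"tail-preference correlation: {np.corrcoef(tail, pref)[0, 1]:.2f}")
```

In runs of this sketch the mean tail tends to grow well past anything viability alone would justify, which is exactly the runaway character of the process.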

Zahavi proposed another solution in which the display is a form of “costly signalling”. The ability to invest in the production of a large tail demonstrates the health or fitness of the animal. For this to work the signal must be costly and it must be difficult to fake. Coloured plumage in the presence of stalking predators implies speed and agility; large horns (or simply size) imply a sufficient ability to obtain food.

Hartley and Potts in their book Cultural Science (chapter 3) apply the idea of costly signalling to the question of cultural evolution. They suggest that cultures adopt forms of costly signalling to create within-group trust and cohesion. In turn, cultural norms of truth-telling and even traditions of narrative (the assumption of sympathy for the ‘white hat’, the presentation of compromises as ‘necessary’, the idea that even bad acts reveal the underlying goodness of the hero) build community and in extremis send members of that community out to die for it in battle. This is not a facile claim about “group evolution” or about how genetic evolution might drive culture, but part of a program to understand how culture itself evolves.

One of the challenges of understanding peer review in the scientific community is why we do it at all. It is a part of our culture, but it is very hard to demonstrate how and where it contributes value. The humanistic response to this empirical challenge is that peer review is a cultural norm that defines the scholarly community. Even if peer review achieved nothing it would have value as a means of defining a community: the community that has a cultural dedication to peer review. The “we”, the culture that values and engages with peer review, is defined in terms of its difference from the “they” who do not. This form of identification reinforces the analogy with both Fisher (we select those who share our culture) and Zahavi (the costly signalling of engaging in peer review is part of the creation of our scholarly culture).

So perhaps another way to look at engaging with peer review is as costly signalling. The purpose of submitting work to peer review is to signal that the underlying content is “honest” in some sense. In the mating dance between researchers and funders, or researchers and institutions, the peer review process is intended to make the signalling of publication honest and harder to fake. Taking Fisher’s view of mutual selection, authors on one side, funders and institutions on the other, we can see, at least as analogy, a reason for the runaway selection for publishing in prestigious journals: a runaway process in which the signal bears a tenuous relationship with the underlying qualities being sought, in the same way that the size of the peacock’s tail has a tenuous link with its health and fitness.

But as Martin Eve has argued (Open Access in the Humanities, Chapter 2), we need such signals. The labour of detailed assessment of all research for the full range of desirable qualities is unaffordable. Summaries and signals are needed. The question, perhaps, is whether this costly signalling is as honest as it could be. Is it creating a sustainable culture and community with a solid base? The apparent rise in fraud-driven retractions, particularly amongst high prestige publications, suggests that this question should be seriously addressed. To stretch the biological analogy: has a gene for faked tails emerged? Such fake displays are not uncommon in biology.

Addressing that question means asking what the underlying qualities we desire actually are. That’s an important question which I’ve raised elsewhere, but I don’t want to go down that route here. I want to explore a different possibility, one that arises from asking whether a different form of signalling might be possible.

Communicating research in a reproducible (or replicable, or generalizable; the semantics are an issue for another time) fashion is hard work. Many of us have argued that to enable greater reproducibility we need to provide better tools to reduce that cost. But what if the opposite were true? What if the value lies precisely in the fact that communicating reproducibility is costly, while also being a potentially more honest representation of what a community values than publication in a high profile journal?

If you buy that argument then we have a problem. The runaway of sexual selection is hard to break out of, at least in biological evolution. At some point survivability prevents tails or horns growing so big that they overbalance the animal, but by that stage a huge and unnecessary investment has been made. However, in the case made by Potts and Hartley the thing that is evolving is more malleable. Perhaps, by creating a story of how the needs of funders and institutions are better served by a different form of signalling, it may be possible to shift.

Of course this does happen in nature as well. When a sub-population develops a different form of display and co-selection kicks off, populations diverge, sometimes to occupy different niches, sometimes to compete with, and ultimately displace, the original population. It’s one way that new species form.


Freedoms and responsibilities: Goffman, Hunt, Bohannan and Stapel

There has been much talk over the past few weeks about “academic freedom” as well as the responsibilities of scholars. Both are troublesome concepts, not least because one person’s “freedom” is another’s irresponsible conduct. But particularly in the context of “academic freedom”, the question of freedom to do or say what, and of what responsibilities come with it, is complex. And of course the freedom to speak is not the same as a right to be taken seriously. Any such right or authority is tied to certain, usually unspecified, responsibilities.

The question of academic freedom has been most visibly raised in the context of Tim Hunt’s reported comments at a Korean conference. As it happens I have my own story involving Hunt and misinterpretation, one which might provide a useful way into the issue.

At the closing panel of the Berlin11 meeting I spoke about the progress towards open access and the future of scholarly communications. The meeting was held in an old building that had been wonderfully refurbished as a conference space. In my remarks I drew an analogy with the building, the idea of taking the best of the past and repurposing it to support the future, noting that the building literally showed the scars of history, in this case the damage inflicted by allied bombing in World War II.

It was later related to me that Hunt had said that this was a very eloquent defence of journals like Nature and Science. Of course anyone who knows me will know that was absolutely not my intended meaning. What I meant, and what I said, were not congruent with what was heard. My intent was to provoke thought on what was worth keeping, not to defend the status quo. But who is responsible for the misunderstanding? What is my responsibility for greater clarity?

It may seem like a trivial misunderstanding, but it might not have been. We were in Berlin. The building might have a very dark history; certainly it is a near statistical certainty that some members of the audience had lost family members to allied bombing. My comments could have been misinterpreted as saying that the building was more important than their relatives’ suffering. That issue did not occur to me at the time, and looking back today I am ashamed by that. It may not have changed what I said, but it would certainly have changed the way I said it. As someone with sufficient authority to be asked to offer my views in the final session of an important meeting, I had a responsibility to take care that my comments were not only authoritative but also responsible.

Nobel Laureates travel a lot. They are in demand as speakers because they have authority: obviously in their area of research, but also as senior members of the research community who bring the perspective of leaders involved in its governance and strategic development. When that authority is assumed without sufficient care, or in areas where the person in question is not well informed, the result tends to rebound badly – Jim Watson’s comments on race and Pauling’s on vitamin C come to mind.

To those who are given great authority, whether in the form of Nobel prizes or large twitter followings, is also given great responsibility; sometimes discharged well, sometimes not. Academic authority and academic freedom are not easy bedfellows. The right to speak one’s mind is freedom of speech. The ability to deploy one’s authority is not a right. Authority is the ability to be listened to, not the ability to speak freely, and that ability comes with responsibility. Academic freedom is not the right to speak one’s mind. It is rather the responsibility to speak on issues with the authority that arises from scholarly rigour. It is the tradition that employment should not be at risk when a scholar speaks in their area of expertise.

The most damning indictment, therefore, of the cries of “Academic Freedom” in the defence of Hunt is that his comments were bad science. They were spectacularly uninformed by the large body of literature that shows virtually the opposite of what he said (see Curt Rice’s blog for an up to date summary). Further, the defence that “it was just a joke” can only be made by failing to engage with the literature showing not only that jokes surface real bias and real issues, but that asking a disadvantaged group to accept something as a joke normalises that disadvantage. Hilda Bastian has covered this in her excellent post.

The question of responsibility has also been raised in the furore over John Bohannon’s recent exposés, first of fly-by-night scholarly publishers seeking to fleece researchers, and more recently of the reporting, review and publicity around poorly run “medical” studies. In both cases questions of methodology were raised. In the Open Access sting many commentators, myself included, excoriated Bohannon for not running a proper control, in essence for not running a proper scientific study. In the more recent chocolate study, issues of ethical oversight and risk to participants were raised. If Bohannon were a scientist, speaking with the authority of a scholar, these would be reasonable criticisms. His own claim to the title “gonzo scientist” raises some interesting questions in this regard, but fundamentally he is a journalist and writer, governed by different rules of authority and responsibility.

In the case of the OA sting those questions of authority were muddied by the publication of the piece in Science. Online, the distinction between this journalistic piece and a research article is not immediately clear. To be fair, in the piece itself John does make the point that conclusions on the prevalence of poor peer review practices in subscription vs open access journals cannot be drawn from this work. Indeed his aim is a different kind of “proof”, in this case an existence proof of the problem – there are “journals” that do little to no peer review, and many of them are open access.

The problems I have with the piece, and they are many, arguably conflate my expectations of a piece of scholarly research and the responsibilities of a scholar – the need to tell us something new or useful – with the very different aims of a journalist: to expose an issue to a wider audience. Indeed the very establishment power structures now moving into place to defend Hunt are the ones that I, arguably hypocritically, deployed to combat Bohannon. “The problem has been known for some time.” “Quiet work was being done on it.” “Leave our community to sort out its own problems.” But did we need an outsider to make the problem public enough and urgent enough to drive real action? Did Bohannon have a responsibility to the Open Access community to tell us more about the problem, to do a proper study, or as a journalist was his responsibility to publicly expose the issue?

Alice Goffman is another researcher facing a different set of tensions over responsibility, freedom and authority. Her book On the Run gives a challenging account of inner city life amongst deprived black American youth. Published in 2014, it can be seen as a warning of the subsequent events in Ferguson, Baltimore and other places.

Goffman is an ethnographer and her book started life as a scholarly monograph, but has gone on to success as a mainstream non-fiction book. Ethnography involves working closely with, often living with, research subjects, and the protection of subjects’ privacy is held as a very high principle. As described in this Slate article (which is my main source), this generally means obscuring locations, names, even the chronology of events, to create a narrative which surfaces a deeper underlying truth about what is going on. Goffman took this responsibility particularly seriously given that she observed events that could land people in jail, going so far as to destroy her notebooks to protect her research subjects.

But as this uncomfortable narrative became more public and transformed into a mainstream non-fiction book, the responsibilities of the author (no longer a scholar?) seemed to change. General non-fiction is supposed to be “true”, and Goffman’s rearrangement of facts, people and timelines breaks this expectation. What is interesting is that this was in turn used to raise charges of scholarly misconduct. The responsibility of the author to the reader is in direct conflict with the responsibility of the scholar to their subjects, yet the critic chooses to attack the scholarship. Indeed, given that the criticism and claims of misconduct are based on a forensic analysis of the text, in some sense Goffman is under attack because she didn’t do a good enough job of hiding the process of discharging her scholarly responsibilities, leaving inconsistencies in the timelines and events.

Which responsibility trumps which? What does “integrity” mean in this context, or rather these disparate and competing contexts, and how does a public scholar working on important and challenging problems navigate them? Where is Goffman’s academic freedom and where do her academic responsibilities lie? In restricting her space for communication to the academic world? In speaking (her) truth to power? Or is that space left for those licensed to speak through mainstream books? Is it left for Bohannon because only the outsider can make that transition?

The question of research integrity in Goffman’s case is challenging. Her destruction of notebooks certainly disturbs me as someone concerned primarily with the integrity of the research record. But I can respect the logic and, to the extent that it is seen as reasonable within her disciplinary context, accept it as appropriate scholarly practice.

The question of fraud in natural and social science research may seem much clearer. Diederik Stapel (I could have easily chosen Jan Hendrik Schön or many others) simply made up datasets. Here it seems there are clear lines of responsibility. The scholar is expected to add to the record, not muddy it. As we move towards digital records and data sharing these expectations are rising. Reproducible research is a target that seems plausible at least in some disciplines, although ironically we are perhaps merely returning to the level of record keeping recommended by Robert Boyle in 1660.

Does academic freedom mean the right to publish results based on made up data? Of course not. The scholar has a responsibility to report accurately when speaking in a scholarly context. It is not a crime to make up data, even in a research context. Ideas might be expressed through imagined or constructed datasets; they may even be an integral part of the research process as test sets, statistical tools or training sets. It is a “crime” to misrepresent or misuse them. Even carelessness is treated as a significant misdemeanour, leading as it does to retraction and consequent embarrassment. Where does “carelessness” of the type that leads to retraction become “foolishness” that only requires mild rebuke?

But the idea of a “complete record” and of “reproducibility” is a slippery one. In Goffman’s case reproducibility is impossible even in principle. Ethnographers would, I imagine, regard it as deeply problematic. The responsibility here is not to report true facts, but the deeper truth – as the scholar sees it – that underlies the events they observe. Stapel may well have thought he was also telling “a truth”, just one for which the data wasn’t quite clean enough. A serious issue behind Bohannon’s chocolate exposé is that p-value hacking, searching a weak dataset for “a truth” to tell, is endemic in many disciplines, and that peer review as currently constructed is impotent to tackle it. Peer review assumes that authors have taken on the responsibility to tell the truth (something Bohannon explicitly didn’t do, for instance, in his correspondence with PLOS One staff during the technical checks done before formal peer review).
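To make concrete what “searching a weak dataset for a truth to tell” means, here is a minimal sketch (mine, not Bohannon’s analysis; the group size and number of outcomes are illustrative) of why measuring many outcomes on a small sample almost guarantees a publishable “finding”:

```python
# p-value hacking in miniature: measure enough outcomes on pure noise
# and something will cross p < 0.05. Numbers are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group = 15   # a small trial
n_outcomes = 18    # weight, sleep quality, cholesterol, wellbeing, ...

treatment = rng.normal(size=(n_outcomes, n_per_group))
control = rng.normal(size=(n_outcomes, n_per_group))

# One independent t-test per outcome measure.
p_values = [stats.ttest_ind(treatment[i], control[i]).pvalue
            for i in range(n_outcomes)]

hits = [p for p in p_values if p < 0.05]
print(f"{len(hits)} of {n_outcomes} outcomes 'significant' at p < 0.05")
# With 18 tests the chance of at least one false positive is
# 1 - 0.95**18, roughly 60% -- on data that contains no effect at all.
```

The point is not the arithmetic but that nothing in a conventional manuscript, or in conventional peer review, reveals how many outcomes were measured and quietly discarded along the way.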

Many of the technical discussions of reproducibility and data sharing founder on the questions: reproducible for whom? At what level? In what way? Bohannon shared his data, but you could not now reproduce his “experiment” precisely; his actions make that impossible. Goffman’s data does not exist, but events in Ferguson, Baltimore and elsewhere arguably confirm her claims and narrative. Does Amgen’s failure to reproduce the vast majority of findings published on cancer biology in “top journals” mean we have a crisis?

Perhaps it is better to ask: what is the responsibility of authors publishing in cancer biology to their readers? To tell the truth as they see it? Obviously. To use all the tools at our disposal to prevent us fooling ourselves, to prevent us seeing what we want to see? Certainly. To provide the data? Increasingly so. To ensure the materials (cell lines, antibodies, reagents) are available to those who want to attempt direct replication? Oh, that might be too much to expect. Today at least, but tomorrow? This is a discussion about responsibilities, not technical details. Responsibilities to whom, and for what, and how does that vary across disciplines? Perhaps focussing on “reproducibility” is the wrong approach.

As freedom of speech is merely the right to a voice, not to a listener, academic freedom has its limits. The boundaries between scholarly speech, within a scholarly community, and wider public speech are blurring, as Goffman and Hunt have found, and as Bohannon has shown us. Context matters, whether it is the context of my previous writing on the merits and de-merits of Nature, the history of a building, or the choice to make a joke of the wrong type in the wrong place. And the authority that comes from experience and responsibility in one space does not always travel well into a different context.

Does this mix of contexts and expectations mean we should simply give up? Just be quiet and retreat? That would be the easy answer, but the wrong one. Academic Freedom, or Academic Responsibility, comes with the responsibility to speak. But it is a responsibility to be exercised with care, and with empathy for the different contexts that different audiences may find themselves in. Showing our working and our thinking, and making explicit the disciplinary traditions and expectations, the responsibilities that we have assumed, will help.

Above all, speaking from a position of authority (and I have chosen the word authority, rather than power, deliberately) means assuming a higher level of responsibility. This is perhaps best summed up in the direct advice: “never punch down”. When speaking from a position of scholarly authority, the limits of that authority, the limits of one’s experience, and the care expected in mastery of the evidence are all higher. And this is reasonable, and more and more important if scholarship is to be part of the wider world and not something that is done to it. If, after all, scholarship is about informed criticism and discussion, we all have a responsibility not just to speak, with care, but also to listen.

This piece has been very strongly shaped by a range of recent discussions, most strongly with Michael Nielsen (on John Bohannon’s work) and Michelle Brook (on diversity, power relations, integrity and the tensions between them), but also the ongoing discussion on twitter and more generally about Tim Hunt’s comments and Bohannon’s recent “sting”.

Good practice in research coding: What are the targets and how do we get there…?

EN{code}D Exhibition, The Building Centre, Sto... (image by olliepalmer.com via Flickr)

The software code that is written to support and manage research sits at a critical intersection in our developing practice of shared, reproducible, and re-usable research in the 21st century. Code is amongst the easiest things to usefully share, being not only made up of easily transferable bits and bytes but also, critically, carrying its context with it in a way that digital data doesn’t. Code at its best is highly reproducible: it comes with the tools to determine what is required to run it (makefiles, documentation of dependencies) and when run should (ideally) generate the same results from the same data. Where there is a risk that it might not, good code provides tests of one sort or another that you can run to make sure things are ok before proceeding. Testing, along with good documentation, is what ensures that code is re-usable, that others can take it and efficiently build on it to create new tools and new research.
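As a concrete sketch of what that means in practice (the file, function and project here are hypothetical, and pytest is my choice of framework rather than anything mandated), even a single small test plus pinned dependencies moves code a long way towards being re-usable:

```python
# test_analysis.py -- a hypothetical example of the kind of test that
# makes research code re-usable. Anyone who obtains the code can run
# `pytest` to confirm it behaves as documented before building on it.
# Dependencies would be pinned alongside, e.g. in requirements.txt:
#   numpy==1.26.4
import numpy as np


def normalise(signal):
    """Scale a signal to zero mean and unit variance (the hypothetical
    analysis step under test)."""
    return (signal - signal.mean()) / signal.std()


def test_normalise_known_input():
    # The same input must always give the same output: the core of
    # computational reproducibility.
    result = normalise(np.array([1.0, 2.0, 3.0]))
    assert np.allclose(result.mean(), 0.0)
    assert np.allclose(result.std(), 1.0)
```

The test doubles as documentation: it shows a working invocation, states the expected behaviour, and gives anyone modifying the code a way to check they haven’t broken it.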

The outside perspective, as I have written before, is that software does all of this better than experimental research. In practice, there are frameworks that make it possible for software to do a very good job on these things, but doing a good job takes work; work that is generally not done. Most software for research is not shared, is not well documented, generates results that are not easily reproducible, and does not support re-use and repurposing through testing and documentation. In this it is much like most experimental research. So how do we realise the potential of software to act as an exemplar for the rest of our research practice?

Nick Barnes of the Climate Code Foundation developed the Science Code Manifesto, a statement of how things ought to be (I was very happy to contribute and be a founding signatory). While for many this may not go far enough (it doesn’t explicitly require open source licensing), it is intended as a practical set of steps that communities could adopt today. It has already garnered hundreds of endorsers and I’d encourage you to sign up if you want to show your support. The manifesto builds on many years of work by Victoria Stodden in identifying the key issues and bringing them to the attention of both researchers and funders, as well as the work of John Cook, Jon Claerbout, and Patrick Vanderwalle at ReproducibleResearch.net.

If the manifesto and this other work aim (broadly) to set out the principles and to understand where we need to go, then Open Research Computation is intended as a practical step embedded in today’s practice. Researchers need the credit provided by conventional papers, so if we can link papers in a journal that garners significant prestige with high standards in the design and application of the software described, we can tie existing incentives to our desired practice. This is a high wire act. How far do we push those standards out in front of where most of the community is? We explicitly want ORC to be a high profile journal featuring high quality software, with acceptance a mark of quality that the community will respect. At the same time we can’t ask for the impossible. If we set standards so high that no-one can meet them then we won’t have any papers, and with no papers we can’t start the process of changing practice. Equally, allow too much in and we won’t create a journal with a buzz about it. That quality mark has to be respected as meaning something by the community.

I’ll be blunt. We haven’t had the number of submissions I’d hoped for. Lots of support, lots of enquiries, but relatively few of them turning into actual submissions. The submissions we do have I’m very happy with. When we launched the call for submissions I took a pretty hard line on the issue of testing, saying that, as a default, we’d expect 100% test coverage. In retrospect that sent a message that many people felt they couldn’t deliver on. What I meant was that where testing fell below that standard (as it would in almost all cases) there would need to be an explanation of the testing strategy, how it was tackled, and how it could support people re-using the code. The language in the author submission guidelines has since been softened a little to make that clearer.

What I’ve been doing in practice is asking reviewers and editors to comment on how the testing framework provided can support others re-using the code. Are the tests adequate to help someone take the code, make sure they’ve got it working, and then, as they build on it, give them confidence they haven’t broken anything? For me this is the critical question: does the testing and documentation make the code re-usable by others, either directly in its current form or as they build on it? Along the way we’ve been asking whether submissions provide documentation and testing consistent with best practice. But that always raises the question of what best practice is. Am I asking the right questions? And where should we ultimately set that bar?

Changing practice is tough and getting the balance right is hard. But the key question for me is: how do we set that balance? And how do we turn the aims of ORC, to act as a lever for changing the way research is done, into practice?


Replication, reproduction, confirmation. What is the optimal mix?

Issues surrounding the relationship between Open Research and replication seem to be the meme of the week. Abhishek Tiwari provided notes on a debate describing concerns about how open research could damage replication, and Sabine Hossenfelder explored the same issue in a blog post. The concern, fundamentally, is that by providing more of the details of our research we may actually damage the research effort: by reducing the motivation to reproduce published findings or, worse, as Sabine suggests, by encouraging group think and a lack of creative questioning.

I have to admit that even at a naive level I find this argument peculiar. There is no question that in aiming to reproduce or confirm experimental findings it may be helpful to carry out that process in isolation, or with some portion of the available information withheld. This can obviously increase the quality and power of the confirmation, making it more general. Indeed the question of how and when to do this most effectively is very interesting and bears some thought; the optimization of these decisions in specific cases will be an important part of improving research quality. What I find peculiar is the apparent belief in many quarters (but not necessarily from Sabine, who I think has a much more sophisticated view) that this optimization is best encouraged by not bothering to make information available. We can always choose not to access information that is available, but if it is not available we cannot choose to look at it. Indeed, to allow the optimization of the confirmation process it is crucial that we could have access to the information if we so decided.

But I think there is a deeper problem than the optimization issue: the argument also involves two category errors. Firstly, we need to distinguish between different types of confirmation. There is pure mechanical replication, perhaps just to improve statistical power or to re-use a technique for a different experiment. In this case you want as much detail as possible about how the original process was carried out, because there is no point in changing the details; the whole point of the exercise is to keep things as similar as possible. I would suggest the term “reproduction” for a slightly different case, where the process or experiment “looks the same” and is intended to produce the same result but the details are not tightly controlled. The purpose of the exercise is then to determine how robust the result or output is to modified conditions, and here withholding, or not using, some information could be very useful. Finally there is the process of doing something quite different with the intention of testing an idea or claim from a different direction, with an entirely different experiment or process. I would refer to this as “confirmation”. The concerns of those arguing against providing detailed information lie primarily with confirmation, but the data and process sharing we are talking about relate more to replication and reproduction. The main efficiency gains lie in simply re-using shared process to get down a scientific path more rapidly, rather than in situations where the process itself is the subject of the scientific investigation.

The second category error is somewhat related, in that the concerns around “group-think” refer to claims and ideas, whereas the objects we are trying to encourage the sharing of when we talk about open research are more likely to be tools and data. Again, it seems peculiar to argue that the process of thinking independently about research claims is aided by reducing the amount of data available. There is a more subtle argument that Sabine is making, and possibly Harry Collins would make a similar one: that the expression of tools and data may be inseparable from the ideas that drove their creation and collection. I would still argue, however, that it is better to actively choose to omit information from creative or critical thinking than to be forced to work in the dark. I agree that we may need to think carefully about how we can do this effectively, and I think that would be an interesting discussion to have with people like Harry.

But the argument that we shouldn’t share because it makes life “too easy” seems dangerous to me. Taken to its extreme, we should remove the methods section from papers altogether. In many cases it feels like we already have, and I have to say that in day to day research that certainly doesn’t feel helpful.

Sabine also makes a good point, one that Michael Nielsen has also made from time to time: these discussions are very focussed on experimental, and specifically hypothesis driven, research. It bears some thinking about, but I don’t really know enough about theoretical research to have anything useful to add. It is, though, the reason some of the language in this post may seem a bit tortured.