This is more of a note to self to write this up in more detail. In the original post What is it publishers do anyway? I gestured towards the idea that one of the main value-adds for the artist formerly known as the publisher is in managing a long tail of challenging, and in some cases quite dangerous, issues. What I didn’t quite say, but was implicit, is that a big role for publishers is in preventing the researcher-author from getting egg on their face.
Enter this week’s entry into the pantheon of grotesque research ethics fails: the release of 70 thousand OkCupid records as a dataset on the Open Science Framework. Now OSF and its home, the Center for Open Science, would not claim to be a “publisher”, but it provides a platform for publishing research outputs just as do Nature, Cell, PLOS ONE, bioRxiv, Figshare, and Dryad…and…GitHub.
Some of these platforms would have triaged this (or anything like it) out before it went public or, where it’s relevant, before it even reached a reviewer. You might be a little surprised by which ones (remember PNAS publishing the Facebook emotion study paper?). These kinds of disaster submissions don’t need to be very frequent before catching the ones that do come in requires significant resources. It also means you need to develop policies to decide what to allow on the platform. For instance, bioRxiv did not allow “medically relevant” manuscripts to be made public on its platform when it launched.
As we expand the ways and forms in which we communicate research outputs, it will become increasingly important to ask what standards we expect of ourselves in deciding where and how things are shared, and what expectations we have of those platforms that make a claim to be “scholarly” in some sense. No-one expects Google to police the publishing of each and every dataset on a shared drive. Everyone expects a traditional publisher to ensure community ethical standards are applied (even, or rather especially, when the community is diverse enough to not necessarily know what they are). Where does OSF, or bioRxiv, or arXiv fit into this continuum?
It is catching this kind of thing that really contributes to the costs that publishers incur. When we talk about a completely in-house system, this is the kind of thing we’re talking about throwing away. Of course there may be better and cheaper ways of maintaining these kinds of standards, and we need a serious discussion about which ones matter and what kinds of breaches are acceptable. But for those of you who wonder where that extra few hundred dollars in costs is coming from, this is definitely one of the sources.

These are good points, but this sort of norm is generated by the scientific community anyway. Journals don’t enforce ethical norms because it’s their job as journals, but because we, as a scientific community, decided that ethical standards around human data are important. And journals want to be arbiters of the things that are important.
As with peer review, curation, etc., journals have appropriated this role not because they’re best suited to enforcement, but because it props up the edifice that they’re important. Establishing community standards, educating trainees in those standards and punishing violations of those standards can be (and I think should be) an open process just like the rest of science.
Well you might argue that ethical standards actually arise from the interaction of research communities with other communities (both research and non-research) but that’s probably an aside. The reality is that researchers are very bad at enforcing our own standards. Shifting this to journals/publishers/societies is one way to manage it. Not the only one of course but a way.
But this is not even so much about ethical standards as about catching mistakes and embarrassments before they become (too) public. That’s a service. Now we could do it ourselves, but we mostly don’t – some of this gets caught in peer review but a lot more in “technical review” – so the question is whether the service is good value for money and whether we need it.