publication – Science in the Open

March 8, 2009 (updated December 30, 2009)

Why good intentions are not enough to get negative results published

There is a set of memes that seems to be popping up with increasing regularity in the last few weeks. The first is that more of the outputs of scientific research need to be published. Sometimes this means the publication of negative results, other times it might mean that a community doesn’t feel they have an outlet for their particular research field. The traditional response to this is “we need a journal for this”. Over the years there have been many attempts to create a “Journal of Negative Results”. There is a Journal of Negative Results – Ecology and Evolutionary Biology (two papers in 2008), a Journal of Negative Results in Biomedicine (four papers in 2009, actually looks pretty active), a Journal of Interesting Negative Results in Natural Language (one paper), and a Journal of Negative Results in Speech and Audio Sciences, which appears to be defunct.

The idea is that there is a huge backlog of papers detailing negative results that people are gagging to get out if only there was somewhere to publish them. Unfortunately there are several problems with this. The first is that actually writing a paper is hard work. Most academics I know do not have the problem of not having anything to publish; they have the problem of getting around to writing the papers, sorting out the details, and making sure that everything is in good shape. This leads to the second problem, that getting a negative result to a standard worthy of publication is much harder than for a positive result. You only need to make that compound, get that crystal, clone that gene, or get the microarray to work once and you’ve got the data to analyse for publication. To show that it doesn’t work you need to repeat several times, make sure your statistics are in order, and establish your working conditions. Partly this is a problem with the standards we apply to recording our research; designing experiments so that negative results are well established is not high on many scientists’ priorities. But partly it is the nature of the beast. Negative results need to be much more tightly bounded to be useful.

Finally, even if you can get the papers, who is going to read them? And more importantly who is going to cite them? Because if no-one cites them then the standing of your journal is not going to be very high. Will people pay to have papers published there? Will you be able to get editors? Will people referee for you? Will people pay for subscriptions? Clearly this journal will be difficult to fund and keep running. And this is where the second meme comes in, one which still gets surprising traction, that “publishing on the web is free”. Now we know this isn’t the case, but there is a slightly more sophisticated approach which is “we will be able to manage with volunteers”. After all, with a couple of dedicated editors donating their time, peer review being done for free, and authors taking on the formatting role, the costs can be kept manageable, surely? Some journals do survive on this business model, but it requires real dedication and drive, usually on the part of one person. The unfortunate truth is that putting in a lot of your spare time to support a journal which is not regarded as high impact (however it is measured) is not very attractive.

For this reason, in my view, these types of journals need much more work put into the business model than for a conventional specialist journal. To have any credibility in the long term you need a business model that works for the long term. I am afraid that “I think this is really important” is not a business model, no matter how good your intentions. A lot of the standing of a journal is tied up with the author’s view of whether it will still be there in ten years’ time. If that isn’t convincing, they won’t submit; if they don’t submit you have no impact; and in the long term that means a downward spiral until you have no journal.

The fundamental problem is that the “we need a journal” approach is stuck in the printed page paradigm. To get negative results published we need to lower the barriers to publication well below where they currently are, while at the same time applying either a pre- or post-publication filter. Rich Apodaca, writing on Zusammen last week, talked about micropublication in chemistry, the idea of reducing the smallest publishable unit by providing routes to submit smaller packages of knowledge or data to some sort of archive. This is technically possible today: services like ChemSpider, NMRShiftDB, and others make it possible to submit small pieces of information to a central archive. More generally the web makes it possible to publish whatever we want, in whatever form we want, but hopefully semantic web tools will enable us to do this in an increasingly useful form in the near future.

Fundamentally my personal belief is that the vast majority of “negative results” and other journals that are trying to expand the set of publishable work will not succeed. This is precisely because they are pushing the limits of the “publish through journal” approach by setting up a journal. To succeed these efforts need to embrace the nature of the web, to act as a web-native resource, and not as a printed journal that happens to be viewed in a browser. This does two things: it reduces the barrier to authors submitting work, making the project more likely to be successful, and it can also reduce costs. It doesn’t in itself provide a business model, nor does it provide quality assurance, but it can provide a much richer set of options for developing both of these that are appropriate to the web. Routes towards quality assurance are well established, but suffer from the ongoing problem of getting researchers involved in the process, a subject for another post. Micropublication might work through micropayments, the whole lab book might be hosted for a fee with a number of “publications” bundled in, research funders may pay for services directly, or more interestingly the archive may be able to sell services built over the top of the data, truly adding value to the data.

But the key is low barriers for authors and a robust business model that can operate even if the service is perceived as being low impact. Without these you are creating a lot of work for yourself, and probably a lot of grief. Nothing comes free, and if there isn’t income, that cost will be your time.

February 15, 2009 (updated January 7, 2016)

Contributor IDs – an attempt to aggregate and integrate

Following on from my post last month about using OpenID as a way of identifying individual researchers, Chris Rusbridge made the sensible request that when conversations go spreading themselves around the web it would be good if they could be summarised and aggregated back together. Here I am going to make an attempt to do that – but I won’t claim that this is a completely unbiased account. I will try to point to as much of the conversation as possible but if I miss things out or misrepresent something please correct me in the comments or the usual places.

The majority of the conversation around my post occurred on friendfeed, at the item here, but also see commentary around Jan Aerts’ post (and friendfeed item) and Bjoern Brembs’ summary post. Other commentary included posts from Andy Powell (Eduserv), Chris Leonard (PhysMathCentral), Euan, Amanda Hill of the Names project, and Paul Walk (UKOLN). There was also a related article in Times Higher Education discussing the article (Bourne and Fink) in PLoS Comp Biol that kicked a lot of this off [Ed – Duncan Hull also pointed out there is a parallel discussion about the ethics of IDs that I haven’t kept up with – see the commentary at the PLoS Comp Biol paper for examples]. David Bradley also pointed out to me a post he wrote some time ago which touches on some of the same issues although from a different angle. Pierre set up a page on OpenWetWare to aggregate material to, and Martin Fenner has a collected set of bookmarks with the tag authorid at Connotea.

The first point which seems to be one of broad agreement is that there is a clear need for some form of unique identifier for researchers. This is not necessarily as obvious as it might seem. With many of these proposals there is significant push back from communities who don’t see any point in the effort involved. I haven’t seen any evidence of that with this discussion which leads me to believe that there is broad support for the idea from researchers, informaticians, publishers, funders, and research managers. There is also strong agreement that any system that works will have to be credible and trustworthy to researchers as well as other users, and have a solid and sustainable business model. Many technically minded people pointed out that building something was easy – getting people to sign up to use it was the hard bit.

Equally, and here I am reading between the lines somewhat, any workable system would have to be well designed and easy to use for researchers. There was much backwards and forwards about how “RDF is too hard”, “you can’t expect people to generate FOAF” and “OpenID has too many technical problems for widespread uptake”. Equally, people thinking about what the back end would have to look like to even stand a chance of providing an integrated system that would work felt that FOAF, RDF, OAuth, and OpenID would have to provide a big part of the gubbins. The message for me was that the way the user interface(s) are presented has to be got right. There are small models of aspects of this that show that easy interfaces can be built to capture sophisticated data, but getting it right at scale will be a big challenge.

Where there is less agreement is on the details, both technical and organisational, of how best to go about creating a useful set of unique identifiers. There was some to-and-fro as to whether CrossRef was the right organisation to manage such a system. Partly this turned on concern over centralised versus distributed systems and partly over issues of scope and trust. Nonetheless the majority view appeared to be that CrossRef would be the right place to start and CrossRef do seem to have plans in this direction (from Geoffrey Bilder, see this Friendfeed item).

There was also a lot of discussion around identity tokens versus authorisation. Overall it seemed that the view was that these can be productively kept separate. One of the things that appealed to me in the first instance was that OpenIDs could be used either as tokens (just a unique code that is used as an identifier) or as a login mechanism. The machinery is already in place to make that work. Nonetheless it was generally accepted, I think, that the first important step is an identifier. Login mechanisms are not necessarily required, or even wanted, at the moment.

The discussion as to whether OpenID is a good mechanism seemed in the end to go around in circles. Many people brought up technical problems they had with getting OpenIDs to work, and there are ongoing problems both with the underlying services that support and build on the standard as well as with the quality of some of the services that provide OpenIDs. This was at the core of my original proposal to build a specialist provider that had an interface and functionality that worked for researchers. As Bjoern pointed out, I should of course be applying my own five criteria for successful web services (go to the last slide) to this proposal. Key questions: 1) can it offer something compelling? Well no, not unless someone, somewhere requires you to have this thing. 2) can you pre-populate? Well yes, and maybe that is the key… (see later). In the end, as with the concern over other “informatics-jock” terms and approaches, the important thing is that all of the technical side is invisible to end users.

Another important discussion that, again, didn’t really come to a conclusion was who would pass out these identifiers, and when. Here there seemed to be two different perspectives: those who wanted the identifiers to be completely separated from institutional associations, at least at first order, and others who seemed concerned that access to identifiers be controlled via institutions. I definitely belong in the first camp. I would argue that you just give them to everyone who requests them. The problem then comes with duplication: what if someone accidentally (or deliberately) ends up with two or more identities? At one level I don’t see that it matters to anyone except to the person concerned (I’d certainly be trying to avoid having my publication record cut in half). But at the very least you would need to have a good interface for merging records when it was required. My personal belief is that it is more important to allow people to contribute than to protect the ground. I know others disagree and that somewhere we will need to find a middle path.

One thing that was helpful was the fact that we seemed to do a pretty good job of getting various projects in this space aggregated together (and possibly more aware of each other). Among these are ResearcherID, a commercial offering that has been running for a while now, and the Names project, a collaboration of Mimas and the British Library funded by JISC. ClaimID is an OpenID provider that some people use that provides some of the flexible “home page” functionality (see Maxine Clark’s for instance) that drove my original ideas. PublicationsList.org provides an online homepage but does what ClaimID doesn’t, providing a PubMed search that makes it easier (as long as your papers are in PubMed) to populate that home page with your papers (but not easier to include datasets, blogs, or wikis – see here for my attempts to include a blog post on my page). There are probably a number of others, feel free to point out what I’ve missed!

So finally where does this leave us? With a clear need for something to be done, with a few organisations identified as the best ones to take it forward, and with a lot of discussion required about the background technicalities required. If you’re still reading this far down the page then you’re obviously someone who cares about this. So I’ll give my thoughts, feel free to disagree!

We need an identity token, not an authorisation mechanism. Authorisation can get easily broken and is technically hard to implement across a wide range of legacy platforms. If it is possible to build in the option for authorisation in the future then that is great but it is not the current priority.
The backend gubbins will probably be distributed RDF. There is identity information all over the place which needs to be aggregated together. This isn’t likely to change so a centralised database, to my mind, will not be able to cope. RDF is built to deal with these kinds of problems and also allows multiple potential identity tokens to be pulled together to say they represent one person.
This means that user interfaces will be crucial. The simpler the better, but the backend, with words like FOAF and RDF, needs to be effectively invisible to the user. Very simple interfaces asking “are you the person who wrote this paper” are going to win, complex signup procedures are not.
Publishers and funders will have to lead. The end view of what is being discussed here is very like a personal home page for researchers. But instead of being a home page on a server it is a dynamic document pulled together from stuff all over the web. But researchers are not going to be interested for the most part in having another home page that they have to look after. Publishers in particular understand the value of unique identifiers (and will get the most value out of them in the short term), so with the most to gain and the most direct interest they are best placed to lead, probably through organisations like CrossRef that aggregate things of interest across the industry. Funders will come along as they see the benefits of monitoring research outputs, and forward looking ones will probably come along straight away, others will lag behind. The main point is that pre-populating and then letting researchers come along and prune and correct is going to be more productive than waiting for ten million researchers to sign up to a new service.
The really big question is whether there is value in doing this specially for researchers. This is not a problem unique to research and is one in which a variety of messy and disparate solutions are starting to arise. Maybe the best option is to sit back and wait to see what happens. I often say that in most cases generic services are a better bet than specially built ones for researchers because the community size isn’t there and there simply isn’t a sufficient need for added functionality. My feeling is that for identity there is a special need, and that if we capture the whole research community it will be big enough to support a viable service. There is a specific need for following and aggregating the work of people that I don’t think is general, and is different to the authentication issues involved in finance. So I think in this case it is worth building specialist services.
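To make the second point above a little more concrete, here is a minimal Python sketch of how scattered "these two tokens are the same person" statements might be pulled together into one group per person. It is only an illustration under stated assumptions: the identifier strings, the `group_identities` name, and the idea of representing each assertion as a plain pair are all hypothetical, and a real system would presumably work over RDF stores rather than an in-memory list.

```python
from collections import defaultdict


def group_identities(same_as_pairs):
    """Group identifiers connected by 'same person' assertions.

    Uses a simple union-find; each pair stands in for one RDF-style
    statement of the form (identifier_a, sameAs, identifier_b).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical assertions harvested from different sources.
assertions = [
    ("openid:http://alice.example.org/", "mailto:alice@example.org"),
    ("mailto:alice@example.org", "publisher-a:author/1234"),
    ("publisher-b:author/77", "openid:http://bob.example.net/"),
]

for person in group_identities(assertions):
    print(person)
```

The point of the design is that no single source needs to know about all of a person's tokens; each one only asserts a link it is sure of, and the groups emerge when the assertions are aggregated.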

The best hope I think lies in individual publishers starting to disambiguate authors across their existing corpus. Many have already put a lot of effort into this. In turn, perhaps through CrossRef, it should be possible to agree an arbitrary identifier for each individual author. If this is exposed as a service it is then possible to start linking the information up. People can and will, and services will start to grow around that. Once this exists then some of the ideas around recognising referees and other efforts will start to flow.

January 20, 2009 (updated December 30, 2009)

A specialist OpenID service to provide unique researcher IDs?

Following on from Science Online 09 and particularly discussions on Impact Factors and researcher incentives (also on Friendfeed and some video available at Mogulus via video on demand) as well as the article in PLoS Computational Biology by Phil Bourne and Lynn Fink, the issue of unique researcher identifiers has really emerged as absolutely central to making traditional publication work better, effectively building a real data web that works, and making it possible to aggregate the full list of how people contribute to the community automatically.

Good citation practice lies at the core of good science. The value of research data is not so much in the data itself but its context, its connection with other data and ideas. How then is it that we have no way of citing a person? We need a single, unique way, of identifying researchers. This will help traditional publishers and the existing ecosystem of services by making it possible to uniquely identify authors and referees. It will make it easier for researchers to be clear about who they are and what they have done. And finally it is a critical step in making it possible to automatically track all the contributions that people make. We’ve all seen CVs where people say they have refereed for Nature or the NIH or served on this or that panel. We can talk about micro credits but until there are validated ways of pulling that information and linking it to an identity that follows the person, not who they work for, we won’t make much progress.

On the other hand most of us do not want to be locked into one system, particularly if it is controlled by one commercial organization. Thomson ISI’s ResearcherID is positioned as a solution to this problem, but I for one am not happy with being tied into using one particular service, regardless of who runs it.

In the PLoS Comp Biol article Bourne and Fink argue that one solution to this is OpenID. OpenID isn’t a service, it is a standard. This means that an identity can be hosted by a range of services and people can choose between them based on the service provided, personal philosophy, or any other reason. The central idea is that you have a single identity which you can use to sign on to a wide range of sites. In principle you sign into your OpenID and then you never see another login screen. In practice you often end up typing in your ID but at least it reduces the pain in setting up new accounts. It also provides in most cases a “home page”. If you go to http://cameron.neylon.myopenid.com you will see a (pretty limited) page with some basic information.

OpenID is becoming more popular with a wide range of webservices providing it as a login option including Dopplr, Blogger, and research sites including MyExperiment. Enabling OpenID is also on the list for a wide range of other services, although not always high up the priority list. As a starting point it could be very easy for researchers with an OpenID simply to add it to their address when publishing papers, thus providing a unique and easily trackable identifier that is carried through the journal, abstracting services, and the whole ecosystem of services built around them.

There are two major problems with OpenID. The first is that it is poorly supported by big players such as Google and Yahoo. Google and Yahoo will let you use your account with them as an OpenID but they don’t accept other OpenID providers. More importantly, people just don’t seem to get OpenID. It seems unnatural for some reason for a person’s identity marker to be a URL rather than a number, a name, or an email address. Compounded with the limited options provided by OpenID service providers this makes the practical use of such identifiers for researchers very much a minority activity.

So what about building an OpenID service specifically for researchers? Imagine a setup screen that asks sensible questions about where you work and what field you are in. Imagine that on the second screen, having done a search through literature databases, it presents you with a list of publications to check through, letting you remove any mistakes and add any that have been missed. And then imagine that the default homepage format is similar to an academic CV.

Problem 1: People already have multiple IDs and sometimes multiple OpenIDs. So we make at least part of the back end file format, and much of what is exposed on the homepage, FOAF, making it possible to at least assert that you are the same person as, say, cameronneylon@yahoo.com.
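As a rough sketch of what such a FOAF description for Problem 1 might look like, the Python below writes out a small Turtle document linking a name, an OpenID URL, and one or more mailboxes. The `foaf_turtle` function and the `<#me>` node name are assumptions made for illustration; `foaf:name`, `foaf:openid`, and `foaf:mbox` are terms from the FOAF vocabulary. The sketch does no escaping of the name, so it should not be taken as a robust serialiser.

```python
def foaf_turtle(name, openid_url, mailboxes=()):
    """Return a minimal FOAF description of one person as Turtle text.

    Hypothetical helper: a real service would use an RDF library and
    escape literals properly.
    """
    statements = [
        "a foaf:Person",
        f'foaf:name "{name}"',
        f"foaf:openid <{openid_url}>",
    ]
    # Each mailbox is another token asserted to belong to the same person.
    statements += [f"foaf:mbox <mailto:{address}>" for address in mailboxes]
    body = " ;\n    ".join(statements)
    return (
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n\n"
        f"<#me>\n    {body} .\n"
    )


print(foaf_turtle(
    "Cameron Neylon",
    "http://cameron.neylon.myopenid.com",
    ["cameronneylon@yahoo.com"],
))
```

Because a mailbox is meant to identify exactly one person in FOAF, two descriptions that share a mailbox can be treated as describing the same individual, which is what makes this kind of assertion useful for joining up multiple IDs.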

Problem 2: Aren’t we just locking people into a specific service again? Well no, if people don’t want to use it they can use any OpenID provider, even set one up themselves. It is an open standard.

Problem 3: What is there to make people sign up? This is the tough one really. It falls into two parts. Firstly, for those of us who already have OpenIDs or other accounts on other systems, isn’t this just (yet) another “me too” service? So, in accordance with the five rules I have proposed for successful researcher web services, there has to be a compelling case for using it.

For me the answer to this comes in part from the question. One of the things that comes up again and again as a complaint from researchers is the need to re-format their CV (see Schleyer et al, 2008 for a study of this). Remember that the aim here is to automatically aggregate most of the information you would put in a CV. Papers should be (relatively) easy, grants might be possible. Because we are doing this for researchers we know what the main categories are and what they look like. That is, we have semantically structured data.

Ok so great I can re-format my CV more easily and I don’t need to worry about whether it is up to date with all my papers, but what about all these other sites where I need to put the same information? For this we need to provide functionality that lets all of this be carried easily to other services. This means simple embed functionality like that you see on YouTube and most other good file hosting services, which generates a little fragment of code that can easily be put in place on other services (obviously this requires other services to allow that – which could be a problem in some cases). But imagine the relief if all the poor people who try to manage university department websites could just throw in some embed codes to automatically keep their staff pages up to date? Anyone seeing a business model here yet?

But for this to work the real problem to be solved is the vast majority of researchers for whom this concept is totally alien. How do we get them to be bothered to sign up for this thing which apparently solves a problem they don’t have? The best approach would be if journals and grant awarding bodies used OpenIDs as identifiers. This would be a dream result but doesn’t seem likely. It would require significant work on changing many existing systems and frankly what is in it for them? Well one answer is that it would provide a mechanism for journals and grant bodies to publicly acknowledge the people who referee for them. An authenticated RSS feed from each journal or funder could be parsed and displayed on each researcher’s home page. The feed would expose a record of how many grants or papers each person has reviewed (probably with some delay to prevent people linking that to the publication of specific papers). Of course such a feed could be used for a lot of other interesting things as well, but none of them will work without a unique person identifier.
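A minimal sketch of the parsing side of such a feed, in Python with only the standard library, might look like the following. Everything about the feed layout is an assumption: no such feed exists, the sample journal is made up, and putting the referee's OpenID URL in each item's `author` element is just one hypothetical convention. Authentication of the feed is left out entirely.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical feed: one item per completed review, with the referee's
# OpenID URL carried in the <author> element (an assumption of this sketch).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical Journal: refereeing acknowledgements</title>
    <item><title>Review completed</title><author>http://alice.example.org/</author></item>
    <item><title>Review completed</title><author>http://bob.example.net/</author></item>
    <item><title>Review completed</title><author>http://alice.example.org/</author></item>
  </channel>
</rss>"""


def count_reviews(feed_xml):
    """Count acknowledged reviews per referee identifier in one feed."""
    root = ET.fromstring(feed_xml)
    counts = Counter()
    for item in root.iter("item"):
        reviewer = item.findtext("author")
        if reviewer:  # skip items that carry no identifier
            counts[reviewer.strip()] += 1
    return counts


for reviewer, total in sorted(count_reviews(SAMPLE_FEED).items()):
    print(reviewer, total)
```

A researcher's home page could sum these counts across the feeds of several journals and funders, which only works if every feed refers to the same person by the same identifier.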

I don’t think this is compelling enough in itself, for the moment, but a simpler answer is what was proposed above – just encouraging people to include an OpenID as part of their address. Researchers will bend over backwards to make people happy if they believe those people have an impact on their chances of being published or getting a grant. A little thing could provide a lot of impetus and that might bring into play the kind of effects that could result from acknowledgement and ultimately make the case that shifting to OpenID as the login system is worth the effort. This would particularly be the case for funders who really want to be able to aggregate information about the people they fund effectively.

There are many details to think about here. Can I use my own domain name (yes, re-directs should be possible)? Will people who use another service be at a disadvantage (probably, otherwise any business model won’t really work)? Is there a business model that holds water (I think there is but the devil is in the details)? Should it be non-profit or for profit or run by a respected body (I would argue that for-profit is possible and should be pursued to make sure the service keeps improving – but then we’re back with a commercial provider)?

There are many good questions that need to be thought through but I think the principle of this could work, and if such an approach is to be successful it needs to get off the ground soon and fast.

Note: I am aware that a number of people are working behind the scenes on components of this and on similar ideas. Some of what is written above is derived from private conversations with these people and as soon as I know that their work has gone public I will add references and citations as appropriate at the bottom of this post.

January 1, 2009 (updated December 30, 2009)

New Year’s Resolutions 2009

All good traditions require someone to make an arbitrary decision to do something again. Last year I threw up a few New Year’s resolutions in the hours before NYE in the UK. Last night I was out on the shore of Sydney Harbour. I had the laptop – I thought about writing something – and then I thought – nah I can just lie here and look at the pretty lights. However I did want to follow up the successes and failures of last year’s resolutions and maybe make a few more for this year.

So last year’s resolutions were, roughly speaking, 1) to adopt the principles of the NIH Open Access mandate when choosing journals for publications, 2) to get more of the existing data within my group online and available, 3) to take the whole research group fully open notebook, 4) to mention Open Notebooks in every talk I gave, and 5) attempt to get explicit funding for developing open notebook approaches.

So successes – the research group at RAL is now (technically) working on an Open Notebook basis. This has taken a lot longer than we expected and the guys are still really getting a feel for what that means both in terms of how they record things and how they feel about it. I think it will improve over time and it just reinforces the message that none of this is easy. I also made a point of talking about the Open Notebook approach in every talk I gave – mostly this was well received – often there was some scepticism but the message is getting out there.

However we didn’t do so well on picking journals – most of the papers I was on this year were driven by other people or were directed requests for special issues, or both. The papers that I had in mind I still haven’t got written, some drafts exist, but they’re definitely not finished. I also haven’t done any real work on getting older data online – it has been enough work just trying to manage the stuff we already have.

Funding is a mixed bag – the network proposal that was in last New Year’s was rejected. A few proposals have gone in – more haven’t gone in but exist in draft form – and a group of us went close to getting a tender to do some research into the uptake of Web 2.0 tools in science (more on that later but Gavin Baker has written about it and our tender document itself is available). The success of the year was the funding that Jean-Claude Bradley obtained from Submeta (as well as support from Aldrich Chemicals and Nature Publishing Group) to support the Open Notebook Science Challenge. I can’t take any credit for this but I think it is a good sign that we may have more luck this coming year.

So for this year – there are some follow ons – and some new ones:

I will re-write the network application (and will be asking for help) and re-submit it to a UK funder
I will clean up the “Personal View of Open Science” series of blog posts and see if I can get it published as a perspectives article in a high ranking journal
I will get some of those damn papers finished – and decide which ones are never going to be written and give up on them. Papers I have full control over will go by first preference to Gold OA journals.
I will pull together the pieces needed to take action on the ideas that came out of the Southampton Open Science workshop, specifically the idea of a letter signed by a wide range of scientists and interested people to a high ranking journal stating the importance of working towards published papers being fully supported by data and methodological detail that is fully available
I will focus on doing fewer things and doing them better – or at least making sure the resources are available to do more of the things I take on…

I think five is enough things to be going on with. Hope you all have a happy new year, whenever it may start, and that it takes you further in the direction you want to go (whether you know what that is now or not) than you thought was possible.

p.s. I noticed in the comments to last year’s post a comment from one Shirley Wu suggesting the idea of running a session at the 2009 Pacific Symposium on Biocomputing – a proposal that resulted in the session we are holding in a few days (again more later on – we hope – streaming video, micro blogging etc). Just thinking about how much has changed in the way such an idea would be raised and explored in the last twelve months is food for thought.