The future of research communication is aggregation

Paper as aggregation
Image by cameronneylon via Flickr

“In the future everyone will be a journal editor for 15 minutes” – apologies to Andy Warhol

Suddenly it seems everyone wants to re-imagine scientific communication. From the ACS symposium a few weeks back to a PLoS Forum, via interesting conversations with a range of publishers, funders and scientists, a lot of people are thinking much more seriously about how to make scientific communication more effective, more appropriate to the 21st century and, above all, better able to take advantage of the power of the web.

For me, the “paper” of the future has to encompass much more than just the narrative descriptions of processed results we have today. It needs to support a much more diverse range of publication types: data, software, processes, protocols, and ideas. It needs to provide a rich and interactive means of diving into the detail where the user is interested and skimming over the surface where they are not. It needs to provide re-visualisation and streaming under the user’s control, and crucially it needs to provide the ability to repackage the content for new purposes: education, public engagement, even mainstream media reporting.

I’ve got a lot of mileage recently out of thinking about how to organise data and records by ignoring the actual formats and thinking more about what the objects I’m dealing with are, what they represent, and what I want to do with them. So what do we get if we apply this thinking to the scholarly published article?

For me, a paper is an aggregation of objects. It contains text, divided up into sections, often with references to other pieces of work. Some of these references are internal, to figures and tables, which are representations of data in some form or another. The paper world of journals has led us to think about these as images, but a much better mental model for figures on the web is an embedded object, perhaps a visualisation from a service like Many Eyes, Swivel, or Tableau Public. Why is this better? Because it maps more effectively onto what we want to do with the figure. We want to use it to absorb the data it represents, and to do this we might want to zoom, pan, re-colour, or re-draw the data. But if we do this we want to know that we are still working with the same underlying data, so the data needs a home, an address somewhere on the web, perhaps with the journal, or perhaps somewhere else entirely, that we can refer to with confidence.

If that data has an individual identity it can in turn refer back to the process used to generate it, perhaps in an online notebook or record, perhaps pointing to a workflow or software process hosted on another website. Maybe when I read the paper I want that included; maybe when you read it you don’t – it is a personal choice, but one that should be easy to make. Indeed, it is a choice that would be easy to make with today’s flexible web frameworks if the underlying pieces were available and represented in the right way.

The authors of the paper can also be included by reference to a unique identifier. Perhaps the authors of the different segments are different; this is no problem, as each piece can refer to the people who generated it. Funders and other supporting players might be included by reference as well. Again this solves a real problem of today: different players are interested in how people contributed to a piece of work, not just who wrote the paper. A reference to a person, where the link shows what their contribution was, can provide this much more detailed information. Finally, the overall aggregation of pieces that is brought together and published also has a unique identifier, often in the form of the familiar DOI.
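
To make this a little more concrete, here is a rough sketch of what such an aggregation might look like as a simple data structure. This is just an illustration, not a proposal for a format: every identifier, URL, and field name below is made up, and in practice you would want to build on existing standards for identifiers and aggregations rather than invent your own.

```python
# A paper modelled as an aggregation of addressable objects (illustrative only;
# all identifiers, URLs, and field names here are invented for the example).

figure_2 = {
    "id": "https://example.org/data/kinetics-run-7",          # address of the underlying data
    "type": "dataset",
    "rendered_as": "https://example.org/viz/kinetics-run-7",   # embeddable visualisation of that data
    "generated_by": "https://example.org/notebook/entry-42",   # notebook record or workflow that produced it
}

paper = {
    "id": "doi:10.9999/example.12345",                          # made-up DOI for the published aggregation
    "type": "paper",
    "parts": [
        {"type": "text", "section": "Introduction"},
        {"type": "text", "section": "Results"},
        figure_2,                                                # the figure is a first-class object, not an image
    ],
    "contributors": [
        # each contribution points to a person (or funder) and says what they did
        {"who": "https://example.org/people/researcher-a", "role": "performed-experiment"},
        {"who": "https://example.org/people/researcher-b", "role": "analysed-data"},
        {"who": "https://example.org/funders/funder-x", "role": "funded"},
    ],
}
```

The point is not this particular structure, but that every piece (the data, the process record, the people) has its own address and can be referred to from outside the paper as well as from within it.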

This view of the paper is interesting to me for two reasons. The first is that it natively supports a wide range of publication or communication types, including data papers, process papers, protocols, ideas and proposals. If we think of publication as the act of bringing a set of things together and providing them with a coherent identity, then that publication can be many things with many possible uses. In a sense this is doing what a traditional paper should do, bringing all the relevant information into a single set of pages that can be found together, as opposed to what papers usually do: tick a set of boxes about what a paper is supposed to look like. “Is this publishable?” is an almost meaningless question on the web. Of course it is. “Is it a paper?” is the question we are actually asking. By applying the principles of what the paper should be doing, as opposed to the straitjacket of a paginated, print-based document, we get much more flexibility.

The second aspect which I find exciting revolves around the idea of citation as both internal and external references describing the relationships between these individual objects. If the whole aggregation has an address on the web via a DOI or a URL, and if its relationships both to the objects that make it up and to other available things on the web are made clear in a machine-readable citation, then we have the beginnings of a machine-readable scientific web of knowledge. If we take this view of objects and aggregates that cite each other, and we provide details of what the citations mean (this was used in that, this process created that output, this paper is cited as an input to that one), then we are building the semantic web as a byproduct of what we want to do anyway. Instead of scaring people with angle brackets, we are using a paradigm that researchers understand and respect, citation, to build up meaningful links between packages of knowledge. We need authoring tools that help us build and aggregate these objects together, and tools that make forming these citations easy and natural by using the existing ideas around linking and referencing. But if we can build those, we get the semantic web for science as a free side product, while also making it easier for humans to find the details they’re looking for.
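
As a sketch of what “machine-readable citation” could mean in practice, the relationships might be expressed as simple typed statements between addressable things. Again, everything here is invented for illustration; a real implementation would presumably draw its relationship types from an existing vocabulary such as CiTO rather than making them up.

```python
# Citations as typed links: (subject, relationship, object) statements between
# things that have addresses on the web. Identifiers and relationship names are
# invented for the example.

citations = [
    ("doi:10.9999/example.12345", "uses-data", "https://example.org/data/kinetics-run-7"),
    ("https://example.org/data/kinetics-run-7", "generated-by", "https://example.org/notebook/entry-42"),
    ("doi:10.9999/example.12345", "cites-as-evidence", "doi:10.9999/earlier.67890"),
]

def data_used_by(paper_id, statements):
    """A machine can now answer questions like: what data does this paper rely on?"""
    return [obj for subj, rel, obj in statements if subj == paper_id and rel == "uses-data"]

print(data_used_by("doi:10.9999/example.12345", citations))
# ['https://example.org/data/kinetics-run-7']
```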

Finally, this view blows apart the monolithic role of the publisher and creates an implicit marketplace where anybody can offer aggregations that they have created to potential customers. This might range from a high school student putting their science library project on the web through to a large-scale commercial publisher that provides a strong brand identity, quality filtering, and added value through their infrastructure or services. And everything in between. It would mean that large-scale publishers would have to compete directly with small-scale ones on a value-for-money basis, and that new types of communication could be rapidly prototyped and deployed.

There is a whole series of technical questions wrapped up in this view, in particular: if we are aggregating things that are on the web, how did they get there in the first place, and what authoring tools will we need to pull them together? I’ll try to start on that in a follow-up post.


Why good intentions are not enough to get negative results published

There is a set of memes that seem to be popping up with increasing regularity over the last few weeks. The first is that more of the outputs of scientific research need to be published. Sometimes this means the publication of negative results; other times it might mean that a community doesn’t feel it has an outlet for its particular research field. The traditional response is “we need a journal for this”. Over the years there have been many attempts to create a “Journal of Negative Results”. There is a Journal of Negative Results – Ecology and Evolutionary Biology (two papers in 2008), a Journal of Negative Results in Biomedicine (four papers in 2009; actually looks pretty active), a Journal of Interesting Negative Results in Natural Language (one paper), and a Journal of Negative Results in Speech and Audio Sciences, which appears to be defunct.

The idea is that there is a huge backlog of papers detailing negative results that people are gagging to get out if only there were somewhere to publish them. Unfortunately there are several problems with this. The first is that actually writing a paper is hard work. Most academics I know do not have the problem of not having anything to publish; they have the problem of getting around to writing the papers, sorting out the details, and making sure that everything is in good shape. This leads to the second problem: getting a negative result to a standard worthy of publication is much harder than for a positive result. You only need to make that compound, get that crystal, clone that gene, or get the microarray to work once, and you’ve got the data to analyse for publication. To show that something doesn’t work you need to repeat it several times, make sure your statistics are in order, and establish your working conditions. Partly this is a problem with the standards we apply to recording our research; designing experiments so that negative results are well established is not high on many scientists’ priorities. But partly it is the nature of the beast: negative results need to be much more tightly bounded to be useful.

Finally, even if you can get the papers, who is going to read them? And more importantly, who is going to cite them? Because if no-one cites them, the standing of your journal is not going to be very high. Will people pay to have papers published there? Will you be able to get editors? Will people referee for you? Will people pay for subscriptions? Clearly this journal will be difficult to fund and keep running. And this is where the second meme comes in, one which still gets surprising traction: that “publishing on the web is free”. Now we know this isn’t the case, but there is a slightly more sophisticated approach, which is “we will be able to manage with volunteers”. After all, with a couple of dedicated editors donating their time, peer review being done for free, and authors taking on the formatting role, surely the costs can be kept manageable? Some journals do survive on this business model, but it requires real dedication and drive, usually on the part of one person. The unfortunate truth is that putting in a lot of your spare time to support a journal which is not regarded as high impact (however that is measured) is not very attractive.

For this reason, in my view, these types of journals need much more work put into the business model than a conventional specialist journal does. To have any credibility in the long term you need a business model that works for the long term. I am afraid that “I think this is really important” is not a business model, no matter how good your intentions. A lot of the standing of a journal is tied up with the author’s view of whether it will still be there in ten years’ time. If that isn’t convincing, they won’t submit; if they don’t submit, you have no impact; and in the long term it’s a downward spiral until you have no journal.

The fundamental problem is that the “we need a journal” approach is stuck in the printed-page paradigm. To get negative results published we need to lower the barriers to publication far below where they currently are, while at the same time applying either a pre- or post-publication filter. Rich Apodaca, writing on Zusammen last week, talked about micropublication in chemistry: the idea of reducing the smallest publishable unit by providing routes to submit smaller packages of knowledge or data to some sort of archive. This is technically possible today; services like ChemSpider, NMRShiftDB, and others make it possible to submit small pieces of information to a central archive. More generally, the web makes it possible to publish whatever we want, in whatever form we want, and hopefully semantic web tools will enable us to do this in an increasingly useful form in the near future.
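
To give a feel for how small the publishable unit could become, here is a hypothetical sketch of a negative result deposited as a single record. The archive endpoint, schema, and identifiers are all made up for illustration; the real services mentioned above each have their own deposition interfaces.

```python
# A hypothetical "micropublication": a small, self-contained record pushed to an
# archive over HTTP. The endpoint and field names below are invented for illustration.

import json
import urllib.request

record = {
    "title": "Attempted coupling of compound 7: no product observed",
    "creator": "https://example.org/people/researcher-a",
    "data": {
        "observation": "no product detected by NMR after 24 h",
        "temperature_C": 25,
        "solvent": "THF",
    },
    "relates_to": "doi:10.9999/example.12345",   # typed link back to the work it bears on
    "relationship": "is-negative-result-for",
}

request = urllib.request.Request(
    "https://archive.example.org/deposit",        # hypothetical deposition endpoint
    data=json.dumps(record).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)      # would return an identifier for the new record
```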

Fundamentally, my personal belief is that the vast majority of “negative results” and other journals that are trying to expand the set of publishable work will not succeed, precisely because they are pushing the limits of the “publish through a journal” approach by setting up a journal. To succeed these efforts need to embrace the nature of the web, to act as a web-native resource, and not as a printed journal that happens to be viewed in a browser. This does two things: it reduces the barrier to authors submitting work, making the project more likely to be successful, and it can also reduce costs. It doesn’t in itself provide a business model, nor does it provide quality assurance, but it can provide a much richer set of options for developing both of these in ways that are appropriate to the web. Routes towards quality assurance are well established, but suffer from the ongoing problem of getting researchers involved in the process, a subject for another post. Micropublication might work through micropayments; the whole lab book might be hosted for a fee with a number of “publications” bundled in; research funders may pay for services directly; or, more interestingly, the archive may be able to sell services built over the top of the data, truly adding value to the data.

But the key is low barriers for authors and a robust business model that can operate even if the service is perceived as being low impact. Without these you are creating a lot of work for yourself, and probably a lot of grief. Nothing comes free, and if there isn’t income, that cost will be your time.