Driving UK Research – Is copyright a help or a hindrance?

The following is my contribution to a collection prepared by the British Library and released today at the Wellcome Trust, called “Driving UK Research. Is copyright a help or a hindrance?”  - Press Release – Document[pdf] – which is being released under a CC-BY-NC license. The British Library kindly allowed authors to retain copyright on their contributions so I am here releasing the text into the public domain via a CCZero waiver. I would also like to acknowledge the contribution of Chris Morrison in editing and improving the piece.

If I want to be confident that this text will be used to its full extent, I am going to have to republish it separately from this collection. Not because the collection uses restrictive rights management or licences; it actually uses a relatively liberal copyright licence. No, the problem is copyright itself and the way it interacts with how we create knowledge in the 21st century.

Until recently we would use texts or data by reading, taking notes, making photocopies, and then writing down new insights. We would refer to the originals by citing them. A person making limited copies or taking notes (perhaps quoting the text) does not breach copyright because of the notion of “fair dealing”. Making copies of reasonable portions of a work is explicitly not a violation of copyright. If it were we wouldn’t be able to do any useful work at all.

Today, scholarship and research cannot effectively proceed via manual human processes. There is simply too much for us to handle. On the other hand we have excellent computer systems that can, to some extent at least, take these notes for us: automated assistants that can read the text for us, doing the text mining, data aggregation and indexing that allow us to cope with the volume of information. As these tools improve we have an opportunity to radically increase the speed of the innovation cycle, using the human brain for what it is best at: insight and creative thinking; and using machines for what they are best at: indexing, checking, collecting.

The problem is that to do this those machines need to take a copy of the whole of the text and in doing so they trigger copyright. Even though the collection you are reading is released under a Creative Commons licence that allows non-commercial use, no-one can take a copy, find an interesting sentence, and then index it if they are going to make money. Google are not allowed to check what is here and index it for us.

Or perhaps they are. Perhaps this does come under “fair use” in the US. Or maybe it does there, but not in the UK. What about Australia? Or Brazil? All with slightly different copyright law and a slightly different relationship between copyright and contract law. Even if current legal opinion says it is allowed, a future court case could change that. The only way I can be sure that my text is available into the future is to give up the copyright altogether.

To build effectively on the scientific and cultural data being generated today we need computers. If a human were doing the job it would clearly be covered by fair dealing. What we need is a clear and explicit statement that machine based analysis for the purpose of indexing, mining, or collecting references is a fair dealing exception, even where a full copy is taken. There clearly need to be boundaries. The entire work should not be kept or distributed. As with existing fair dealing we could have guidelines on amounts kept or quoted: perhaps no more than 5% of a work. These could easily be developed and be compatible with existing fair dealing guidance.

We risk stifling the development of new tools, both commercial and academic, and new knowledge under the weight of a legal regime that was designed to cope with the printing press. At the same time a simple statement that this kind of analysis is fair dealing will provide certainty without damaging the interests of copyright holders or complicating copyright law. These new uses will ultimately bring more traffic, and perhaps more customers, to the primary documents. By taking the simple and easy step of making automated analysis an allowable fair dealing exception everyone wins.

Implementing the “Publication as Aggregation”

I wrote a few weeks back about the idea of re-imagining the formally published scientific paper as an aggregation of objects. The idea is that this provides continuity, by enabling the display of such papers in more or less the same way as we do currently; enhanced functionality, for instance by embedding active versions of figures in a native form; and, at the same time, a route towards a linked data web for research.

Fundamentally the idea is that we publish fragments, and then aggregate these fragments together. The mechanism of aggregation is an expanded version of the familiar paradigm of citation: the introduction of a paper would link to and cite other papers as usual, and these would be incorporated as links and citations within the presentation of the paper. But in addition the paper itself would cite its introduction. By default the introduction would be included in the view of the paper presented to the user, but equally the user might choose to only look at figures, only conclusions, or only the citations. Figures would be citations to data, again available on the web, again with a default visualization that might be an embedded active graph or a static image.

I asserted that the tools for achieving this are more or less in place. Actually that is only half true. The tools for storing, displaying, and even to some extent archiving communications in this form do exist, at least in the form of examples.

An emerging standard for aggregated objects on the web is the Open Archives Initiative – Object Reuse and Exchange (OAI-ORE). An OAI-ORE object is a generic description of the addresses of a series of things on the web and how they relate to each other. It is the natural approach for representing the idea of a paper as an aggregation of pieces. The OAI-ORE object itself is RDF, with no concept of how it should be displayed or used. It is just a set of things, each labeled with its role in the overall object. In principle at least this makes it straightforward to display it in any number of ways. A simple example would be converting OAI-ORE to the NLM-DTD XML format. The devil, as always, is in the detail, but this makes a good first pass technical requirement for how the pieces of the OAI-ORE object are described: it must be straightforward to convert to NLM-DTD.
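
As an illustration only (the URIs below are hypothetical placeholders and this is a sketch, not a complete or validated resource map), the paper-as-aggregation idea can be written down in a few lines of RDF using, for example, the rdflib library in Python:

    # Sketch: a paper described as an OAI-ORE aggregation of fragments.
    # All URIs are hypothetical placeholders.
    from rdflib import Graph, Namespace, URIRef, Literal
    from rdflib.namespace import RDF, DCTERMS

    ORE = Namespace("http://www.openarchives.org/ore/terms/")

    g = Graph()
    g.bind("ore", ORE)
    g.bind("dcterms", DCTERMS)

    rem = URIRef("http://example.org/papers/42/rem")          # the resource map
    agg = URIRef("http://example.org/papers/42/aggregation")  # the "paper"

    g.add((rem, RDF.type, ORE.ResourceMap))
    g.add((rem, ORE.describes, agg))
    g.add((agg, RDF.type, ORE.Aggregation))
    g.add((agg, DCTERMS.title, Literal("An example paper as an aggregation")))

    # Each fragment is an aggregated resource, labelled with its role so a
    # viewer can decide how (or whether) to display it.
    fragments = [
        ("http://example.org/papers/42/introduction", "introduction"),
        ("http://example.org/papers/42/figure1", "figure"),
        ("http://example.org/datasets/abc123", "dataset"),
    ]
    for uri, role in fragments:
        part = URIRef(uri)
        g.add((agg, ORE.aggregates, part))
        g.add((part, DCTERMS.description, Literal(role)))

    print(g.serialize(format="turtle"))

The point is simply that the aggregation carries no assumptions about display; a viewer, or a converter to NLM-DTD, is free to pick the pieces it wants.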

Once we have the collection of pieces, their relationship to each other, and the idea that we can choose to display some or all of these pieces in any way we choose, then a lot of the rest falls into place. Figures can be data objects, which have a default visualization method. These visualizations can be embedded in the way which is now familiar for audio and video files. But equally, references to gene names, structures, and chemical entities could be treated the same way. Want the chemical name? Just click a button and the visualization tool will deliver it. Want the structure? Again, just click the button, toggle the menu, or write the script to ask for it in that form if you are doing that kind of thing. We would need more open standards for embedding objects, and probably less Flash, but that’s a fairly minor issue.

There needs to be some communication between the citing object (the paper) and the cited object (the data, figure, text, external reference). This could be built up from the TrackBack or Pingback protocols. There also needs to be default content negotiation: “I want this data, what can you give me? Graph? Table?…ok I’ll take the graph…” That’s just a RESTful API, something which is more or less standard for consumer web data services but which is badly missing on the research web. None of this is actually terribly difficult and there are good tools out there to do it.
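
To make that concrete, here is a minimal sketch of a client-side request (the data URI and the media types on offer are assumptions, not an existing service):

    # Sketch: ask a (hypothetical) data service what form it can return a
    # cited object in, preferring a machine-readable table over an image.
    import requests

    data_uri = "http://example.org/datasets/abc123"  # hypothetical cited dataset

    response = requests.get(
        data_uri,
        headers={"Accept": "text/csv, image/png;q=0.5"},
    )
    response.raise_for_status()

    if response.headers.get("Content-Type", "").startswith("text/csv"):
        table = response.text           # work with the raw data
    else:
        graph_image = response.content  # fall back to the default visualisation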

But I said that we had only half solved the problem. The other side is good authoring and publishing tools. All of the above assumes that these OAI-ORE objects exist or can be easily built, and that the pieces we want to aggregate are already on the web, ready to be pointed at and embedded. They are not. We have two fundamental problems. First we have to get these things onto the web in a useful and re-useable form. Some of this can be done with existing data services such as ChemSpider, Genbank, PubChem, GEO, etc. but that is the easy end of the problem. The hard bit is the heterogeneous mass of pieces of data, Excel spreadsheets, CSV files, XML and binaries, that make up the majority of the research outputs we generate.

Publication could be made easy, using automatic upload tools and lightweight data services that provide a place for these objects on the web. The criticism is often made that “just publishing” is not enough because there is no context. What is often missed is that the best way to provide context is for the person who generated the research object to link it into a larger record. The catch is that, for this to be useful, they have to publish the object to the web first, otherwise the record they create points at a local and inaccessible object. So we need tools that simply push the raw material up onto the web, probably in the short to medium term to secure servers, but ones where the individual objects can be made public at some point.
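
As a sketch of how lightweight that push could be (the deposit service, its endpoint, the credential and the shape of its response are all hypothetical), something along these lines would be enough to get a raw file onto the web with a URI that a notebook entry can then point at:

    # Sketch: push a raw data file to a (hypothetical) deposit service,
    # initially private, and get back a stable URI for later citation.
    import requests

    DEPOSIT_URL = "https://data.example.org/api/deposit"  # hypothetical service
    API_KEY = "my-secret-key"                             # hypothetical credential

    with open("results_2009-09-14.csv", "rb") as handle:
        response = requests.post(
            DEPOSIT_URL,
            files={"file": handle},
            data={"visibility": "private"},  # can be flipped to public later
            headers={"Authorization": "Bearer " + API_KEY},
        )
    response.raise_for_status()
    print("Deposited at:", response.json()["uri"])  # assumed response shape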

So the other tools we need are for authoring these documents. These will look and behave like a Word Processor (or like a LaTeX document for those who prefer that route) but with a clever reference manager and citation creator. Today our reference libraries only contain papers. But imagine that your library contained all of the data you’ve generated as well and that the easiest way to write up your lab notebook was to simply right click and include the reference, select the visualization that you want and you’re done. All the details of default visualizations, of where the data really is, of adding the record to the OAI-ORE root node, all of this is done for you behind the scenes. You might need to select lumps of text to say what their role is, probably using some analogue of styles, similar to the way that the Integrated Content Environment (ICE) system does.

This environment could be built for Word, for Open Office, for LaTeX. One of the reasons I remain excited about Google Wave is that it should be relatively easy to prototype such an environment: the hooks are already there in a way that they aren’t in traditional document authoring tools, because Wave is much more web native. It will however take quite a lot of work to build all of these pieces, and it will take some time before the benefits become clear. There is also a chicken and egg problem in that such an environment isn’t a whole lot of use without the published objects to aggregate together, and the publication services to provide rich views of the final aggregated documents. But I think it is a direction that is worth pursuing because it takes the best of what we already know works on the web and applies it to an evolutionary adaptation of the communication style that is already familiar. The revolution comes once the pieces are there for people to work with in new ways.

Nature Communications Q&A

A few weeks ago I wrote a post looking at the announcement of Nature Communications, a new journal from Nature Publishing Group that will be online only and have an open access option. Grace Baynes, from the NPG communications team, kindly offered to get some of the questions raised in that piece answered and I am presenting my questions and the answers from NPG here in their complete form. I will leave any thoughts and comments on the answers for another post. There has also been more information from NPG available at the journal website since my original post, some of which is also dealt with below. Below this point, aside from formatting, I have left the response in its original form.

Q: What is the motivation behind Nature Communications? Where did the impetus to develop this new journal come from?

NPG has always looked to ensure it is serving the scientific community and providing services which address researchers’ changing needs. The motivation behind Nature Communications is to provide authors with more choice; both in terms of where they publish, and what access model they want for their papers. At present NPG does not provide a rapid publishing opportunity for authors with high-quality specialist work within the Nature branded titles. The launch of Nature Communications aims to address that editorial need. Further, Nature Communications provides authors with a publication choice for high quality work, which may not have the reach or breadth of work published in Nature and the Nature research journals, or which may not have a home within the existing suite of Nature branded journals. At the same time authors and readers have begun to embrace online only titles – hence we decided to launch Nature Communications as a digital-first journal in order to provide a rapid publication forum which embraces the use of keyword searching and personalisation. Developments in publishing technology, including keyword archiving and personalization options for readers, make a broad scope, online-only journal like Nature Communications truly useful for researchers.

Over the past few years there has also been increasing support by funders for open access, including commitments to cover the costs of open access publication. Therefore, we decided to provide an open access option within Nature Communications for authors who wish to make their articles open access.

Q: What opportunities does NPG see from Open Access? What are the most important threats?

Opportunities: Funder policies shifting towards supporting gold open access, and making funds available to cover the costs of open access APCs. These developments are creating a market for journals that offer an open access option.

Threats: That the level of APCs that funders will be prepared to pay will be too low to be sustainable for journals with high quality editorial and high rejection rates.

Q: Would you characterise the Open Access aspects of NC as a central part of the journal strategy…

Yes. We see the launch of Nature Communications as a strategic development. Nature Communications will provide a rapid publication venue for authors with high quality work which will be of interest to specialists in their fields. The title will also allow authors to adhere to funding agency requirements by making their papers freely available at point of publication if they wish to do so.

Q (continued): …or as an experiment that is made possible by choosing to develop a Nature branded online only journal?

NPG doesn’t view Nature Communications as experimental. We’ve been offering open access options on a number of NPG journals in recent years, and monitoring take-up on these journals. We’ve also been watching developments in the wider industry.

Q: What would you give as the definition of Open Access within NPG?

It’s not really NPG’s focus to define open access. We’re just trying to offer choice to authors and their funders.

Q: NPG has a number of “Open Access” offerings that provide articles free to the user as well as specific articles within Nature itself under a Creative Commons Non-commercial Share-alike licence with the option to authors to add a “no derivative works” clause. Can you explain the rationale behind this choice of licence?

Again, it’s about providing authors with choice within a framework of commercial viability. On all our journals with an open access option, authors can choose between the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence and the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. The only instance where authors are not given a choice at present is genome sequence articles published in Nature and other Nature branded titles, which are published under the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence. No APC is charged for these articles, as NPG considers making these freely available an important service to the research community.

Q: Does NPG recover significant income by charging for access or use of these articles for commercial purposes? What are the costs (if any) of enforcing the non-commercial terms of licences? Does NPG actively seek to enforce those terms?

We’re not trying to prevent derivative works or reuse for academic research purposes (as evidenced by our recent announcement that NPG author manuscripts would be included in UK PMC’s open access subset). What we are trying to keep a cap on is illegal e-prints and reprints where companies may be using our brands or our content to their benefit. Yes we do enforce these terms, and we have commercial licensing and reprints services available.

Q: What will the licence be for NC?

Authors who wish to take the open access option can choose either the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported Licence or the Creative Commons Attribution-Non-commercial-No Derivs 3.0 Unported Licence. Subscription access articles will be published under NPG’s standard License to Publish.

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

Q: The title “Nature Communications” implies rapid publication. The figure of 28 days from submission to publication has been mentioned as a minimum. Do you have a target maximum or indicative average time in mind?

We are aiming to publish manuscripts within 28 days of acceptance, contrary to an earlier report which was in error. In addition, Nature Communications will have a streamlined peer review system which limits presubmission enquiries, appeals and the number of rounds of review – all of which will speed up the decision making process on submitted manuscripts.

Q: In the press release an external editorial board is described. This is unusual for a Nature branded journal. Can you describe the makeup and selection of this editorial board in more detail?

In deciding whether to peer review manuscripts, editors may, on occasion, seek advice from a member of the Editorial Advisory Panel. However, the final decision rests entirely with the in-house editorial team. This is unusual for a Nature-branded journal, but in fact, Nature Communications is simply formalising a well-established system in place at other Nature journals. The Editorial Advisory Panel will be announced shortly and will consist of recognized experts from all areas of science. Their collective expertise will support the editorial team in ensuring that every field is represented in the journal.

Q: Peer review is central to the Nature brand, but rapid publication will require streamlining somewhere in the production pipeline. Can you describe the peer review process that will be used at NC?

The peer review process will be as rigorous as any Nature branded title – Nature Communications will only publish papers that represent a convincing piece of work. Instead, the journal will achieve efficiencies by discouraging presubmission enquiries, capping the number of rounds of review, and limiting appeals on decisions. This will enable the editors to make fast decisions at every step in the process.

Q: What changes to your normal process will you implement to speed up production?

The production process will involve a streamlined manuscript tracking system and maximise the use of metadata to ensure manuscripts move swiftly through the production process. All manuscripts will undergo rigorous editorial checks before acceptance in order to identify, and eliminate, hurdles for the production process. Alongside using both internal and external production staff we will work to ensure all manuscripts are published within 28 days of acceptance – however some manuscripts may well take longer due to unforeseen circumstances. We also hope the majority of papers will take less!

Q: What volume of papers do you aim to publish each year in NC?

As Nature Communications is an online only title the journal is not limited by page-budget. As long as we are seeing good quality manuscripts suitable for publication following peer review we will continue to expand. We aim at launch to publish 10 manuscripts per month, and would be happy remaining with 10-20 published manuscripts per month, but would equally be pleased to see the title expand as long as manuscripts were of suitable quality.

Q: The Scientist article says there would be an 11 page limit. Can you explain the reasoning behind a page limit on an online only journal?

Articles submitted to Nature Communications can be up to 10 pages in length. Any journal, online or not, will consider setting limits to the ‘printed paper’ size (in PDF format) primarily for the benefit of the reader. Setting a limit encourages authors to edit their text accurately and succinctly to maximise impact and readability.

Q: The press release description of papers for NC sounds very similar to papers found in the other “Nature Baby” journals, such as Nature Physics, Chemistry, Biotechnology, Methods etc. Can you describe what would be distinctive about a paper to make it appropriate for NC? Is there a concern that it will compete with other Nature titles?

Nature Communications will publish research of very high quality, but where the scientific reach and public interest is perhaps not that required for publication in Nature and the Nature research journals. We expect the articles published in Nature Communications to be of interest and importance to specialists in their fields. The scope of Nature Communications also includes areas like high-energy physics, astronomy, palaeontology and developmental biology that aren’t represented by a dedicated Nature research journal.

Q: To be a commercial net gain NC must publish papers that would otherwise not have appeared in other Nature journals. Clearly NPG receives many such papers that are not published, but is it not the case that these papers are, at least as NPG measures them, by definition not of the highest quality? How can you publish more while retaining the bar at its present level?

Nature journals have very high rejection rates, in many cases well over 90% of what is submitted. A proportion of these articles are very high quality research and of importance for a specialist audience, but lack the scientific reach and public interest associated with high impact journals like Nature and the Nature research journals. The best of these manuscripts could find a home in Nature Communications. In addition, we expect to attract new authors to Nature Communications, who perhaps have never submitted to the Nature family of journals, but are looking for a high quality journal with rapid publication, a wide readership and an open access option.

Q: What do you expect the headline subscription fee for NC to be? Can you give an approximate idea of what an average academic library might pay to subscribe over and above their current NPG subscription?

We haven’t set prices for subscription access for Nature Communications yet, because we want to base them on the number of manuscripts the journal may potentially publish and the proportion of open access content. This will ensure the site licence price is based on absolute numbers of manuscripts available through subscription access. We’ll announce these in 2010, well before readers or librarians will be asked to pay for content.

Q: Do personal subscriptions figure significantly in your financial plan for the journal?

No, there will be no personal subscriptions for Nature Communications. Nature Communications will publish no news or other ‘front half content’, and we expect many of the articles to be available to individuals via the open access option or an institutional site license. If researchers require access to a subscribed-access article that is not available through their institution or via the open-access option, they have the option of buying the article through traditional pay-per-view and docu­ment-delivery options. For a journal with such a broad scope, we expect individuals will want to pick and choose the articles they pay for.

Q: What do you expect author charges to be for articles licensed for free re-use?

$5,000 (The Americas)
€3,570 (Europe)
¥637,350 (Japan)
£3,035 (UK and Rest of World)

Manuscripts accepted before April 2010 will receive a 20% discount off the quoted APC.

Q: Does this figure cover the expected costs of article production?

This is a flat fee with no additional production charges (such as page or colour figure charges). The article processing charges have been set to cover our costs, including article production.

Q: The press release states that subscription costs will be adjusted to reflect the take up of the author-pays option. Can you commit to a mechanistic adjustment to subscription charges based on the percentage of author-pays articles?

We are working towards a clear pricing principle for Nature Communications, using input from NESLi and others. Because the amount of subscription content may vary substantially from year to year, an entirely mechanistic approach may not give libraries the ability they need to forecast with confidence.

Q: Does the strategic plan for the journal include targets for take-up of the author-pays option? If so can you disclose what those are?

We have modelled Nature Communications as an entirely subscription access journal, a totally open access journal, and continuing the hybrid model on an ongoing basis. The business model works at all these levels.

Q: If the author-pays option is a success at NC will NPG consider opening up such options on other journals?

We already have open access options on more than 10 journals, and we have recently announced the launch in 2010 of a completely open access journal, Cell Death & Disease. In addition, we publish the successful open access journal Molecular Systems Biology, in association with the European Molecular Biology Organization. We’re open to new and evolving business models where they are sustainable. The rejection rates on Nature and the Nature research journals are so high that we expect the APC for these journals would be substantially higher than that for Nature Communications.

Q: Do you expect NC to make a profit? If so over what timeframe?

As with all new launches we would expect Nature Communications to be financially viable during a reasonable timeframe following launch.

Q: In five years time what are the possible outcomes that would be seen at NPG as the journal being a success? What might a failure look like?

We would like to see Nature Communications publish high quality manuscripts covering all of the natural sciences and work to serve the research community. The rationale for launching this title is to ensure NPG continues to serve the community with new publishing opportunities. A successful outcome would be a journal with an excellent reputation for quality and service, a good impact factor, a substantial archive of published papers that span the entire editorial scope and significant market share.

Nature Communications: A breakthrough for open access?

A great deal of excitement but relatively little detailed information thus far has followed the announcement by Nature Publishing Group of a new online only journal with an author-pays open access option. NPG have managed and run a number of open access (although see caveats below) and hybrid journals as well as online only journals for a while now. What is different about Nature Communications is that it will be the first clearly Nature-branded journal that falls into either of these categories.

This is significant because it is bringing the Nature brand into the mix. Stephen Inchcoombe, executive director of NPG, in email correspondence quoted in The Scientist, notes the increasing uptake of open-access options and the willingness of funders to pay processing charges for publication as major reasons for NPG to provide a wider range of options.

In the NPG press release, David Hoole, head of content licensing for NPG, says:

“Developments in publishing and web technologies, coupled with increasing commitment by research funders to cover the costs of open access, mean the time is right for a journal that offers editorial excellence and real choice for authors.”

The reference to “editorial excellence” and the use of the Nature brand are crucial here and what makes this announcement significant. The question is whether NPG can deliver something novel and successful.

The journal will be called Nature Communications. “Communications” is a moniker usually reserved for “rapid publication” journals. At the same time the Nature brand is all about exclusivity, painstaking peer review, and editorial work. Can these two be reconciled successfully and, perhaps most importantly, how much will it cost? In the article in The Scientist a timeframe of 28 days from submission to publication is mentioned but as a minimum period. Four weeks is fast, but not super-fast for an online only journal.

But speed is not the only criterion. Reasonably fast and with a Nature brand may well be good enough for many, particularly those who have come out of the triage process at Nature itself. So what of that branding – where is the new journal pitched? The press release is a little equivocal on this:

Nature Communications will publish research papers in all areas of the biological, chemical and physical sciences, encouraging papers that provide a multidisciplinary approach. The research will be of the highest quality, without necessarily having the scientific reach of papers published in Nature and the Nature research journals, and as such will represent advances of significant interest to specialists within each field.

So more specific – less general interest, but still “the highest quality”. This is interesting because there is an argument that this could easily cannibalise the “Nature Baby” journals. Why wait for Nature Biotech or Nature Physics when you can get your paper out faster in Nature Communications? Or on the other hand might it be out-competed by the other Nature journals – if the selection criteria are more or less the same, highest quality but not of general interest, why would you go for a new journal over the old favourites? Particularly if you are the kind of person that feels uncomfortable with online only journals.

If the issue is the selectivity difference between the old and the new Nature journals, then the peer review process can perhaps offer us clues. Again there are some interesting but not entirely clear statements in the press release:

A team of independent editors, supported by an external editorial advisory panel, will make rapid and fair publication decisions based on peer review, with all the rigour expected of a Nature-branded journal.

This sounds a little like the PLoS ONE model – a large editorial board with the intention of spreading the load of peer review so as to speed it up. With the use of the term “peer review” it is to be presumed that this means external peer review by referees with no formal connection to NPG. Again, I would have thought that NPG are very unlikely to dilute their brand by utilising editorial peer review of any sort. Given that the slow point of the process is getting a response back from peer reviewers, whether they are reviewing for Nature or for PLoS ONE, it’s not clear to me how this can be sped up, or indeed even changed from the traditional process, without risking a perception of a quality drop. This is going to be a very tough balance to find.

So finally, does this mean that NPG are serious about Open Access? NPG have been running OA and online only journals (although see the caveat below about the licence) for a while now and appear to be serious about increasing this offering. They will have looked very seriously at the numbers before making a decision on this, and my reading is that those numbers are saying that they need to have a serious offering. This is a hybrid and it will be easy to make accusations that, along with other fairly unsuccessful hybrid offerings, it is being set up to fail.

I doubt this is the case personally, but nor do I necessarily believe that the OA option will get the strong support it will need to thrive. The critical question will be pricing. If this is pitched at the level of other hybrid options, too high to be worth what is being offered in terms of access, then it will appear to have been set up to fail. Yet NPG can justifiably charge a premium if they are providing real editorial value. Indeed they have to. NPG has in the past said that they would have to charge enormous processing charges to published authors to recover the costs of peer review. So they can’t offer something relatively cheap, yet claim the peer review is to the same standards. The price is absolutely critical to credibility. I would guess something around £2500 or $US4000. Higher than PLoS Biology/Medicine but lower than other hybrid offerings.

So then the question becomes value for money. Is the OA offering up to scratch? Again the press release is not as enlightening as one would wish:

Authors who choose the open-access option will be able to license their work under a Creative Commons license, including the option to allow derivative works.

So does that mean it will be a non-commercial licence? In which case it is not Open Access under the BBB declarations (most explicitly in the Budapest Declaration). This would be consistent with the existing author rights that NPG allows and their current “Open Access” journal licences, but in my opinion would be a mistake. If there is any chance of the accusation that this isn’t “real OA” sticking then NPG will make a rod for their own back. And I really can’t see it making the slightest difference to their cost recovery. Equally, why is allowing derivative works only an option? The BBB declarations are unequivocal about derivative works being at the core of Open Access. From a tactical perspective it would be much simpler and easier for them to go for straight CC-BY. It will get support (or at least neutralize opposition) from even the hardline OA community, and it doesn’t leave NPG open to any criticism of muddying the waters. The fact that such a journal is being released shows that NPG gets the growing importance of Open Access publication. This paragraph, in its current form, suggests that the organization as a whole hasn’t internalised the messages about why. There are people within NPG who get this through and through, but this paragraph suggests to me that that understanding has not got far enough within the organisation to make this journal a success. The lack of mention of a specific licence is a red rag, and an entirely unnecessary one.

So in summary the outlook is positive. The efforts of the OA movement are having an impact at the highest levels amongst traditional publishers. Whether you view this as a positive or a negative response it is a success in my view that NPG feels that a response is necessary. But the devil is in the details. Critical to both the journal’s success and the success of this initiative as a public relations exercise will be the pricing, the licence and acceptance of the journal by the OA movement. The press release is not as promising on these issues as might be hoped. But it is early days yet and no doubt there will be more information to come as the journal gets closer to going live.

There is a Nature Network Forum for discussions of Nature Communications which will be a good place to see new information as it comes out.

Show us the data now damnit! Excuses are running out.

A very interesting paper from Caroline Savage and Andrew Vickers was published in PLoS ONE last week detailing an empirical study of data sharing by PLoS journal authors. The results themselves, that one out of ten corresponding authors provided data, are not particularly surprising, mirroring as they do previous studies, both formal [pdf] and informal (also from Vickers, I assume this is a different data set), of data sharing.

Nor are the reasons why data was not shared particularly new. Two authors couldn’t be tracked down at all. Several did not reply and the remainder came up with the usual excuses; “too hard”, “need more information”, “university policy forbids it”. The numbers in the study are small and it is a shame it wasn’t possible to do a wider study that might have teased out discipline, gender, and age differences in attitude. Such a study really ought to be done but it isn’t clear to me how to do it effectively, properly, or indeed ethically. The reason why small numbers were chosen was both to focus on PLoS authors, who might be expected to have more open attitudes, and to make the request from the authors, that the data was to be used in a Masters educational project, plausible.

So while helpful, the paper itself doesn’t provide much that is new. What will be interesting will be to see how PLoS responds. These authors are clearly violating stated PLoS policy on data sharing (see e.g. PLoS ONE policy). The papers should arguably be publicly pulled from the journals. Most journals have similar policies on data sharing, and most have no corporate interest in actually enforcing them. I am unaware of any cases where a paper has been retracted due to the authors’ unwillingness to share (if there are examples I’d love to know about them!) [Ed: Hilary Spencer from NPG pointed us in the direction of some case studies in a presentation from Philip Campbell.]

Is it fair that a small group be used as a scapegoat? Is it really necessary to go for the nuclear option and pull the papers? As was said in a Friendfeed discussion thread on the paper: “IME [In my experience] researchers are reeeeeeeally good at calling bluffs. I think there’s no other way“. I can’t see any other way of raising the profile of this issue. Should PLoS take the risk of being seen as hardline on this? Risking the consequences of people not sending papers there because of the need to reveal data?

The PLoS offering has always been about quality: high profile journals delivering important papers and, at PLoS ONE, critical analysis of the quality of the methodology. The perceived value of that quality is compromised by authors who do not make data available. My personal view is that PLoS would win by taking a hard line and the moral high ground. Your paper might be important enough to get into Journal X, but is the data of sufficient quality to make it into PLoS ONE? Other journals would be forced to follow – at least those that take quality seriously.

There will always be cases where data cannot or should not be made available. But these should be carefully delineated exceptions and not the rule. If you can’t be bothered putting your data into a shape worthy of publication then the conclusions you have based on that data are worthless. You should not be allowed to publish. End of. We are running out of excuses. The time to make the data available is now. If it isn’t backed by the data then it shouldn’t be published.

Update: It is clear from this editorial blog post from the PLoS Medicine editors that PLoS do not in fact know which papers are involved. As was pointed out by Steve Koch in the friendfeed discussion, there is an irony that Savage and Vickers have not, in a sense, provided their own raw data, i.e. the emails and names of correspondents. However I would accept that to do so would be an unethical breach of presumed privacy, as the correspondents might reasonably have expected these were private emails, and to publish names would effectively be entrapment. Life is never straightforward and this is precisely the kind of grey area we need more explicit guidance on.

Savage CJ, Vickers AJ (2009) Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078

Full disclosure: I am an academic editor for PLoS ONE and have raised the issue of insisting on supporting data for all charts and graphs in PLoS ONE papers in the editors’ forum. There is also a recent paper with my name on it in which the words “data not shown” appear. If anyone wants that data I will make sure they get it, and as soon as Nature enable article commenting we’ll try to get something up there. The usual excuses apply, and don’t really cut the mustard.

A question of trust

I have long been sceptical of the costs and value delivered by our traditional methods of peer review. This is really on two fronts: firstly, that the costs, where they have been estimated, are extremely high, representing a multi-billion dollar subsidy by governments of the scholarly publishing industry; secondly, that the value delivered through peer review, the critical analysis of claims and informed opinion on the quality of the experiments, is largely lost. At best it is wrapped up in the final version of the paper. At worst it is simply completely lost to the final end user. A part of this, which the more I think about it the more I find bizarre, is that the whole process is carried on under a shroud of secrecy. This means that, as an end user, I do not know who the peer reviewers are, and do not necessarily know what process has been followed or even the basis of the editorial decision to publish. As a result I have no means of assessing the quality of peer review for any given journal, let alone any specific paper.

Those of us who see this as a problem have a responsibility to provide credible and workable alternatives to traditional peer review. So far, despite many ideas, we haven’t, to be honest, had very much success. Post-publication commenting, open peer review, and Digg-like voting mechanisms have been explored but have yet to have any large success in scholarly publishing. PLoS is leading the charge on presenting article level metrics for all of its papers, but these remain papers that have also been through a traditional peer review process. Very little has been seen as yet that is both radical with respect to the decision and means of publishing and successful in getting traction amongst scientists.

Out on the real web it has taken non-academics to demonstrate the truly radical when it comes to publication. Whatever you may think of the accuracy of Wikipedia in your specific area, and I know it has some weaknesses in several of mine, it is the first location that most people find, and the first location that most people look at, when searching for factual information on the web. Roderic Page put up some interesting statistics when he looked this week at the top Google hits for over 5,000 mammal names. Wikipedia took the top spot 48% of the time and was in the top 10 in virtually every case (97%). If you want to place factual information on the web, Wikipedia should be your first port of call. Anything else is largely a waste of your time and effort. This doesn’t, incidentally, mean that other sources are not worthwhile or have no place, but it does mean that people need to work with the assumption that the first landing point will be Wikipedia.

“But”, I hear you say, “how do we know whether we can trust a given Wikipedia article, or specific statements in it?”

The traditional answer has been to say you need to look in the logs, check the discussion page, and click back through the profiles of the people who made specific edits. However this is inaccessible to many people, simply because they do not know how to process the information. Very few universities have an “Effective Use of Wikipedia 101” course, mostly because very few people would be able to teach it.

So I was very interested in an article on Mashable about marking up and colouring Wikipedia text according to its “trustworthiness”. Andrew Su kindly pointed me in the direction of the group doing the work and their papers and presentations. The system they are using, which can be added to any MediaWiki installation, measures two things: how long a specific piece of text has stayed in situ, and who either edited it or left it in place. People who write long-lasting edits get higher status, and this in turn promotes the text that they have “approved” by editing around but not changing.

This to me is very exciting because it provides extra value and information for both users and editors without requiring anyone to do any more work than install a plugin. The editors and writers simply continue working as they have. The user can access an immediate view of the trustworthiness of the article with a high level of granularity, essentially at the level of single statements. And most importantly the editor gets a metric, a number that is consistently calculated across all editors, that they can put on a CV. Editors are peer reviewers, they are doing review, on a constantly evolving and dynamic article that can both change in response to the outside world and also be continuously improved. Not only does the Wikipedia process capture most of the valuable aspects of traditional peer review, it jettisons many of the problems. But without some sort of reward it was always going to be difficult to get professional scientists to be active editors. Trust metrics could provide that reward.
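
As a toy illustration only (this is a simplified sketch of the survival idea described above, not the algorithm the group actually uses), an editor “karma” score and a per-revision trust value could be accumulated along these lines:

    # Toy sketch: editors gain karma when words they added survive later
    # revisions; the trust of the current text reflects its authors' karma.
    # This ignores word position and ordering, which a real system would not.
    from collections import defaultdict

    karma = defaultdict(float)  # editor -> reputation
    authorship = {}             # word -> editor who introduced it (simplified)

    def apply_revision(old_text, new_text, editor):
        old_words = set(old_text.split())
        new_words = new_text.split()
        surviving = old_words & set(new_words)
        # Credit the authors of every word this editor left in place.
        for word in surviving:
            author = authorship.get(word)
            if author and author != editor:
                karma[author] += 1.0
        # Record authorship of newly introduced words.
        for word in new_words:
            if word not in surviving:
                authorship[word] = editor
        # Trust of the current text: average karma behind each word.
        trust = [karma.get(authorship.get(w), 0.0) for w in new_words]
        return sum(trust) / len(trust) if trust else 0.0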

Now there are many questions to ask about the calculation of this “karma” metric. Should it be subject biased, so we know that highly ranked editors have relevant expertise, or should it be general, so as to discourage highly ranked editors from modifying text that is outside of their expertise? What should the mathematics behind it be? It will clearly take time for such metrics to be respected as a scholarly contribution, but equally I can see the ground shifting very rapidly towards a situation where a lack of engagement, a lack of interest in contributing to the publicly accessible store of knowledge, is seen as a serious negative on a CV. However this particular initiative pans out, it is to me one of the first and most natural replacements for peer review that could be effective within dynamic documents, solving most of the central problems without requiring significant additional work.

I look forward to the day when I see CVs with a Wikipedia Karma Rank on them. If you happen to be applying for a job with me in the future, consider it a worthwhile thing to include.

Some (probably not original) thoughts about originality

A number of things have prompted me to be thinking about what makes a piece of writing “original” in a web based world where we might draft things in the open, get informal public peer review, where un-refereed conference posters can be published online, and where pre-print servers of submitted versions of papers are increasingly widely used. I’m in the process of correcting an invited paper that derives mostly from a set of blog posts, and had to revise another piece because it was too much like a blog post, but what got me thinking most was a discussion on the PLoS ONE Academic Editors forum about the originality requirements for PLoS ONE.

In particular the question arose of papers that have been previously peer reviewed and published, but in journals that are not indexed or that very few people have access to. Many of us have one or two papers in journals that are essentially inaccessible, local society journals or just journals that were never online, and never widely enough distributed for anyone to find. I have a paper in Complex Systems (volume 17, issue 4 since you ask) that is not indexed in Pubmed, only available in a preprint archive and has effectively no citations. Probably because it isn’t in an index and no-one has ever found it.  But it describes a nice piece of work that we went through hell to publish because we hoped someone might find it useful.

Now everyone agreed, and this is what the PLoS ONE submission policy says quite clearly, that such a paper cannot be submitted for publication. This is essentially a restatement of the Ingelfinger Rule. But being the contrary person I am I started wondering why. For a commercial publisher with a subscription business model it is clear that you don’t want to take on content that you can’t exert a copyright over, but for a non-profit with a mission to bring science to a wider audience does this really make sense? If the science is currently inaccessible and is of appropriate quality for a given journal, and the authors are willing to pay the costs to bring it to a wider public, why is this not allowed?

The reason usually given is that if something is “already published” then we don’t need another version. But if something is effectively inaccessible, is that really true? Are preprints, conference proceedings, even privately circulated copies, not “already published”? There is also still a strong sense that there needs to be a “version of record”, and that there is a potential for confusion with different versions. There is a need for better practice in the citation of different versions of work, but this is a problem we already have, and a version in an obscure place is unlikely to cause confusion. Another reason is that refereeing is a scarce resource that needs to be protected. This points to our failure to publish and re-use referees’ reports within the current system, to actually realise the value that we (claim to) ascribe to them. But again, if the author is willing to pay for this, why should they not be allowed to?

However, in my view, at the core of the rejection of “republication” is an objection to the idea that people might manage to get double credit for a single publication. In a world where the numbers matter, people do game the system to maximise the number of papers they have. Credit where credit’s due is a good principle and people feel, rightly, uneasy with people getting more credit for the same work published in the same form. I think there are three answers to this: one social, one technical, and one…well, let’s just call it heretical.

Firstly, placing two versions of a manuscript on the same CV is simply bad practice. Physicists don’t list both the ArXiv and journal versions of papers on their publication lists. In most disciplines, where conference papers are not peer reviewed, they are listed separately from formally published peer reviewed papers in CVs. We have strong social norms around “double counting”. These differ from discipline to discipline as to whether work presented at conferences can be published as a journal paper, whether pre-prints are widely accepted, and how control needs to be exerted over media releases, but while there may be differences over what constitutes “the same paper”, there are strong social norms that you only publish the same thing once. These social norms are at the root of the objection to re-publication.

Secondly, the technical identification of duplicate available versions, either deliberately by the authors to avoid potential confusion, or in an investigative role to identify potential misconduct, is now trivial. A quick search can rapidly identify duplicate versions of papers. I note parenthetically that it would be even easier with a fully open access corpus, but where there is either misconduct, or the potential for confusion, tools like Turnitin and Google will sort it out for you pretty quickly.

Finally though, for me the strongest answer to the concern over “double credit” is that this is a deep indication we have the whole issue backwards. Are we really more concerned about someone having an extra paper on their CV than we are about getting the science into the hands of as many people as possible? This seems to me a strong indication that we value the role of the paper as a virtual notch on the bedpost over its role in communicating results. We should never forget that STM publishing is a multibillion dollar industry supported primarily through public subsidy. There are cheaper ways to provide people with CV points if that is all we care about.

This is a place where the author (or funder) pays model really comes into its own. If an author feels strongly enough that a paper will get to a wider audience in a new journal, if they feel strongly enough that it will benefit from that journal’s peer review process, and they are prepared to pay a fee for that publication, why should they be prevented from doing so? If that publication does bring that science to a wider audience, is a public service publisher not discharging their mission through that publication?

Now I’m not going to recommend this as a change in policy to PLoS. It’s far too radical and would probably raise more problems in terms of public relations than it would solve in terms of science communication. But I do want to question the motivations that lie at the bottom of this traditional prohibition. As I have said before, and will probably say over and over (and over) again, we are spending public money here. We need to be clear about what it is we are buying, whether it is primarily for career measurement or communication, and whether we are getting the best possible value for money. If we don’t ask the question, then in my view we don’t deserve the funding.

The Future of the Paper…does it have one? (and the answer is yes!)

A session entitled “The Future of the Paper” at Science Online London 2009 featured a panel made up of an interesting set of people: Lee-Ann Coleman from the British Library, Katharine Barnes, the editor of Nature Protocols, Theo Bloom from PLoS and Enrico Balli of SISSA Medialab.

The panelists rehearsed many of the issues and problems that have been discussed before and I won’t re-hash them here. My feeling was that the panelists didn’t offer a radical enough view of the possibilities, but there was an interesting discussion around what a paper was for and where it was going. My own thinking on this has recently been revolving around the importance of a narrative as a human route into the data. It might be argued that if the whole scientific enterprise could be made machine readable then we wouldn’t need papers. Lee-Ann argued, and I agree, that the paper as the human readable version will retain an important place. Our scientific model building exploits our particular skill as story tellers, something computers remain extremely poor at.

But this is becoming an increasingly smaller part of the overall record itself. For a growing band of scientists the paper is only a means of citing a dataset or an idea. We need to widen the idea of what the literature is and what it is made up of. To do this we need to make all of these objects stable and citeable. As Phil Lord pointed out this isn’t enough because you also have to make those objects and their citations “count” for career credit. My personal view is that the market in talent will actually drive the adoption of wider metrics that are essentially variations of Page Rank because other metrics will become increasingly useless, and the market will become increasingly efficient as geographical location becomes gradually less important. But I’m almost certainly over optimistic about how effective this will be.
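
For what it is worth, here is a rough sketch of the kind of PageRank-style metric I mean (the citation graph is a made-up toy and this is the textbook iteration, not anything any service actually runs), computed over links between research objects rather than over journal titles:

    # Sketch: a simple PageRank over a toy graph of citations between
    # research objects (papers, datasets). The names are hypothetical.
    def pagerank(links, damping=0.85, iterations=50):
        """links maps each object to the list of objects it cites."""
        nodes = set(links)
        for targets in links.values():
            nodes.update(targets)
        rank = {node: 1.0 / len(nodes) for node in nodes}
        for _ in range(iterations):
            new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
            for source, targets in links.items():
                if targets:
                    share = damping * rank[source] / len(targets)
                    for target in targets:
                        new_rank[target] += share
            rank = new_rank
        return rank

    citations = {
        "paperA": ["dataset1", "paperB"],
        "paperB": ["dataset1"],
        "dataset1": [],
    }
    print(pagerank(citations))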

Where I thought the panel didn’t go far enough was in questioning the form of the paper as an object within a journal. Essentially each presentation became “and because there wasn’t a journal for this kind of thing we created/will create a new one”. To me the problem isn’t the paper. As I said above the idea of a narrative document is a useful and important one. The problem is that we keep thinking in terms of journals, as though a pair of covers around a set of paper documents has any relevance in the modern world.

The journal used to play an important role in publication. The publisher still has an important role but we need to step outside the notion of the journal and present different types of content and objects in the best way for that set of objects. The journal as brand may still have a role to play although I think that is increasingly going to be important only at the very top of the market. The idea of the journal is both constraining our thinking about how best to publish different types of research object and distorting the way we do and communicate science. Data publication should be optimized for access to and discoverability of data, software publication should make the software available and useable. Neither are particularly helped by putting “papers” in “journals”. They are helped by creating stable, appropriate publication mechanisms, with appropriate review mechanisms, making them citeable and making them valued. The point at which our response to needing to publish things stops being “well we’d better create a journal for that” then we might just have made it into the 21st century.

But the paper remains the way we tell stories about and around our science. And if we dumb humans are going to keep doing science, then it will continue to be an important part of the way we go about it.

Sci – Bar – Foo etc. Part III – Google Wave Session at SciFoo

Google Wave has got an awful lot of people quite excited. And others are more sceptical. A lot of SciFoo attendees were therefore very excited to be able to get an account on the developer sandbox as part of the weekend. At the opening plenary Stephanie Hannon gave a demo of Wave and, although there were numerous things that didn’t work live, that was enough to get more people interested. On the Saturday morning I organized a session to discuss what we might do and also to provide an opportunity for people to talk about technical issues. Two members of the wave team came along and kindly offered their expertise, receiving a somewhat intense grilling as thanks for their efforts.

I think it is now reasonably clear that there are two short-to-medium-term applications for Wave in the research process. The first is the collaborative authoring of documents and the conversations around them. The second is the use of Wave as a recording and analysis platform. Both types of functionality were discussed, with many ideas for each. Martin Fenner has also written up some initial impressions.

Naturally we recorded the session in Wave, and even as I type, over a week later, there is a conversation going on in real time about the details of taking things forward. There are many things to get used to, not least when it is polite to delete other people’s comments and clean them up, but the potential (and the weaknesses and areas for development) are becoming clear.

I’ve pasted our functionality brainstorm at the bottom to give people an idea of what we talked about, but the discussion was very wide-ranging. The functionality divided into a few categories. First, Robots for bringing scientific objects (chemical structures, DNA sequences, biomolecular structures, videos, and images) into the wave in a functional form, with links back to a canonical URI for the object. In its simplest form this might just provide a link back to a database, so typing “chem:benzene” or “pdb:1ecr” would trigger a robot to insert a link back to the database entry. More complex robots could insert an image of the chemical (or protein structure), or perhaps RDF or microformats that provide a more detailed description of the molecule.
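
As a rough sketch of the token-recognition step such a robot would need, the logic might look something like the following. This is plain Python rather than the Wave Robots API, and the prefix-to-URL templates are illustrative choices, not canonical ones:

```python
import re

# Hypothetical mapping from token prefixes to database URL templates; the
# prefixes and URLs here are illustrative, not a fixed standard.
DATABASES = {
    "chem": "http://www.chemspider.com/Search.aspx?q={id}",
    "pdb": "http://www.rcsb.org/pdb/explore.do?structureId={id}",
}

TOKEN = re.compile(r"\b(chem|pdb):(\w+)\b")

def linkify(text):
    """Append a link back to the canonical database entry after each token."""
    def replace(match):
        prefix, identifier = match.groups()
        url = DATABASES[prefix].format(id=identifier)
        return "%s <%s>" % (match.group(0), url)
    return TOKEN.sub(replace, text)

print(linkify("The structure pdb:1ecr was soaked with chem:benzene."))
```

A real robot would do the same scan on each blip as it is edited and insert the link (or image, or RDF) as wave content rather than plain text, but the recognition step is essentially this.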

Taking this one step further, we also explored the idea of pulling data or status information from laboratory instruments to create a “laboratory dashboard”, and perhaps controlling them. This discussion was helpful in getting a feel for what Wave can and can’t do, as well as how different functionalities are best implemented. A robot can be built to populate a wave with information or data from laboratory instruments, and such a robot could in principle also pass information from the wave back to the instrument. However, both of these will still require some form of client running on the instrument side that is capable of talking to the robot web service, so the actual problem of interfacing with the instrument will remain. We can hope that instrument manufacturers might think of writing out nice simple XML log files at some point, but in the meantime this is likely to involve hacking things together. If you can manage this, then a Gadget will provide a nice way of presenting a visual dashboard-type interface to keep you updated on what is happening.
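
To make the instrument-side piece concrete, here is a minimal sketch of the kind of client that would have to sit next to the instrument. The robot endpoint and the XML payload schema are both made up for the example; nothing here is a real Wave or instrument API:

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical address for the dashboard robot's web service; the real
# endpoint and payload schema would depend on how the robot is deployed.
ROBOT_ENDPOINT = "https://example.org/lab-dashboard/update"

def status_entry(instrument_id, status, reading):
    """Build the sort of simple XML log entry an instrument client might emit."""
    entry = ET.Element("instrument", id=instrument_id)
    ET.SubElement(entry, "timestamp").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(entry, "status").text = status
    ET.SubElement(entry, "reading").text = str(reading)
    return ET.tostring(entry)

def push_status(instrument_id, status, reading):
    """POST the entry to the robot so it can update the wave."""
    request = urllib.request.Request(
        ROBOT_ENDPOINT,
        data=status_entry(instrument_id, status, reading),
        headers={"Content-Type": "application/xml"},
    )
    return urllib.request.urlopen(request)

if __name__ == "__main__":
    print(status_entry("hplc-01", "running", 0.42).decode())
```

The hard part, as discussed in the session, is not this plumbing but getting the instrument to expose its status at all.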

Sharing data analysis is of significant interest to me, and the fact that there is already a robot (called Monty) that will interpret Python is a very interesting starting point for exploring this. There is some basic graphing functionality (Graphy, naturally). For me this is where some of the most exciting potential lies; not just sharing printouts or the results of data analysis procedures, but the details of the data and a live representation of the process that led to the results. Expect much more from me on this in the future as we start to take it forward.
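
As an illustration of what “sharing the process, not just the printout” might mean, the following is the sort of short, self-contained snippet one might paste into a wave for an interpreter robot to run. The data values are invented for the example; the point is simply that the steps stay visible next to the result:

```python
# Illustrative calibration data (made up) and a least-squares fit done with
# the textbook formulas, so every step of the analysis is visible in the wave.
absorbance = [0.11, 0.19, 0.32, 0.41, 0.52]
concentration = [1, 2, 3, 4, 5]  # arbitrary units

n = len(absorbance)
sx, sy = sum(concentration), sum(absorbance)
sxx = sum(x * x for x in concentration)
sxy = sum(x * y for x, y in zip(concentration, absorbance))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print("slope = %.3f, intercept = %.3f" % (slope, intercept))
```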

The final area of discussion, and the one we probably spent the most time on, was looking at Wave in the authoring and publishing process. Formatting of papers, sharing of live diagrams and charts, automated reference searching and formatting, submission processes (both to journals and to other repositories), and even the running of the peer review process were all discussed. This is the area where the most obvious and rapid gains can be made. In a very real sense Wave was designed to remove the classic problem of sending around manuscript versions with multiple figure and data files by email, so you would expect it to solve a number of the obvious problems. The interesting thing in my view will be to try it out in anger.

Which was where we finished the session. I proposed the idea of writing a paper, in Wave, about the development and application of the tools needed to author papers in Wave. As well as the technical side, such a paper would discuss the user experience and any social issues that arise out of such a live collaborative authoring experience. If it were possible to run an actual peer review process in Wave that would also be very cool, although this might not be feasible given existing journal systems; if not, we will run a “mock” peer review process and look at how that works. If you are interested in being involved, drop a note in the comments, or join the Google Group that has been set up for discussions (or, if you have a developer sandbox account and want access to the Wave, drop me a line).

There will be lots of details to work through, but the overall feel of the session for me was very exciting and very positive. There will clearly be technical and logistical barriers to be overcome, not least that a significant quantity of legacy tooling may not be a good fit for Wave, and some architectural thinking on how to most effectively re-use existing code may be required. But overall the problem seems to be where to start on the large set of interesting possibilities. And that seems a good place to be with any new technology.


Google Wave in Research – the slightly more sober view – Part I – Papers

I, and many others, have spent the last week thinking about Wave, and I have to say that I am getting more, rather than less, excited about the possibilities it represents. All of the below will have to remain speculation for the moment, but I wanted to walk through two use cases and identify how the concept of a collaborative automated document will have an impact. In this post I will start with the drafting and publication of a paper, because it is an easier step to think about. In the next post I will move on to the use of Wave as a laboratory recording tool.

Drafting and publishing a paper via Wave

I start drafting the text of a new paper. As I do this I add the Creative Commons robot as a participant. The robot will ask what license I wish to use and then provide a stamp, linked back to the license terms. When a new participant adds text or material to the document, they will be asked whether they are happy with the license, and their agreement will be registered within a private blip within the Wave controlled by the Robot (probably called CC-bly, pronounced see-see-bly). The robot may also register the document with a central repository of open content. A second robot could notify the authors’ respective institutional repositories, creating a negative-click repository in, well, one click. More seriously, this would allow the IR to track, and if appropriate modify, the document, as well as harvest its content and metadata automatically.
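
A minimal sketch of the bookkeeping such a licensing robot might do, modelled in plain Python rather than the Wave APIs (the class, method names, and default licence URL are all hypothetical):

```python
# A sketch of the licence bookkeeping only; a real robot would post questions
# into the wave and store agreements in a private blip rather than a dict.
class LicenceRobot:
    def __init__(self, licence_url="http://creativecommons.org/licenses/by/3.0/"):
        self.licence_url = licence_url   # licence chosen by the first author
        self.agreements = {}             # participant -> agreed? (the "private blip")

    def on_participant_added(self, participant):
        """Ask the new participant to accept the document licence (agreement pending)."""
        self.agreements.setdefault(participant, None)

    def record_agreement(self, participant, agreed):
        self.agreements[participant] = agreed

    def all_agreed(self):
        return bool(self.agreements) and all(self.agreements.values())

robot = LicenceRobot()
robot.on_participant_added("author@example.org")
robot.record_agreement("author@example.org", True)
print(robot.all_agreed())
```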

I invite a series of authors to contribute to the paper and we start to write. Naturally the inline commenting and collaborative authoring tools get a good workout, and it is possible to watch the evolution of specific sections with the playback tool. The authors are geographically distributed, but we can organize scheduled hacking sessions with inline chat to work on sections of the paper. As we start to add references, the Reference Formatter gets added (not sure whether this is a Robot or a Gadget, but it is almost certainly called “Reffy”). The formatter automatically recognizes text of the form (Smythe and Hoofback 1876), searches the CiteULike libraries of the authors for the appropriate reference, adds an inline citation, and places a formatted reference in a separate Wavelet to keep it protected from random edits. Chunks of text can be collected from reports or theses in other Waves, and the tracking system notes where they have come from, maintaining the history of the whole document and its sources and checking licenses for compatibility. Terminology checkers, based on the existing Spelly extension (although at the moment this works on the internal rather than the external API – Google say they are working to fix that), could check for incorrect or ambiguous use of terms, or identify gene names, structures, etc., and automatically format them and link them to the reference database.
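
The recognition step for the reference formatter is easy enough to sketch. Something like the following, a deliberately narrow pattern that only handles the simple author-year form used in the example, would pull out the citations to look up in the authors’ libraries:

```python
import re

# Matches the simple "(Smythe and Hoofback 1876)" form only; a real formatter
# would need to cover multi-author lists, page numbers, and other styles.
CITATION = re.compile(
    r"\(([A-Z][A-Za-z-]+(?:\s+(?:and|&)\s+[A-Z][A-Za-z-]+)?)\s+(\d{4})\)"
)

def find_citations(text):
    """Return (authors, year) pairs to look up in the authors' reference libraries."""
    return [(m.group(1), m.group(2)) for m in CITATION.finditer(text)]

print(find_citations("As shown previously (Smythe and Hoofback 1876), the effect is small."))
```

The lookup against the reference libraries, and the insertion of the formatted reference into its own Wavelet, would sit on top of this.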

It is time to add some data and charts to the paper. The actual source data are held in an online spreadsheet. A chart/graphing widget is added to the document and formats the data into a default graph, which the user can then modify as they wish. The link back to the live data is of course maintained. Ideally this will trigger the CC-bly robot to query the user as to whether they wish to dedicate the data to the Public Domain (therefore satisfying both the Science Commons Data protocol and the Open Knowledge Definition – see how smoothly I got that in?). When the user says yes (being a right-thinking person), the data is marked with the chosen waiver/dedication, CKAN is notified, and a record is created of the new dataset.
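
The widget behaviour described here is speculative, but the underlying mechanics are mundane: fetch the current contents of the spreadsheet from its published URL and draw a default chart from them. A sketch, with a hypothetical CSV export address standing in for the real spreadsheet link, and matplotlib standing in for whatever the widget would actually use to render:

```python
import csv
import io
import urllib.request
import matplotlib.pyplot as plt

# Hypothetical "published as CSV" address for the online spreadsheet; the link
# back to the live data is just this URL, re-fetched whenever the chart redraws.
DATA_URL = "https://example.org/spreadsheet/export?format=csv"

def fetch_rows(url):
    """Pull the current contents of the spreadsheet as a list of (x, y) floats."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(float(x), float(y)) for x, y in reader]

def default_chart(rows):
    """Render the unmodified default view the widget would start from."""
    xs, ys = zip(*rows)
    plt.plot(xs, ys, marker="o")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()

if __name__ == "__main__":
    # With a real published spreadsheet this would be:
    #     default_chart(fetch_rows(DATA_URL))
    default_chart([(1, 0.11), (2, 0.19), (3, 0.32), (4, 0.41)])
```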

The paper is cleaned up – informal comments can be easily obtained by adding colleagues to the Wave. Submission is as simple as adding a new participant, the journal robot (PLoSsy, obviously), to the Wave. The journal is running its own Wave server, so referees can be given anonymous accounts on that system if they choose. Review can happen directly within the document, with a conversation between authors, reviewers, and editors. You don’t need to wait for some system to aggregate a set of comments and send them in one hit, and you can deal with issues directly in conversation with the people who raise them. In addition, the contribution of editors and referees to the final document is explicitly tracked. Because the journal runs its own server, not only can the referees and editors have private conversations that the authors don’t see, those conversations need never leave the journal server and are as secure as they can reasonably be expected to be.

Once accepted, the paper is published simply by adding a new participant. What would traditionally happen at this point is that a completely new typeset version would be created, breaking the link with everything that has gone before. This could be done by creating a new Wave with just the finalized version visible and all comments stripped out. What would be far more exciting would be for a formatted version to be created which retained the entire history. A major objection to publishing referees’ comments is that they refer to the unpublished version. Here the reader can see the comments in context and come to their own conclusions. Before publishing, any inline data will need to be harvested and placed in a reliable repository, along with any other additional information. Supplementary information can simply be hidden under “folds” within the document rather than buried in separate documents.

The published document is then a living thing. The canonical “as published” version is clearly marked, but the functionality for comments, updates, or complete revisions is built in. The modular XML nature of the Wave means that there is a natural means of citing a specific portion of the document. In the future, citations to a specific point in a paper could be marked, again via a widget or robot, to provide a back link to the citing source. Harvesters can traverse this graph of links in both directions, easily wiring up the published data graph.

Based on the currently published information, none of the above is even particularly difficult to implement. Much of it will require some careful study of how the workflows operate in practice, and there will likely be issues of collisions and complications, but most of the above is simply based on the functionality demonstrated at the Wave launch. The real challenge will lie in integration with existing publishing and document management systems, and in the subtle social implications of changing the way that authors, referees, editors, and readers interact with the document. Should readers be allowed to comment directly in the Wave, or should that be in a separate Wavelet? Will referees want to be anonymous, and will authors be happy to see the history made public?

Much will depend on how reliable and how responsive the technology really is, as well as how easy it is to build the functionality described above. But the bottom line is that this is the result of about four days’ occasional idle thinking about what can be done. When we really start building and realizing what we can do, that is when the revolution will start.

Part II is here.