“Friendfeeds for Science” pt II – Design ideas for a research-focussed aggregator

This post, while only 48 hours old, is already somewhat outdated by these two Friendfeed discussions. It was written independently of those discussions, so it seemed worth putting it out in its original form rather than spending too much time rewriting.

I wrote recently about ScienceFeed, a Friendfeed-like system aimed at scientists, and was fairly critical. I also promised to write about what I thought a “Friendfeed for Researchers” should look like. To do that we need to think about what Friendfeed, and other services including Twitter, Facebook, and Posterous, are used for, and what else they could do.

Friendfeed is an aggregator that enables, as I have written before, an “object-centric” style of interaction: conversations form around the objects themselves. As Alan Cann has pointed out, this is not the only thing it does; it also enables the person-centric interactions that I see as more typical of Facebook and Twitter. Enabling both is important, as is the realization that all of these systems need to interoperate effectively with each other, something which is still evolving. But core to the development of something that works for researchers is that standard research objects, and particularly papers, need to be first-class objects: author lists, one click to full text, one click to bookmark to my library.

Functionality addition 1: Treat research objects as first-class citizens with special attention; start with journal papers and support for Citeulike/Zotero/Mendeley etc.

On top of this, Friendfeed is a community, or rather several interlinked communities, each with its own traditions, standards, and expectations that are supported to a greater or lesser extent by the functionality of rooms, search, hiding, and administration found within Friendfeed. Any new service needs to understand and support these expectations.

Friendfeed also doesn’t do some things. It is not terribly effective as a bookmarking tool, nor very good as a tool for identifying and mining objects or information that is more than a few days old, although, paradoxically, it has served quite well as a means of archiving tweets and exposing them to search engines. The idea of a tool that surfaces objects to Google is an interesting one, and one we could take advantage of. Granularity of sharing is also limited: what if I want slidesets to be public but tweets to be a private feed? Or to collect different feeds under different headings for different communities: public, domain-specific, and only for the interested specialist?

Finally, Friendfeed doesn’t have a very sophisticated karma system. While likes and comments will keep bringing specific objects (and, by extension, the people who brought them in) into your attention stream, there is none of the filtering power enabled by tools like StackOverflow. Whether such a thing is something we would want is an interesting question, but it has the potential to enable much more sophisticated filtering and curation of content. StackOverflow itself has an interesting limitation as well: there is only one rank order of answers. I can’t choose to privilege the upmods of one specific curator over another, and I certainly can’t choose to order my stream based on a person’s upmods but not their downmods.

A user on Friendfeed plays three distinct roles: content author, content curator, and content consumer. Different people will emphasise different roles, from the pure broadcaster to the pure reader who never interacts. The real added value comes from the curation role, and in particular from enabling granular filtering based on your choice of curators. Curation comes in the form of choosing to push content to Friendfeed from outside services, from “likes”, and from commenting. Commenting is both curation and authoring, providing context as well as new information or opinion. Supporting and validating this activity will be important. Whatever choice is made around “liking” or StackOverflow-style up- and down-modding needs to apply to comments as well as objects.

Functionality addition 2: Enable rating of comments and, by extension, the people making them.

If reputation gathering is to be useful in driving filtering functionality, as I have suggested, we will need good ways of separating content authoring from curation. One thing that really annoys me is seeing an interesting title and a friendly avatar on Friendfeed and clicking through to find something written by someone else. Not because I don’t want to read something written by someone else, but because my decision to click through was based on assumptions about who the author was. We need to support a strong culture of citation and attribution in research. A Friendfeed for research will need to clearly mark the distinction between who brought an object into the service, who curated it, and who authored it. Both activities should be valued, but the roles should be measured separately.

Functionality addition 3: Clearly designate authors and curators of objects brought into the stream. Possibly enable these activities to be rated separately?
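
As a rough sketch of what this separation might look like, consider an object record that carries the authors, the importer, and subsequent curators as distinct fields, each with its own score. This is a hypothetical structure for illustration, not a description of any existing service:

```python
# A minimal sketch, assuming a hypothetical record structure: the author,
# the importer, and later curators are held as separate fields so that each
# role can be credited and rated independently.
from dataclasses import dataclass, field

@dataclass
class StreamObject:
    title: str
    url: str
    authors: list[str]                                  # who wrote the work itself
    imported_by: str                                    # who first brought it in
    curators: list[str] = field(default_factory=list)   # who liked/commented
    authorship_score: int = 0                           # rated separately from...
    curation_score: int = 0                             # ...curation activity

paper = StreamObject(
    title="A paper on membrane proteins",
    url="http://example.org/paper",
    authors=["A. N. Author"],
    imported_by="some_reader",
)
paper.curators.append("a_commenter")
paper.curation_score += 1   # a "like" credits curation, not authorship
```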

If we recognize a role of author, outside that of the user’s curation activity, we can also enable the rating of people and objects that don’t belong to users. This would allow researchers who are not users to build up reputation within the system, and has the potential to solve the “ghost town” phenomenon that plagues most science social networking sites. A new user could claim the author role for objects that were originally brought in by someone else. This would immediately connect them with other people who have commented on their work, and provide them with a reputation that can be further built upon by taking on curation activities.

This is a sensitive area, holding information on people without their knowledge, but it is something already done across indexing services, aggregation services, and chat rooms. The use of karma in this context would need to be very carefully thought out, and whether it would be made available either within or outside the system would be an important question to tackle.

Functionality addition 4: Collect reputation and comment information for authors who are not users to enable them to rapidly connect with relevant content if they choose to join.

Finally there is the question of interacting with this content and filtering it through the rating systems that have been created. The UI issues here are formidable, but there is a need to enable different views: a streaming view, more static views of content a user has collected over long periods, and search. There is probably enough for another whole post in those issues.

Summary: Overall, for me the key to building a service that takes inspiration from Friendfeed but delivers more functionality for researchers, while not alienating a wider potential user base, is to build a tool that enables and supports curation, rating, and granular filtering of content. Authorship is key, as are quantitative measures of value and personal relevance that will enable users to build their own view of the content they are interested in, to collect it for themselves, and to continue to curate it, either on their own or in collaboration with others.

The Panton Principles: Finding agreement on the public domain for published scientific data

I had the great pleasure and privilege of announcing the launch of the Panton Principles at the Science Commons Symposium – Pacific Northwest on Saturday. The launch of the Principles, many months after they were first suggested, is largely down to the work of Jonathan Gray. This was one of several projects that I haven’t been able to follow through on properly, and I want to acknowledge the effort that Jonathan has put into making it happen. I thought it might be helpful to describe where the Principles came from, what they are intended to do, and, perhaps just as importantly, what they are not.

The Panton Principles aim to articulate a view of best practice with respect to data publication for science. They arose out of an ongoing conversation between myself, Peter Murray-Rust, and Rufus Pollock. Rufus founded the Open Knowledge Foundation, an organisation that seeks to promote and support open culture, open source, and open science, with the emphasis on the open. The OKF position on licences has always been that share-alike provisions are an acceptable limitation on complete freedom to re-use content. I have always taken the Science Commons position that share-alike provisions, particularly on data, have the potential to make it difficult or impossible to get multiple datasets or systems to interoperate. In another post I will explore this disagreement, which really amounts to a different perspective on the balance between the risks and consequences of theft and those of things not being used or useful. Peter, in turn, is particularly concerned about the practicalities – really wanting a straightforward set of rules to be baked right into publication mechanisms.

The Principles came out of a discussion in the Panton Arms, a pub near the Chemistry Department of Cambridge University, after I had given a talk in the Unilever Centre for Molecular Informatics. We were having our usual argument, each trying to win the others over, when we turned instead to what we could agree on: what sort of statement could we make that would capture the best parts of both positions, with a focus on science and data? We focussed further by trying to draw out one specific issue. Not the issue of when people should share results, or the details of how, but the mechanisms that should be used to enable re-use. The Principles are intended to focus on what happens once a decision has been made to publish data, on the assumption that the wish is for that data to be effectively re-used.

Where we found agreement was that for science, and for scientific data, and particularly science funded by public investment, the public domain was the best approach and the one we would all recommend. We brought John Wilbanks in both to represent the views of Creative Commons and to help craft the words. It also made a good excuse to return to the pub. We couldn’t agree on everything – we will never agree on everything – but the form of words chosen – that placing data explicitly, irrevocably, and legally in the public domain satisfies both the Open Knowledge Definition and the Science Commons Principles for Open Data – was something that we could all personally sign up to.

The end result is something that I have no doubt is imperfect. We have borrowed inspiration from the Budapest Declaration, but there are three B’s. Perhaps it will take three P’s to capture all the aspects that we need. I’m certainly up for some meetings in Pisa or Portland, Pittsburgh or Prague (less convinced about Perth, but if it works for anyone else it would make my mother happy). For me it captures something that we agree on – a way forward towards making the best possible practice a common and practical reality. It is something I can sign up to and I hope you will consider doing so as well.

Above all, it is a start.

Friendfeed for Research? First impressions of ScienceFeed

I have been saying for quite some time that I think Friendfeed offers a unique combination of functionality that works well for scientists, researchers, and the people they want to (or should want to) have conversations with. For me the core of this functionality lies in two places. First, the system explicitly supports conversations that centre on objects. This is different to Twitter, which supports conversations but doesn’t centre them on the object – it is actually not trivial to find all the tweets about a given paper, for instance. Facebook now has similar functionality, but it is much more often used for pure conversation. Facebook is a tool mainly used for person-to-person interactions; it is user- or person-centric. Friendfeed, at least as it is used in my space, is object-centric, and this is the key aspect in which “social networks for science” need to differ from the consumer offerings, in my opinion. This idea can trace a fairly direct lineage via Deepak Singh to the Jeff Jonas/Jon Udell concatenation of soundbites:

“Data finds data…then people find people”

The second key aspect of Friendfeed is that it gives the user a great deal of control over what they present to represent themselves. If we accept the idea that researchers want to interact with other researchers around research objects, then it follows that the objects you choose to represent yourself with are crucial to creating your online persona. I choose not to push Twitter into Friendfeed, mainly because my tweets are directed at a somewhat different audience. I do choose to bring in video, slides, blog posts, papers, and other aspects of my work life. Others might choose to include Flickr but not YouTube. Flexibility is key because you are building an online presence. Most of the frustration I see with online social tools and their use by researchers centres on a lack of control over which content goes where, and when.

So as an advocate of Friendfeed as a template for tools for scientists, it is very interesting to see how that template might be applied to tools built with researchers in mind. ScienceFeed was launched yesterday by Ijad Madisch, the person behind ResearchGate. The first thing to say is that this is an out-and-out clone of Friendfeed, from the position of the buttons to the overall layout. It seems not to be built on the Tornado server that was open sourced by the Friendfeed team, so questions may hang over scalability and architecture, but that remains to be tested. The main UI difference from Friendfeed is that the influence of another 18 months of development of social infrastructure is evident in the use of OAuth to rapidly leverage existing networks and information on Friendfeed, Twitter, and Facebook. Although it still requires some profile setup, this is good to see. It falls short of the kind of true federation which we might hope to see in the future, but then so does everything else.

In terms of specific functionality for scientists, the main addition is a specialised tool for adding content via a search of literature databases. This seems to be adapted from the ResearchGate tool for populating a profile’s publication list. A welcome addition, and certainly real tools for researchers must treat publications as first-class objects. But not groundbreaking.

The real limitation of ScienceFeed is that it seems to miss the point of what Friendfeed is about. There is currently no mechanism for bringing in and aggregating diverse streams of content automatically. It is nice to be able to manually share items from my Citeulike library, but this needs to happen automatically. My blog posts need to come in, as do my slideshows on Slideshare and my preprints on Nature Precedings or arXiv. Most of this information is accessible via RSS feeds, so import via RSS/Atom (and, in the future, real-time protocols like XMPP) is an absolute requirement. Without this functionality, ScienceFeed is just a souped-up microblogging service. And as was pointed out yesterday in one Friendfeed thread, we have a Twitter-like service for scientists. It’s called Twitter. With automatic feed aggregation, Friendfeed can become a presentation of yourself as a researcher on the web: an automated publication list that is always up to date and always contains your latest (public) thoughts, ideas, and content. In short, your web-native business card and CV all rolled into one.
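
To make the aggregation requirement concrete, here is a minimal sketch of the kind of RSS/Atom import being asked for, using the third-party feedparser library (pip install feedparser). The feed URLs are placeholders, not real services:

```python
# Pull a researcher's scattered outputs in via their RSS/Atom feeds and
# merge them into a single reverse-chronological stream.
import feedparser

FEEDS = [
    "http://example.org/blog/feed",       # blog posts (placeholder URL)
    "http://example.org/slides/rss",      # slide decks (placeholder URL)
    "http://example.org/papers/rss",      # bookmarked papers (placeholder URL)
]

def aggregate(feed_urls):
    """Merge entries from several feeds into one newest-first stream."""
    entries = []
    for url in feed_urls:
        for e in feedparser.parse(url).entries:
            published = e.get("published_parsed")
            entries.append({
                "source": url,
                "title": e.get("title", ""),
                "link": e.get("link", ""),
                # struct_time converts cleanly to a sortable tuple
                "published": tuple(published) if published else (0,),
            })
    return sorted(entries, key=lambda entry: entry["published"], reverse=True)

for item in aggregate(FEEDS):
    print(item["title"], "-", item["link"])
```

Deduplication and real-time push via XMPP would build on the same merged stream.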

Finally there is the problem of the name. I was very careful at the top of this post to be inclusive in the scope of people who I think can benefit from Friendfeed. One of the great strengths of Friendfeed is that it has promoted conversations across boundaries that are traditionally very hard to bridge. The ongoing collision between the library and scientific communities on Friendfeed may one day rank as its most important achievement, at least in the research space. I wonder whether the conversations that have been sparked there would have happened at all without the open scope that allowed communities to form without prejudice as to where they came from, and then to find each other and mingle. There is nothing in ScienceFeed that precludes anyone from joining as far as I can see, but the name is potentially exclusionary, and I think unfortunate.

Overall I think ScienceFeed is a good discussion point, a foil for critical thinking, and potentially a valuable fall-back position if Friendfeed does go under. It is a place where the wider research community could have a stronger voice about development direction, and an opportunity to argue more effectively for business models that can provide confidence in a long-term future. I think it currently falls far short of being a useful tool, but there is the potential to use it as a spur to build something better. That might be ScienceFeed v2 or it might be an entirely different service. In a follow-up post I will make some suggestions about what such a service might look like, but for now I’d be interested in what other people think.

Other Friendfeed threads are here and here, and TechCrunch has also written up the launch.

Peer review: What is it good for?

It hasn’t been a really good week for peer review. In the same week that the Lancet fully retracted the original Wakefield MMR article (while keeping the retraction behind a login screen – way to go there on public understanding of science), the mainstream media went to town on reports of 14 stem cell scientists writing an open letter claiming that peer review in their area was being dominated by a small group of people blocking the publication of innovative work. I don’t have the information to comment on the substance of either issue, but I do want to reflect on what this tells us about the state of peer review.

There remains much reverence for the traditional process of peer review. I may be over-interpreting the tenor of Andrew Morrison’s editorial in BioEssays, but it seems to me that he is saying, as many others have over the years, “if we could just have the rigour of traditional peer review with the ease of publication of the web then all our problems would be solved”. Scientists worship at the altar of peer review, and I use that metaphor deliberately because it is rarely if ever questioned. Somehow the process of peer review is supposed to sprinkle some sort of magical dust over a text which makes it “scientific” or “worthy”, yet while we quibble over details of managing the process, or complain that we don’t get paid for it, the fundamental basis on which we decide whether science is formally published is rarely examined in detail.

There is a good reason for this. THE EMPEROR HAS NO CLOTHES! [sorry, had to get that off my chest]. The evidence that peer review as traditionally practiced is of any value at all is equivocal at best (Science 214, 881; 1981. J Clinical Epidemiology 50, 1189; 1998. Brain 123, 1954; 2000. Learned Publishing 22, 117; 2009). It’s not even really negative; that would at least be useful. There are a few studies that suggest peer review is somewhat better than throwing dice, and a bunch that say it is much the same. Perhaps the best we can say is that it is at its best when dealing with narrow technical questions and at its worst when determining “importance”. That ought to be deeply troubling for anyone who has tried to get published in a top journal or written a grant proposal. Professional editorial decisions may in fact be more reliable, something that Philip Campbell hints at in his response to questions about the open letter [BBC article]:

Our editors […] have always used their own judgement in what we publish. We have not infrequently overruled two or even three sceptical referees and published a paper.

But there is perhaps an even more important procedural issue around peer review. Whatever value it might have, we largely throw away. Few journals make referees’ reports available, and virtually none track the changes made in response to referees’ comments, which would enable a reader to make their own judgement as to whether a paper was improved or made worse. Referees get no public credit for good work, and no public opprobrium for poor or even malicious work. And in most cases a paper rejected from one journal starts completely afresh when submitted to a new journal, the work of the previous referees simply thrown out of the window.

Much of the commentary around the open letter has suggested that the peer review process should be made public – but only for published papers. This goes nowhere near far enough. One of the key points where we lose value is in the transfer from one journal to another. The authors lose out because they’ve lost their priority date (in the worst case giving malicious referees the chance to get their own paper in first). The referees miss out because their work is rendered worthless. Even the journals are losing an opportunity to demonstrate the high standards they apply in terms of quality and rigour – and indeed the high expectations they have of their referees.

We never ask what the cost of not publishing a paper is, or what the cost of delaying publication could be. Eric Weinstein provides the most sophisticated view of this that I have come across, and I recommend watching his talk at Science in the 21st Century from a few years back. There is a direct cost to rejecting papers, both in the time of referees and editors and in the time required for authors to reformat and resubmit. But the bigger problem is the opportunity cost: how much work that might have been useful, or even important, is never published? How much is research held back by delays in publication? How many follow-up studies are not done, how many leads not followed up, and, perhaps most importantly, how many projects not re-funded, or only re-funded once the carefully built-up expertise, in the form of research workers, has been lost?

Rejecting a paper is like gambling in a game where you can only win. There are no real downside risks for either editors or referees in rejecting papers. There are downsides, as described above, and those carry real costs, but those costs are never borne by the people who make or contribute to the decision. It’s as though it were a futures market where you can only lose if you go long, never if you go short on a stock. In Eric’s terminology those costs need to be carried: referees and editors who “go short” on a paper or grant should be required to unwind their position if they get it wrong. This is the only way we can price the downside risks into the process. If we want open peer review, indeed if we want peer review in its traditional form, along with its caveats, costs, and problems, then the most important advance would be to have it for unpublished papers.

Journals need to acknowledge the papers they’ve rejected, along with dates of submission. Ideally all referees’ reports should be made public, or at least re-usable by the authors. If full publication, of either the submitted form of the paper or the referees’ reports, is not acceptable, then journals could publish a hash of the submitted document and reports against a local key, enabling the authors to demonstrate the submission date and the provenance of referees’ reports as they take them to another journal.
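
As a minimal sketch of how that hashing scheme could work – assuming only that a journal holds a local secret key and publishes the resulting digests; nothing here describes an existing service:

```python
# Commit publicly to a submission and its reports without revealing them.
import hashlib
import hmac
import json
import time

JOURNAL_KEY = b"journal-local-secret-key"  # hypothetical local key

def commit_to_submission(manuscript: bytes, reports: list[bytes]) -> dict:
    """Build a publishable record committing to a manuscript and its reports."""
    record = {
        "received": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "manuscript_sha256": hashlib.sha256(manuscript).hexdigest(),
        "report_sha256": [hashlib.sha256(r).hexdigest() for r in reports],
    }
    # A keyed hash ties the record to this journal without exposing content.
    payload = json.dumps(record, sort_keys=True).encode()
    record["journal_mac"] = hmac.new(JOURNAL_KEY, payload, hashlib.sha256).hexdigest()
    return record  # publish this; authors keep the original documents

# An author later re-hashes their own copies; matching digests demonstrate
# the submission date and the provenance of the referees' reports.
record = commit_to_submission(b"manuscript text", [b"report one", b"report two"])
print(json.dumps(record, indent=2))
```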

In my view referees need to be held accountable for the quality of their work. If we value this work, we should publicly laud good examples; conversely, poor work should be criticised. Any scientist has received reviews that are, if not malicious, then incompetent. And even if we struggle to admit it to others, we can usually tell the difference between criticism that is constructive (if sometimes brutal) and nonsense. Most of us would even admit that we don’t always do as good a job as we would like. After all, why should we work hard at it? No credit, no consequences – why would you bother? It might be argued that if you put poor work in you can’t expect good work back when your own papers and grants get refereed. This again may be true, but only in the long run, and only if there are active and public pressures to raise quality – none of which I have seen.

Traditional peer review is hideously expensive, and currently there is little or no pressure on its contributors or managers to provide good value for money. It is also unsustainable at its current level. My solution is to radically cut the number of peer-reviewed papers, probably by 90-95%, leaving the rest to be published as either pure data or pre-prints. But the whole industry is addicted to traditional peer-reviewed publications: from the funders, who can’t quite figure out how else to measure research outputs, to the researchers and their institutions, who need them for promotion, to the publishers (both OA and toll access) and metrics providers, who both feed the addiction and feed off it.

So that leaves those who hold the purse strings, the funders, with a responsibility to pursue a value for money agenda. A good place to start would be a serious critical analysis of the costs and benefits of peer review.

Addition after the fact: It was pointed out in the comments that there are other posts/papers I should have referred to where people have raised similar ideas and issues, in particular Martin Fenner’s post at Nature Network. The comments there are particularly good as an expert analysis of the usefulness of the kind of “value for money” critique I have made. There is also a paper on arXiv from Stefano Allesina. Feel free to mention others and I will add them here.

Everything I know about software design I learned from Greg Wilson – and so should your students

Which is not to say that I am any good at software engineering, good practice, or writing decent code. And you shouldn’t take Greg to task for some of the dodgy demos I’ve done over the past few months either. What he does need to take credit for is enabling me to go from someone who knew nothing at all about software design, the management of software development, or testing, to someone able to talk about these things, ask some of the right questions, and even begin to make some of my own judgements about code quality, in an amazingly short period of time. From someone who didn’t know how to execute a Python script to someone who feels uncomfortable working with services where I can’t use a testing framework before deploying software.

This was possible through the online component of the training programme, called Software Carpentry, that Greg has been building, delivering, and developing over the past decade. This isn’t a course in software engineering and it isn’t built for computer science undergraduates. It is a course focussed on taking scientists who have done a little bit of tinkering or scripting and giving them the tools, the literacy, and the knowledge to apply the best of the software engineering knowledge base to building useful, high-quality code that solves their problems.

Code and computational quality have never been a priority in science, and there is a strong argument that we are currently paying, and will continue to pay, a heavy price for that unless we sort out the fundamentals of computational literacy and practice as these tools become ubiquitous across the whole spread of scientific disciplines. We teach people how to write up an experiment, but we don’t teach them how to document code. We teach people the importance of significant figures, but many computational scientists have never even heard of version control. And we teach the importance of proper experimental controls, but never provide basic training in testing and validating software.
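
Testing scientific code need not be elaborate. A minimal sketch of the habit in question – the function and values here are made up for illustration, not taken from the course:

```python
# Validate a small scientific calculation against values we can check by hand,
# the computational analogue of running a proper experimental control.
def dilution_concentration(c_stock, v_stock, v_final):
    """Concentration after diluting v_stock of stock at c_stock up to v_final."""
    if v_final <= 0 or v_stock < 0:
        raise ValueError("volumes must be positive")
    return c_stock * v_stock / v_final

def test_dilution_concentration():
    # A tenfold dilution should reduce the concentration tenfold.
    assert abs(dilution_concentration(1.0, 1.0, 10.0) - 0.1) < 1e-12
    # Diluting nothing gives zero concentration.
    assert dilution_concentration(1.0, 0.0, 10.0) == 0.0

if __name__ == "__main__":
    test_dilution_concentration()
    print("all tests pass")
```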

Greg is seeking support to enable him to update Software Carpentry to provide an online resource for the effective training of scientists in basic computational literacy. It won’t cost very much money; we’re talking a few hundred thousand dollars. And the impact is potentially both important and large. If you care about the training of computational scientists – not computer scientists, but the people who need, or could benefit from, some coding, data management, or processing in their day-to-day scientific work – and you have money, then I encourage you to contribute. If you know people or organizations with money, please encourage them to contribute. Like everything important, especially anything to do with education and preparing for the future, these things are tough to fund.

You can find Greg at his blog: http://pyre.third-bit.com

His description of what he wants to do and what he needs to do it is at: http://pyre.third-bit.com/blog/archives/3400.html

Why I am disappointed with Nature Communications

Towards the end of last year I wrote up some initial reactions to the announcement of Nature Communications, and the communications team at NPG were kind enough to do a Q&A to look at some of the issues and concerns I raised. Specifically, I was concerned about two things: the licence that would be used for the “Open Access” option, and the way the journal would be positioned in terms of “quality”, particularly as it related to the other NPG journals and the approach to peer review.

I have to say that I feel both of these have been fudged, which is unfortunate because there was a real opportunity here to do something different and quite exciting. I get the impression that that may even have been the original intention. But from my perspective what has resulted is a poor compromise between my hopes and commercial concerns.

At the centre of my problem is the use of a Creative Commons Attribution Non-commercial licence for the “Open Access” option. This doesn’t qualify under the BBB declarations on Open Access publication and it doesn’t qualify for the SPARC seal for Open Access. But does this really matter, or is it just a side issue for a bunch of hard-core zealots? After all, if people can see it, that’s a good start, isn’t it? Well yes, it is a good start, but non-commercial terms raise serious problems. Putting aside the argument that universities are commercial entities and therefore can’t legitimately use content with non-commercial licences, the problem is that NC terms limit the ability of people to create new business models that re-use content and are capable of scaling.

We need these business models because the current model of scholarly publication is simply unaffordable. The argument is often made that if you are unsure whether you are allowed to use content then you can just ask, but this simply doesn’t scale. And let’s be clear about some of the things that NC means you’re not licensed for: using a paper for commercially funded research even within a university, using the content of a paper to support a grant application, using the paper to judge a patent application, using a paper to assess the viability of a business idea… the list goes on and on. Yes, you can ask if you’re not sure, but asking each and every time does not scale. This is the central point of the BBB declarations: for scientific communication to scale it must allow the free movement and re-use of content.

Now if this were coming from any old toll access publisher I would just roll my eyes and move on, but NPG sets itself up to be judged by a higher standard. NPG is a privately held company, not beholden to shareholders. It is a company that states that it is committed to advancing scientific communication, not simply traditional publication. Non-commercial licences do not do this. From the Q&A:

Q: Would you accept that a CC-BY-NC(ND) licence does not qualify as Open Access under the terms of the Budapest and Bethesda Declarations because it limits the fields and types of re-use?

A: Yes, we do accept that. But we believe that we are offering authors and their funders the choices they require. Our licensing terms enable authors to comply with, or exceed, the public access mandates of all major funders.

NPG is offering the minimum that allows compliance, not what will most effectively advance scientific communication. Again, I would expect this of a shareholder-controlled, profit-driven, toll-access, dead-tree publisher, but I am holding NPG to a higher standard. Even so, there is a legitimate argument to be made that non-commercial licences are needed to make sure that NPG can continue to support these and other activities. This is why I asked in the Q&A whether NPG makes significant money off re-licensing of content for commercial purposes. This is a discussion we could have on the substance – the balance between a commercial entity providing a valuable service and the limitations we might accept as the price of ensuring the continued provision of that service. It is a value-for-money judgement. But it is not one we can make without a clear view of the costs and benefits.

So I’m calling NPG on this one. Make a case for why non-commercial licences are necessary or even beneficial, not merely why they are acceptable. They damage scientific communication, they create unnecessary confusion about rights, and, more importantly, they damage the development of new business models to support scientific communication. Explain why the restriction is commercially necessary for the development of these new activities, or roll it back, and take a lead on driving the development of science communication forward. Don’t take the kind of small steps we expect from other, more traditional, publishers. Above all, let’s have that discussion. What is the price we would have to pay to change the licence terms?

Because I think it goes deeper. I think that NPG are actually limiting their potential income by focussing on the protection of their income from legacy forms of commercial re-use. They could make more money off this content by growing the pie than by protecting their piece of a specific income stream. It goes to the heart of a misunderstanding about how to effectively exploit content on the web. There is money to be made through re-packaging content for new purposes. The content is obviously key but the real value offering is the Nature brand. Which is much better protected as a trademark than through licensing. Others could re-package and sell on the content but they can never put the Nature brand on it.

By making the material available for commercial re-use, NPG would help to expand a high-value market for re-packaged content which they would be poised to dominate. Sure, if you’re a business you could print off your OA Nature articles and put them on the coffee table, but if you want to present them to investors you want the Nature logo and Nature packaging that you can only get from one place. And that NPG does damn well. NPG often makes the case that it adds value through selection, presentation, and aggregation; it is the editorial brand that is of value. Let’s see that demonstrated through monetization of the brand, rather than through unnecessarily restricting the re-use of the content, especially where authors are being charged $5000 to cover the editorial costs.

Google Wave: Ripple or Tsunami?

A talk given at the Edinburgh University IT Futures meeting late in 2009. The talk discusses the strengths and weaknesses of Wave as a tool for research and provides some pointers on how to think about using it in an academic setting. The talk was recorded in a Wave with members of the audience taking notes around images of the slides which I had previously uploaded.

You will only be able to see the wave if you have a Wave preview account and are logged in. If you don’t have an account, the slides are included below (or will be as soon as I can get Slideshare to talk to me).

[wave id=”googlewave.com!w+-c2g1ggkA”]

What should social software for science look like?

Nat Torkington, picking up on my post over the weekend about the CRU emails, takes a slant which has helped me figure out how to write this post, which I had been struggling with. He says:

[from my post...my concern is that in a kneejerk response to suddenly make things available no-one will think to put in place the social and technical infrastructure that we need to support positive engagement, and to protect active researchers, both professional and amateur from time-wasters.] Sounds like an open science call for social software, though I’m not convinced it’s that easy. Humans can’t distinguish revolutionaries from terrorists, it’s unclear why we think computers should be able to.

As I responded over at Radar, yes I am absolutely calling for social software for scientists, but I didn’t mean to say that we could expect it to help us find the visionaries amongst the simply wrong. But this raises a very helpful question. What is it that we would hope Social Software for Science would do? And is that realistic?

Over the past twelve months I seem to have acquired something of a reputation for being a grumpy old man about these things, because I am deeply sceptical of most of the offerings out there – partly because most of these services don’t actually know what it is they are trying to do, or how it maps onto the success stories of the social web. So, prompted by Nat, I would like to propose a list of what effective Social Software for Science (SS4S) will do and what it can’t.

  1. SS4S will promote engagement with online scientific objects and, through this, encourage and provide paths for those with enthusiasm but insufficient expertise to gain the expertise needed to contribute effectively (see e.g. Galaxy Zoo). This includes, but is certainly not limited to, collaborations between professional scientists; these are merely a special case of the general.
  2. SS4S will measure and reward positive contributions, including constructive criticism and disagreement (StackOverflow vs YouTube comments). Ideally such measures will value quality of contribution rather than opinion, allowing disagreement to be both supported when required and resolved when appropriate.
  3. SS4S will provide single-click access to available online scientific objects and make it easy to bring references to those objects into the user’s personal space or stream (see e.g. the Friendfeed “Like” button).
  4. SS4S should provide zero-effort upload paths to make scientific objects available online while simultaneously assuring users that this upload, and the objects, are always under their control. This will mean that in many cases what is pushed to the SS4S system is a reference, not the object itself, though it will sometimes be the object, to provide ease of use. The distinction will ideally be invisible to the user in practice, barring some initial setup (see e.g. the use of Posterous as a marshalling yard).
  5. SS4S will make it easy for users to connect with other users and build networks based on a shared interest in specific research objects (Friendfeed again).
  6. SS4S will help the user exploit that network to collaboratively filter objects of interest to them and of importance to their work (a minimal sketch of what this might look like follows this list). These objects might be results, datasets, ideas, or people.
  7. SS4S will integrate with the user’s existing tools and workflow and enable them to gradually adopt more effective or efficient tools without requiring any severe breaks (see Mendeley/Citeulike/Zotero/Papers and Dropbox).
  8. SS4S will work reliably and stably, with high performance and low latency.
  9. SS4S will come to where the researcher is working, both with respect to new software and also unusual locations and situations requiring mobile, location-sensitive, and overlay technologies (Layar, Greasemonkey, voice/gesture recognition – the latter largely prompted by a conversation I had with Peter Murray-Rust some months ago).
  10. SS4S will be trusted and reliable, with a strong community belief in its long-term stability. No single organization holds, or probably even can hold, this trust, so solutions will almost certainly need to be federated, open source, and supported by an active development community.
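
For point 6, a minimal sketch of what trust-weighted collaborative filtering could look like, assuming a hypothetical model in which users assign weights to curators and objects carry per-curator votes (none of this reflects an existing service):

```python
# Rank a stream by trust-weighted votes from the user's chosen curators.
from collections import defaultdict

# Trust weights this user assigns to curators in their network (made-up data).
trust = {"alice": 1.0, "bob": 0.5}

# Votes per object: object_id -> list of (curator, vote), +1 up / -1 down.
ratings = {
    "paper:10.1000/xyz123": [("alice", +1), ("bob", +1), ("carol", -1)],
    "dataset:abc": [("bob", -1), ("alice", +1)],
}

def rank_stream(ratings, trust, use_downmods=False):
    """Order objects by the trust-weighted votes of curators the user follows."""
    scores = defaultdict(float)
    for obj, votes in ratings.items():
        for curator, vote in votes:
            if curator not in trust:
                continue  # ignore curators outside the user's network
            if vote < 0 and not use_downmods:
                continue  # honour "upmods but not downmods" if the user wishes
            scores[obj] += trust[curator] * vote
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_stream(ratings, trust))
```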

What SS4S won’t do is recognize geniuses when they are out in the wilderness amongst a population of the just plain wrong. It won’t solve the cost problems of scientific publication and it won’t turn researchers into agreeable, supportive, and collaborative human beings. Some things are beyond even the power of Web 2.0.

I was originally intending to write this post from a largely negative perspective, ranting as I have in the past about how current services won’t work. I think now there is a much more positive approach: let’s go out there and look at what has been done, what is being done, and how well it is working in this space. I’ve set up a project on my new wiki (don’t look too closely, I haven’t finished the decorating) and if you are interested in helping out with a survey of what’s out there I would appreciate the help. You should be able to log in with an OpenID as long as you provide an email address. Check out this Friendfeed thread for some context.

My belief is that we are near to a position where we could build a useful requirements document for such a beast, with references to what has worked and what hasn’t. We may not have the resources to build it, and maybe the NIH projects currently funded will head in that direction. But what is valuable is to pull the knowledge together to figure out the most effective path forward.

Open Research: The personal, the social, and the political

Next Tuesday I’m giving a talk at the Institute for Science Ethics and Innovation in Manchester. This is a departure for me in terms of talk subjects, inasmuch as it is much more to do with policy and politics. I have struggled quite a bit with it, so this is an effort to work it out on “paper”. Warning: it’s rather long. The title of the talk is “Open Research: What can we do? What should we do? And is there any point?”

I’d like to start by explaining where I’m coming from, which involves explaining a bit about me. I live in Bath. I work at the Rutherford Appleton Laboratory, which is near Didcot. I work for STFC, but this talk is a personal view, so you shouldn’t take any of it as representing STFC policy. Bath and Didcot are around 60 miles apart, so each morning I get up pretty early, get on a train, then get on a bus which gets me to work. I work on developing methodology to study complex biological structures. We have a particular interest in trying to improve methods for looking at proteins that live in biological membranes, and at protein-nucleic acid complexes. I have also done work on protein labelling that lets us make cool stuff and pretty pictures. This work involves an interesting mixture of small-scale lab work and work on big instruments at large, often multi-national, facilities. It also involves far too much travelling.

A good question to ask at this point is “Why?” Why do I do these things? Why does the government fund me to do them? Actually, it’s not so much why the government funds them as why the public does. Why does the taxpayer support our work? Even that’s not really the right question, because there is no “public”. We are the public. We are the taxpayer. So why do we as a community support science and research? Historically, science was carried out by people sufficiently wealthy to fund it themselves or, in a small number of cases, by people who could find wealthy patrons. After the second world war there was a political and social consensus that science needed to be supported, and that consensus has sustained research funding more or less to the present day. But with the war receding in public memory we seem to have retained the need to frame the argument for research funding in terms of conflict or threat: the War on Cancer, the threat of climate change. Worse, we seem to have come to believe our own propaganda, that the only way to justify public research funding is that it will cure this or save us from that. And the reality is that in most cases we will probably not deliver on this.

These are big issues and I don’t really have answers to a lot of them, but it seems to me that they are important questions to think about. So here are some of my ideas about how to tackle them from a variety of perspectives. First, the personal.

A personal perspective on why and how I do research

My belief is we have to start by being honest with ourselves, personally, about why and how we do research. This sounds like some sort of self-help mantra, I know, but let me explain what I mean. My personal aim is to maximise my positive impact on the world, either through my own work or through enabling the work of others. I didn’t come at this from first principles; it has evolved. I also understand that I am personally motivated by recognition and reward, and that I am strongly, perhaps too strongly, motivated by others’ opinions of me. My understanding of my own skills and limitations means that I largely focus my research work on methodology development and enabling others. I can potentially have a bigger impact by building systems and capabilities that help others do their research than I can by doing that research myself. I am lucky enough to work in an organization that values that kind of contribution to the research effort.

Because I want my work to be used as widely as possible, I make as much of it as possible freely available. Again I am lucky that I live at a time when the internet makes this kind of publishing possible. We have services that enable us to easily publish ideas, data, media, and process, and I can push a wide variety of objects onto the web for people to use if they so wish. Even better than that, I can work on developing tools and systems that help other people to do this effectively. If I can have a bigger impact by enabling other people’s research, then I can multiply that again by helping other people to share that research. But here we start to run into problems. Publishing is easy. Sharing is not so easy. I can push to the web, but is anyone listening? And if they are, can they understand what I am saying?

A social perspective (and the technical issues that go with it)

If I want my publishing to be useful I need to make it available to people in a way they can make use of. We know that networks increase in value as they grow much more than linearly. If I want to maximise my impact, I have to make connections and maximise the ability of other people to make connections. Indeed Merton made the case for this in scientific research 20 years ago.

I propose the seeming paradox that in science, private property is established by having its substance freely given to others who might want to make use of it.

This is now a social problem but a social problem with a distinct technical edge to it.  Actually we have two related problems. The issue of how I make my work available in a useful form and the separate but related issue of how I persuade others to make their work available for others to use.

The key to making my work useful is interoperability. This is at root a technical issue, but at a purely technical level it is one that has been solved: we can share through agreed data formats and vocabularies. The challenges we face in actually making it happen are less technical problems than social ones, but I will defer those for the moment. We also need legal interoperability. Science Commons, amongst others, has focused very hard on this question and I don’t want to discuss it in detail here, except to say that I agree with the position that Science Commons takes: if you want to maximise the ability of others to re-use your work, then you must make it available with liberal licences that do not limit fields of use or the choice of licence on derivative works. This means CC-BY, BSD, etc., but if you want to be sure then your best choice is explicit dedication to the public domain.
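
As an illustration of what “agreed data formats and vocabularies” can mean in practice, here is a minimal sketch describing a dataset with Dublin Core terms serialised as JSON-LD; the dataset and identifiers are made up for the example:

```python
# Describe a dataset using a shared vocabulary (Dublin Core terms) so that
# any consumer that knows the vocabulary can interpret the fields.
import json

record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/datasets/membrane-protein-sans",  # placeholder
    "dc:title": "Small-angle scattering data for a membrane protein complex",
    "dc:creator": "Example Lab",
    "dc:format": "text/csv",
    # Legal interoperability: an explicit public domain dedication.
    "dc:license": "http://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(record, indent=2))
```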

But technical and legal interoperability are just subsets of what I think is more important: process interoperability. If the objects we publish are to be useful, then they must be able to fit into the processes that researchers actually use. As we move to the question of persuading others to share and build the network, this becomes even more important. We are asking people to change the way they do things, to raise their standards perhaps, so we need to make sure that this is as easy as possible and fits into their existing workflows. The problem with understanding how to achieve technical and legal interoperability is that the temptation is to impose it, and I am as guilty of this as anyone. What I’d like to do is use a story from our own work to illustrate an approach that I think can make this easier.

Making life easier by capturing process as it happens: Objects first, structure later

Our own work on web-based laboratory recording systems, which really originated in Jeremy Frey’s group at Southampton, came out of earlier work on a fully semantic, RDF-backed system for recording synthetic chemistry. In contrast, we took an almost completely unstructured approach to recording work in a molecular biology laboratory, not because we were clever or knew it would work out, but because it was a contrast to what had gone before. The LaBLog is based on a blog framework and allows the user to enter completely free text and completely arbitrary file attachments, and to organize things in whichever way they like. Obviously a recipe for chaos.

And it was, to start with, as we found our way around, but we went through several stages of re-organization and interface design over a period of about 18 months. The key realization we made was that while a lot of what we were doing was difficult to structure in advance, there were elements within it – specific processes, specific types of material – that were consistently repeated, even stereotyped, and that structuring these gave big benefits. We developed a template system that made producing these repeated processes and materials much easier. The templates depended on how we organized our posts and the metadata that described them, and the metadata in turn was driven by the need for the templates to be effective. A virtuous circle developed around the positive reinforcement that the templates and associated metadata provided. More surprisingly, the structure that evolved out of this in many cases mapped well onto existing ontologies. In the specific cases where it didn’t, we could see that the problem arose either from the ontology itself, or from the fact that our work simply wasn’t well described by that ontology. But the structure arose spontaneously out of a considered attempt to make the user/designer’s life easier, and was then mapped onto the external vocabularies.
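
A minimal sketch of the template idea, assuming a hypothetical post structure rather than LaBLog’s actual schema: a stereotyped procedure is captured once as a template, which stamps out both the free-text record and the metadata that makes it structured:

```python
# Stamp out a structured post from a template for a repeated procedure.
from datetime import date

# Hypothetical template for a stereotyped molecular biology procedure.
PCR_TEMPLATE = {
    "procedure": "PCR",
    "body": "Ran PCR on template {template_dna} with primers {fwd} / {rev}.",
    "required_fields": ["template_dna", "fwd", "rev"],
}

def render_post(template, **fields):
    """Fill a template, returning free text plus the metadata that drove it."""
    missing = [f for f in template["required_fields"] if f not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "date": date.today().isoformat(),
        "body": template["body"].format(**fields),
        # The structure falls out of the template rather than being imposed.
        "metadata": {"procedure": template["procedure"], **fields},
    }

post = render_post(PCR_TEMPLATE, template_dna="pET28a-GFP", fwd="T7", rev="T7term")
print(post["body"])
print(post["metadata"])
```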

I don’t want to suggest that our particular implementation is perfect. It is far from it, with gaping holes in the usability and in our ability to actually exploit the structure that has developed. But I think the general point is useful. For the average scientist to be willing to publish more of their research, the process has to be made easy and it has to recognise the inherently unstructured nature of most research. We need to apply structured descriptions where they make the user’s life easier, but allow unstructured or semi-structured representations elsewhere. And we need to build tools that make it easy to take those unstructured or semi-structured records and mould them into a specific structured narrative as part of a reporting process that the researcher has to do anyway. Writing a report, writing a paper: these things need to be done anyway, and if we could build tools such that the easiest way to write the report or paper is to bring elements of the original record together and push them onto the web in agreed formats, through easy-to-use filters and aggregators, then we will have taken an enormous leap forward.

Once you’ve insinuated these systems into the researcher’s process, then we can start talking about making that process better. But until then, technical and legal interoperability are not enough – we need to interoperate with existing processes as well. If we could achieve this then much more research material would flow online, connections would form around those materials, and the network would build.

And finally – the political

This is all very well. With good tools and good process I can make it easier for people to use what I publish and I can make it easier for others to publish. This is great but it won’t make others want to publish. I believe that more rapid publication of research is a good thing. But if we are to have a rational discussion about whether this is true we need to have agreed goals. And that moves the discussion into the political sphere.

I asked earlier why it is that we do science as a society, why we fund it. As a research community I feel we have no coherent answer to these questions. I also talked about being honest with ourselves. We should be honest with other researchers about what motivates us, why we choose to do what we do, and how we choose to divide limited resources. And as recipients of taxpayers’ money we need to be clear with government and the wider community about what we can achieve. We also have an obligation to optimize the use of the money we spend, and to optimize the effective use of the outputs derived from that money.

We need at core a much more sophisticated conversation with the wider community about the benefits that research brings: to the economy, to health, to the environment, to education. And we need a much more rational conversation within the research community as to how those different forms of impact are, and should be, tensioned against each other. We need, in short, a complete overhaul, if not a replacement, of the post-war consensus on public funding of research. My fear is that without this the current funding squeeze will turn into a long-term decline, and that without some serious self-examination the current self-indulgent bleating of the research community is unlikely to increase popular support for public research funding.

There are no simple answers to this, but it seems clear to me that at a minimum we need to be demonstrating that we are serious about maximising the efficiency with which we spend public money. That means making sure that research outputs can be re-used, that wheels don’t need to be re-invented, and that innovation flows easily from the academic lab into the commercial arena. And it means distinguishing between the effective use of public money to address market failures and subsidising UK companies that are failing to make effective investments in research and development.

The capital generated by science is in ideas, capability, and people. You maximise the effective use of capital by making it easy to move, by reducing barriers to trade. In science we can achieve this by maximising the ability to transfer research outputs. If we are to be taken seriously as guardians of public money, and to be seen as worthy of that responsibility, our systems need to make ideas, data, methodology, and materials flow easily. That means making our data, our process, and our materials freely available and interoperable. That means open research.

We need a much greater engagement with the wider community on how science works and what science can do. The web provides an immense opportunity to engage the public in active research, as demonstrated by efforts as diverse as Galaxy Zoo, with 250,000 contributors and millions of galaxy classifications, and the Open Dinosaur Project, with people reading online papers and adding the measurements of thigh bones to an online spreadsheet. Without the publicly available Sloan Digital Sky Survey, without access to the paleontology papers, and without the tools to put the collected data online and share them, these people, this “public”, would be far less engaged. That means open research.

And finally we need to turn the tools of our research on ourselves. We need to critically analyse our own systems and processes for distributing resources, for communicating results, and for apportioning credit. We need to judge them against the value for money they offer to the taxpayer and where they are found wanting we need to adjust. In the modern networked world we need to do this in a transparent and honest manner. That means open research.

But even if we agree that these things are necessary, or a general good, they are just policy. We already have policies which are largely ignored. Even when obliged to by journal publication policies or funder conditions, researchers avoid, obfuscate, and block attempts to gain access to data, materials, and methodology. Researchers are humans too, with the same needs to get ahead and to be recognized as anyone else. We need to find a way to map those personal needs, and those personal goals, onto the community’s need for more openness in research. As with the tooling, we need to “bake in” openness to our processes so that it becomes the easiest way to get ahead. Policy can help with cultural change, but we need an environment in which open research is the simplest and easiest approach to take. This is interoperability again, but in this case the policy and process have to interoperate with the real world, something that is often a bit of a problem.

So in conclusion…

I started with a title I’ve barely touched on. But I hope that with some of the ideas I’ve explored we are in a position to answer the questions I posed. What can we do in terms of Open Research? The web makes it technically possible for us to share data, process, and records in real time. It makes it easier for us to share materials, though I haven’t really touched on that. We have the technical ability to make that data useful through shared data formats and vocabularies. Many of the details are technically and socially challenging, but we can share pretty much anything we choose to, on a wide variety of timeframes.

What should we do? We should make that choice easier through the development of tools and interfaces that recognize that it is usually humans doing and recording the research, and that exploit the ability of machines to structure that record when they are doing the work. These tools need to exploit structure where it is appropriate and allow freedom where it is not. We need tools that help us map our records onto structures as we decide how we want to present them. Most importantly, we need to develop structures of resource distribution, communication, and recognition that encourage openness by making it the easiest approach to take. Encouragement may be all that’s required. The lesson from the web is that once network effects take hold they can take care of the rest.

But is there any point? Is all of this worth the effort? My answer, of course, is an unequivocal yes. More open research will be more effective, more efficient, and provide better value for the taxpayer’s money. But more importantly, I believe it is the only credible way to negotiate a new consensus on the public funding of research. We need an honest conversation with government and the wider community about why research is valuable, what its outcomes are, and how they contribute to our society. We can’t do that if the majority cannot even see those outcomes. The wider community is more sophisticated than we give it credit for. And in many ways the research community is less sophisticated than we think. We are all “the public”. If we don’t trust the public to understand why and how we do research, if we don’t trust ourselves to communicate the excitement and importance of our work effectively, then I don’t see why we deserve to be trusted to spend that money.

Nature Communications: A breakthrough for open access?

A great deal of excitement, but relatively little detailed information, has so far followed the announcement by Nature Publishing Group of a new online-only journal with an author-pays open access option. NPG have managed and run a number of open access (although see the caveats below) and hybrid journals, as well as online-only journals, for a while now. What is different about Nature Communications is that it will be the first clearly Nature-branded journal that falls into either of these categories.

This is significant because it brings the Nature brand into the mix. Stephen Inchcoombe, executive director of NPG, in email correspondence quoted in The Scientist, notes the increasing uptake of open access options and the willingness of funders to pay processing charges for publication as major reasons for NPG to provide a wider range of options.

In the NPG press release, David Hoole, head of content licensing for NPG, says:

“Developments in publishing and web technologies, coupled with increasing commitment by research funders to cover the costs of open access, mean the time is right for a journal that offers editorial excellence and real choice for authors.”

The reference to “editorial excellence” and the use of the Nature brand are crucial here, and are what make this announcement significant. The question is whether NPG can deliver something novel and successful.

The journal will be called Nature Communications. “Communications” is a moniker usually reserved for “rapid publication” journals. At the same time the Nature brand is all about exclusivity, painstaking peer review, and editorial work. Can these two be reconciled successfully and, perhaps most importantly, how much will it cost? In the article in The Scientist a timeframe of 28 days from submission to publication is mentioned, but as a minimum period. Four weeks is fast, but not super-fast for an online-only journal.

But speed is not the only criterion. Reasonably fast and with a Nature brand may well be good enough for many, particularly those who have come out of the triage process at Nature itself. So what of that branding – where is the new journal pitched? The press release is a little equivocal on this:

Nature Communications will publish research papers in all areas of the biological, chemical and physical sciences, encouraging papers that provide a multidisciplinary approach. The research will be of the highest quality, without necessarily having the scientific reach of papers published in Nature and the Nature research journals, and as such will represent advances of significant interest to specialists within each field.

So: more specific and of less general interest, but still “the highest quality”. This is interesting because there is an argument that this could easily cannibalise the “Nature Baby” journals. Why wait for Nature Biotech or Nature Physics when you can get your paper out faster in Nature Communications? Or, on the other hand, might it be out-competed by the other Nature journals – if the selection criteria are more or less the same, highest quality but not of general interest, why would you go for a new journal over the old favourites? Particularly if you are the kind of person who feels uncomfortable with online-only journals.

If the issue is the selectivity difference between the old and the new Nature journals, then the peer review process can perhaps offer us clues. Again, there are some interesting but not entirely clear statements in the press release:

A team of independent editors, supported by an external editorial advisory panel, will make rapid and fair publication decisions based on peer review, with all the rigour expected of a Nature-branded journal.

This sounds a little like the PLoS ONE model – a large editorial board with the intention of spreading the load of peer review so as to speed it up. Given the use of the term “peer review”, it is to be presumed that this means external peer review by referees with no formal connection to NPG; I would have thought NPG very unlikely to dilute their brand by utilising editorial peer review of any sort. Given that the slow point of the process is getting a response back from peer reviewers, whether they are reviewing for Nature or for PLoS ONE, it’s not clear to me how this can be sped up, or indeed changed from the traditional process at all, without risking a perception of a quality drop. This is going to be a very tough balance to find.

So finally, does this mean that NPG are serious about Open Access? NPG have been running OA and online-only journals (although see the caveat below about the licence) for a while now, and appear to be serious about increasing this offering. They will have looked very seriously at the numbers before making a decision, and my reading is that those numbers are telling them they need a serious offering. This is a hybrid, and it will be easy to make accusations that, along with other fairly unsuccessful hybrid offerings, it is being set up to fail.

Personally I doubt this is the case, but nor do I believe that the OA option will necessarily get the strong support it will need to thrive. The critical question will be pricing. If this is pitched at the level of other hybrid options, too high to be worth what is being offered in terms of access, then it will appear to have been set up to fail. Yet NPG can justifiably charge a premium if they are providing real editorial value. Indeed they have to: NPG has in the past said that they would have to charge enormous processing charges to published authors to recover the costs of peer review. So they can’t offer something relatively cheap yet claim the peer review is to the same standards. The price is absolutely critical to credibility. I would guess something around £2500 or US$4000 – higher than PLoS Biology/Medicine but lower than other hybrid offerings.

So then the question becomes value for money. Is the OA offering up to scratch? Again the press release is not as enlightening as one would wish:

Authors who choose the open-access option will be able to license their work under a Creative Commons license, including the option to allow derivative works.

So does that mean it will be a non-commercial licence? If so, it is not Open Access under the BBB declarations (most explicitly the Budapest Declaration). This would be consistent with the existing author rights that NPG allows, and with their current “Open Access” journal licences, but in my opinion it would be a mistake. If there is any chance of the accusation sticking that this isn’t “real OA”, then NPG will make a rod for their own back. And I really can’t see it making the slightest difference to their cost recovery. Equally, what of the option to allow derivative works? The BBB declarations are unequivocal that derivative works are at the core of Open Access. From a tactical perspective it would be much simpler and easier to go for straight CC-BY. It would get support (or at least neutralise opposition) from even the hardline OA community, and it would not leave NPG open to any criticism of muddying the waters. The fact that such a journal is being released shows that NPG gets the growing importance of Open Access publication. But this paragraph, in its current form, suggests that the organisation as a whole hasn’t internalised the message about why it matters. There are people within NPG who get this through and through, but that understanding does not yet seem to have spread far enough within the organisation to make this journal a success. The lack of a specific licence is a red rag, and an entirely unnecessary one.

So in summary the outlook is positive. The efforts of the OA movement are having an impact at the highest levels amongst traditional publishers. Whether you view this as a positive or a negative response, it is a success in my view that NPG feels a response is necessary. But the devil is in the details. Critical to both the journal’s success, and to the success of this initiative as a public relations exercise, will be the pricing, the licence, and the acceptance of the journal by the OA movement. The press release is not as promising on these issues as might be hoped. But it is early days yet, and no doubt there will be more information to come as the journal gets closer to going live.

There is a Nature Network Forum for discussions of Nature Communications, which will be a good place to watch for new information as it comes out.