A new way of looking at science?

I’ve spent a long time talking about two things that our LaBLog enables, or rather that it should enable. One is that by changing the way we view the record we can look at our results and materials in a new way. The second is that we want to enable a machine to read the lab book. Andrew Milsted, the main developer of the LaBLog and a PhD student in Jeremy Frey’s group, has just enabled a significant step in that direction. He’s managed to dump my lab book as rdf which enables us to look at it in an rdf viewer such as Welkin, developed by the Simile group at MIT.
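To make the idea concrete, here is a minimal sketch of what such a dump might look like. The URIs and link predicate below are invented placeholders, not the actual LaBLog vocabulary; the point is simply that each post becomes a node and each link between posts becomes a triple that a viewer like Welkin can load.

```python
# Sketch: serialise a tiny lab-book graph as N-Triples, the kind of RDF
# a viewer such as Welkin can load. All URIs below are hypothetical
# placeholders, not the real LaBLog vocabulary.

BASE = "http://example.org/lablog/post/"
LINKS_TO = "<http://example.org/lablog/vocab#linksTo>"

# Each post is a node; each in-text link between posts becomes an edge.
links = {
    "sample-prep-01": ["buffer-stock-A"],
    "sds-page-run-02": ["sample-prep-01", "buffer-stock-A"],
    "unfinished-idea": [],  # no links: an isolated node in the graph view
}

def to_ntriples(links):
    """Emit one N-Triples statement per post-to-post link."""
    lines = []
    for source, targets in links.items():
        for target in targets:
            lines.append(f"<{BASE}{source}> {LINKS_TO} <{BASE}{target}> .")
    return "\n".join(lines)

print(to_ntriples(links))
```

Each line of output is a plain subject–predicate–object statement, which is why even a crude dump of the notebook immediately becomes something other tools can consume.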

[Image: network view of my lab book]

At the moment this just shows each post as a node and the links between posts as edges. But there are a number of things that are immediately obvious. The first is that I start a lot of things and don’t necessarily manage to get very far with them, and that I do a number of (currently) unrelated things (the isolated subgraphs aren’t connected). Also that there are some materials that get widely re-used and some that don’t. There are also clearly things that I haven’t finished entering properly (isolated nodes). Finally, that we need a more sophisticated tool for playing with the view, because building a human-readable version of the graph will require some manipulation, grabbing subgraphs and moving them around. Welkin is great but after 30 minutes playing I have a bunch of feature requests. But this is what I’ve done so far. I am sure there are many things that can be done with this kind of view – but for the moment what is important is that it is an entirely new way of looking at the record.
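Once the post-link graph is in hand, the structural features mentioned above – isolated subgraphs, isolated nodes, widely re-used materials – can be pulled out programmatically. A minimal sketch with invented post names, using a plain breadth-first search rather than any particular graph library:

```python
from collections import defaultdict, deque

# Undirected post-link graph; all names are invented for illustration.
edges = [("prep-01", "gel-02"), ("gel-02", "buffer-A"),
         ("model-fit", "model-data")]          # a separate, unrelated subgraph
nodes = {"prep-01", "gel-02", "buffer-A",
         "model-fit", "model-data", "orphan"}  # 'orphan': an unfinished entry

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def components(nodes, adj):
    """Breadth-first search to find connected components."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            n = queue.popleft()
            if n in comp:
                continue
            comp.add(n)
            queue.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = components(nodes, adj)
isolated = [c for c in comps if len(c) == 1]   # unfinished entries
reuse = {n: len(adj[n]) for n in nodes}        # how widely a material is re-used
```

Here the three subgraphs fall out as separate components, the single-node component flags the unfinished entry, and the degree count picks out the re-used materials.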

For those interested in following progress on another story, the data and analysis built on the model that Pawel Szczesny built for us is in the bottom right-hand corner of the graph. You can see that at the moment it is isolated from the rest of the graph, because we haven’t yet compared these models with our experimental results (actually the relevant experiments aren’t on this graph because it was dumped before we did them). That’s something we should be doing in the next few days. If the data matches the model (current indications are that it does, but data quality is an issue) then we will have something very interesting to say about the structural changes on ligand binding in ligand gated ion channels.

Practical communications management in the laboratory – getting semantics from context

Rule number one: Never give your students your mobile number. They have a habit of ringing it.

Our laboratory is about a ten minute walk from my office. Some of the other staff have offices five minutes away in the other direction and soon we will have another lab which is another ten minute walk away in a third direction. I am also offsite a lot of the time. Somehow we need to keep in contact between the labs and between the people. This is a question of passing queries around but also of managing the way these queries interrupt what I and others are doing.

Having broken rule #1, I am now trying to manage my attention when my phone keeps going off with updates, questions, and details – much of it at inconvenient times, and much of it things that other people could answer. So what is the best way to spread the load and manage the inbox?

What I am going to propose is that we set up a lab account on Twitter. If we get everyone to follow this account and set updates to be sent via SMS to everyone’s phones, we have a nice simple notification system. We just set up a Twitter client on each computer in the lab, logged into that account, agree a partly standardised format for Tweets (primarily including the person’s name), and go from there. This will enable people to ask questions (and anyone to answer them), provide important updates or notices (equipment broken, or working again), and to keep people updated with what is happening. It also means that we will have a log of everyone’s queries, answers, and notices that we can go back to and archive.
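A hedged sketch of how such a ‘partly standardised format’ might be parsed on the receiving side. The convention used here (sender’s name, an optional ‘!’ for urgent notices, then a colon) is invented for illustration; a real lab would agree its own:

```python
import re
from datetime import datetime, timezone

# Hypothetical agreed format: "name: message", with "name!:" for urgent
# notices. This convention is invented for illustration only.
TWEET_RE = re.compile(r"^(?P<name>\w+)(?P<urgent>!?):\s*(?P<message>.+)$")

def parse_lab_tweet(text):
    """Parse a lab tweet into structured fields, or None if off-format."""
    m = TWEET_RE.match(text)
    if m is None:
        return None  # doesn't follow the agreed format; archive as-is
    return {
        "name": m.group("name"),
        "urgent": m.group("urgent") == "!",
        "message": m.group("message"),
        "received": datetime.now(timezone.utc).isoformat(),
    }

print(parse_lab_tweet("anna!: don't use the autoclave"))
```

Even a convention this crude means the archived stream can later be filtered by person or by urgency, which is exactly the log-and-search benefit described above.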

So a fair question at this point would be why don’t we do this through the LaBLog? Surely it would be better to keep all these queries in one place? Well one answer is that we are still struggling to deploy the LaBLog at RAL, but that’s a story for a separate post. But there is a fundamental difference in the way we interact with Twitter/SMS and notifications through the LaBLog via RSS. Notification of new material on the LaBLog via RSS is slow, but more importantly it is fundamentally a ‘pull’ interaction. I choose when to check it. Twitter and specifically the SMS notification is a ‘push’ interaction which will be better when you need people to notice, such as when you’re asking an urgent question, or need to post an urgent notice (e.g. don’t use the autoclave!). However, both allow me to see the content before deciding whether to answer, a crucial difference with a mobile phone call, and they give me options over what medium to respond with. They return the control over my time back to me rather than my phone.

The point is that these different streams have different information content, different levels of urgency, and different currency (how long they are important for). We need different types of action and different functionality for each. Twitter provides forwarding to our mobile devices, (almost) regardless of where in the world we are currently located, providing a mechanism for direct delivery. One of the fundamental problems with all streaming protocols and applications is that they have no internal notion of priority, urgency, or currency. We are rapidly approaching the point where simply skimming all of our incoming streams (currently often in many different places) is not an option. Aggregating things into one place where we can triage them will help, but we need some mechanism for encoding urgency, importance, and currency. The easiest way for us to achieve this at the moment is to use multiple services.

One approach to this problem would be a single portal/application that handled all these streams and understood how to deal with them. My guess is that Workstreamr is aiming to fit into this niche as an enterprise solution to handling all workstreams, from the level of corporate governance and strategic project management through to the office watercooler conversation. There is a challenging problem in implementing this. If all content is coming into one portal, and can be sent (from any appropriate device) through the same portal, how can the system know what to do with it? Does it pop up as an urgent message demanding the boss’s attention or does it just go into a file that can be searched at a later date? This requires that the system either infer, or have users provide, an understanding of what should be done with a specific message. Each message therefore requires rich semantic content indicating its importance, possibly its delivery mechanism, and whether this differs for different recipients. The alternative approach is to do exactly what I plan to do – use multiple services, so that the semantic information about what should be done with each post is encoded in its context. It’s a bit crude, but the level of urgency or importance is encoded in the choice of messaging service.
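The contrast can be made concrete: if the channel itself carries the semantics, then deciding what to do with a message becomes a simple lookup keyed on where it arrived, rather than something inferred from message content. The channel names and policies below are illustrative assumptions, not a real system:

```python
# Sketch: the choice of service encodes urgency, delivery style, and
# currency, so handling a message is a lookup on the channel it arrived
# through. Channel names and policies here are illustrative assumptions.

CHANNEL_POLICY = {
    "sms":     {"urgency": "high",   "delivery": "push", "keep_for": "hours"},
    "twitter": {"urgency": "medium", "delivery": "push", "keep_for": "days"},
    "lablog":  {"urgency": "low",    "delivery": "pull", "keep_for": "years"},
}
DEFAULT = {"urgency": "low", "delivery": "pull", "keep_for": "days"}

def handle(channel, message):
    """Attach handling semantics based purely on the channel (the context)."""
    return {"message": message, **CHANNEL_POLICY.get(channel, DEFAULT)}

print(handle("sms", "don't use the autoclave!"))
```

The single-portal alternative would need all of these fields attached to every message by the sender or inferred by the system; routing by channel gets them for free.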

This may seem like rather a lot of weight to give to the choice between tweeting and putting up a blog post, but this is part of a much larger emerging theme. When I wrote about data repositories I mentioned the implicit semantics that come from using repositories such as Slideshare and Flickr (or the PDB) that specialise in a specific kind of content. We talk a lot about semantic publishing and complain that people ‘don’t want to put in the metadata’, but if we recorded data at source, when it is produced, then a lot of the metadata would be built in. This is fundamentally the publish@source concept that I was introduced to by Jeremy Frey’s group at Southampton University. If someone logs into an instrument, we know who generated the data file and when, and we know what that datafile is about and what it looks like. The datafile itself will contain the date and instrument settings. If the sample list refers back to URIs in a notebook then we have all the information on the samples and their preparation. If we know when and where the datafile was recorded and we are monitoring room conditions then we have all of that metadata built in as well.
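A rough sketch of the publish@source idea in code: most of this metadata is essentially free at the point of acquisition. The field names and the sample URI below are invented placeholders rather than an actual LaBLog schema:

```python
import json
import os
import platform
from datetime import datetime, timezone

# Sketch of publish@source: capture context automatically at the moment
# of acquisition. Field names and the sample URI are invented placeholders.
def capture_context(instrument, settings, sample_uri):
    return {
        "operator": os.environ.get("USER", "unknown"),       # who is logged in
        "host": platform.node(),                             # machine that recorded it
        "recorded": datetime.now(timezone.utc).isoformat(),  # when
        "instrument": instrument,                            # which instrument
        "settings": settings,                                # acquisition parameters
        "sample": sample_uri,                                # link back to the notebook
    }

record = capture_context(
    "uv-vis-01",
    {"wavelength_nm": 280, "pathlength_cm": 1.0},
    "http://example.org/lablog/post/sample-prep-01",
)
print(json.dumps(record, indent=2))
```

Nothing here asks the scientist to type anything: the operator, timestamp, and machine come from the environment, the settings from the instrument, and the sample link from the notebook URI already in the sample list.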

The missing piece is the tools that bring all this together and a more sophisticated understanding of how we can bring all these streams together and process them. But at the core, if we capture context, capture user focus, and capture the connections to previous work then most of the hard work will be done. This will only become more true as we start to persuade instrument manufacturers to output data in standard formats. If we try and put the semantics back in after the fact, after we’ve lost those connections, then we are just creating more work for ourselves. If the suite of tools can be put together to capture and collate it at source then we can make our lives easier – and that in turn might actually persuade people to adopt these tools.

The key question of course…which Twitter client should I use? :)

We need to get out more…

The speaker had started the afternoon with a quote from Ian Rogers, ‘Losers wish for scarcity. Winners leverage scale.’ He went on to eloquently, if somewhat bluntly, make the case for exposing data and discuss the importance of making it available in a useable and re-useable form. In particular he discussed the sophisticated re-analysis and mashing that properly exposed data enables while excoriating a number of people in the audience for forcing him to screen scrape data from their sites.

All in all, as you might expect, this was music to my ears. This was the case for open science made clearly and succinctly, and with passion, over the course of several days. The speaker? Mike Ellis from EduServ; I suspect both a person and an organisation of which most of the readers of this blog have never heard. Why? Because he comes from a background in museums, and the data he wanted was news streams, addresses, and lat/long coordinates for UK higher education institutions, or library catalogues, not NMR spectra or gene sequences. Yet the case to be made is the same. I wrote last week about the need to make better connections between the open science blogosphere and the wider interested science policy and funding community. But we also need to make more effective connections with those for whom the open data agenda is part of their daily lives.

I spent several enjoyable days last week at the UKOLN Institutional Web Managers’ Workshop in Aberdeen. UKOLN is a UK centre of excellence for web based activities in HE in the UK and IWMW is their annual meeting. It is attended primarily by the people who manage web systems within UK HE including IT services, Web services, and library services, as well as the funders, and support organisations associated with these activities.

There were a number of other talks that would be of interest to this community, and many of the presentations are available as video at the conference website: James Curral on Web Archiving, Stephanie Taylor on Institutional Repositories, and David Hyett of the British Antarctic Survey providing the sceptic’s view of implementing Web2.0 services for communicating with the public. His central point, which was well made, was that there is no point adding a whole bunch of whizz-bang features to an institutional website if you haven’t got the fundamentals right: quality content; straightforward navigation; relevance to the user. Where I disagreed with his position was that I felt he extrapolated from the fact that most user generated content is poor to the presumption that ‘user generated content on my site will be poor’. This to me misses the key point: that it is by focussing on community building that you generate high quality content that is of relevance to that community. Nonetheless, his central point – don’t build in features that your users don’t want or need – is well made.

David made the statement ‘90% of blogs are boring’ during his talk. I took some exception to this (I am sure the situation is far, far worse than that). In a question I made the point that it was generally accepted that Google had made the web useable by making things findable amongst the rubbish, but that for social content we needed to adopt a different kind of ‘social search’ strategy with different tools. That with the right strategies and the right tools every person could find their preferred 10% (or 1% or 0.00001%) of the world’s material. That in fact this social search approach led to the formation of new communities and new networks.

After the meeting, however, it struck me that I had failed to successfully execute my own advice. Mike Ellis blogs a bit, twitters a lot, and is well known within the institutional web management community. He lives not far away from me. He is a passionate advocate of data availability and has the technical smarts to do clever stuff with the data that is available. Why hadn’t I already made this connection? If I go around making the case that web based tools will transform our ability to communicate, where is the evidence that this happens in practice? Our contention is that online publishing frees up communication and allows the free flow of information and ideas. The sceptic’s contention is that it just allows us to be happy in our own little echo chamber. Elements of both are true but I think it is fair to say that we are not effectively harnessing the potential of the medium to drive forward our agenda. By broadening the community and linking up with like-minded people in museums, institutional web services, archives, and libraries we can undoubtedly do better.

So there are two approaches to solving this problem, the social approach and the technical approach. Both are intertwined but can be separated to a certain extent. The social approach is to link existing communities and allow the interlinks between them to grow. This blog post is one attempt – some of you may go on to look at Mike’s blog. Another is for people to act as supernodes within the community network. Michael Nielsen’s joining of the (mostly) life science oriented community on FriendFeed, and more widely in the blogosphere, has connected that community with a theoretical physics community and another ‘Open Science’ community that was largely separate from the existing online community. A small number of connections made a big difference to overall network size. I was very happy to accept the invitation to speak at the IWMW meeting precisely because I hoped to make these kinds of connections. Hopefully a few people from the meeting may read this blog post (if so please do leave a comment – let’s build on this!). We make contacts, we expand the network – but this relies very heavily on supernodes within the network and their ability to cope with the volume.

So is there a technical solution to the problem? Well, in this specific case there is a technical dimension to the problem. Mike doesn’t use FriendFeed but is a regular Twitter user. My most likely connection to Mike is Brian Kelly, based at UKOLN, who does have a FriendFeed account but I suspect doesn’t monitor it. The connection fails because the social networks don’t effectively interconnect. It turns out the web management community aren’t convinced by FriendFeed and prefer Twitter. So a technical solution would somehow have to bridge this gap. Right at the moment that bridge is most likely to be a person, not a machine, which leaves us back where we started, and I don’t see that changing anytime soon. The problem is an architectural one, not an application or service one. I can aggregate Twitter, FriendFeed or anything else in one place, but unless everyone else does the same thing it’s not really going to help.

I don’t really have a solution except once again to make the case for the value of those people who build stronger connections between poorly interconnected networks. It is not just that information is valuable, but the timely delivery of that information is valuable. These people add value. What is more, if we are going to fully exploit the potential of the web in the near term, not to mention demonstrate the value of exploiting it to others, we need to value these people and support their activities. How we do that is an open question. It will clearly cost money. The question is where to get it from and how to get it to where it needs to be.

Pedro Beltrao writes on the backlash against open science

Pedro has written a thoughtful post detailing arguments he has received against Open Practice in science. He makes a good point that as the ideas around Open Science spread there will inevitably be a backlash. Part of the response to this is to keep saying – as Pedro does, and as Jean-Claude, Bill Hooker, and others have said repeatedly – that we are not forcing anyone to take this approach. Research funders, such as the BBSRC, may have data sharing policies that require some measure of openness but, at the end of the day, if they are paying they get to call the shots.

The other case to make is that this is a more efficient and effective way of doing science. There is a danger, particularly in the US, that open approaches get labelled as ‘socialist’ or something similar. PRISM and the ACS, when attacking open access, have used the term ‘socialized science’. This has a particular resonance in the US and, I think, is seen as a totally bizarre argument elsewhere in the world, but that is not the point. The key point to make is that the case for Open Science is a pure market based argument. Reducing barriers for re-use, breaking out of walled gardens, adds value and makes the market more efficient, not less. John Wilbanks has some great blog posts on this subject and an article in Nature Precedings which I highly recommend.

In the comments in Pedro’s post Michael Kuhn asks:

Hmm, just briefly some unbalanced thoughts (I don’t have time to offer more than the advocatus diaboli argument):

Open Science == Communism? I’m wondering if a competition of scientific theories is actually necessary to further science in a sound way. Just to draw the parallel, a lot of R&D in the private sector is done in parallel and in competition, with the result of increased productivity. On the other side, we’ve had things like Comecon and five-year plans to “order” the development and reduce competition, and the result was lower productivity.

I think it is important to counter this kind of argument (and I note that Michael is playing devil’s advocate here – albeit in Latin) with arguments that use the economic benefits, and case studies, such as those used by James Boyle in his talk at the recent Science Commons-run meeting in Barcelona (which I blogged about here), to show that there is a strong business case to be made. Openness may be more social but it isn’t in any sense socialist. In fact it drives us closer to a pure market than the current system in many ways. The business of building value on open content has taken off on the web. Science can do the same, and open approaches are more efficient.


The full Web2.0 experience – My talk tomorrow at IWMW in Aberdeen

Tomorrow I am giving a talk at the UKOLN Institutional Web Managers workshop starting at 12:45 British Summer Time (GMT+1). In principle you will be able to see the talk video cast at the links on the video streaming page. The page also has a liveblogging tool (OpenID enabled apparently!). I won’t be liveblogging my own talk but I will be attempting to respond to comments or questions either on that tool, via FriendFeed, or @cameronneylon on Twitter. I make no promises that this will work but if it all fails then I will record a live screencast.

Policy for Open Science – reflections on the workshop

Written on the train on the way from Barcelona to Grenoble. This life really is a lot less exotic than it sounds… 

The workshop that I’ve reported on over the past few days was both positive and inspiring. There is a real sense that the ideas of Open Access and Open Data are becoming mainstream. As several speakers commented, within 12–18 months it will be very unusual for any leading institution not to have a policy on Open Access to its published literature. In many ways, as far as Open Access to the published literature is concerned, the war has been won. There remain battles to be fought over green and gold routes – the role of licenses and the need to be able to text mine – successful business models remain to be made demonstrably sustainable – and there will be pain as the inevitable restructuring of the publishing industry continues. But be under no illusions that this restructuring has already begun, and it will continue in the direction of more openness as long as the poster children of the movement like PLoS and BMC continue to be successful.

Open Data remains further behind, both with respect to policy and awareness. Many people spoke over the two days about Open Access and then added, almost as an addendum ‘Oh and we need to think about data as well’. I believe the policies will start to grow and examples such as the BBSRC Data Sharing Policy give a view of the future. But there is still much advocacy work to be done here. John Wilbanks talked about the need to set achievable goals, lines in the sand which no-one can argue with. And the easiest of these is one that we have discussed many times. All data associated with a published paper, all analysis, and all processing procedures, should be made available. This is very difficult to argue with – nonetheless we know of examples where the raw data of important experiments is being thrown away. But if an experiment cannot be shown to have been done, cannot be replicated and checked, can it really be publishable? Nonetheless this is a very useful marker, and a meme that we can spread and promote.

In the final session there was a more critical analysis of the situation. A number of serious questions were raised but I think they divide into two categories. The first involves the rise of the ‘Digital Natives’ or the ‘Google Generation’. The characteristics of this new generation (a gross simplification in its own right) are often presented as a pure good. Better networked, more sharing, better equipped to think in the digital network. But there are some characteristics that ought to give pause. A casualness about attribution, a sense that if something is available then it is fine to just take it (it’s not stealing after all, just copying). There is perhaps a need to recover the roots of ‘Mertonian’ science – to, as I think James Boyle put it, publicise and embed the attitudes of the last generation of scientists, for whom science was a public good and a discipline bounded by strict rules of behaviour. Some might see this as harking back to an elitist past, but if we are constructing a narrative about what we want science to be then we can take the best parts of all of our history and use it to define and refine our vision. There is certainly a place for a return to the compulsory study of science history and philosophy.

The second major category of issues discussed in the last session revolved around the question of what do we actually do now. There is a need to move on many fronts, to gather evidence of success, to investigate how different open practices work – and to ask ourselves the hard questions. Which ones do work, and indeed which ones do not. Much of the meeting revolved around policy with many people in favour of, or at least not against, mandates of one sort or another. Mike Carroll objected to the term mandate – talking instead about contractual conditions. I would go further and say that until these mandates are demonstrated to be working in practice they are aspirations. When they are working in practice they will be norms, embedded in the practice of good science. The carrot may be more powerful than the stick but peer pressure is vastly more powerful than both.

So the key questions for me revolve around how we can convert aspirations into community norms. What is needed in terms of infrastructure, in terms of incentives, and in terms of funding to make this stuff happen? One thing is to focus on the infrastructure and take a very serious and critical look at what is required. It can be argued that much of the storage infrastructure is in place. I have written on my concerns about institutional repositories but the bottom line remains that we probably have a reasonable amount of disk space available. The network infrastructure is pretty good so these are two things we don’t need to worry about. What we do need to worry about, and what wasn’t really discussed very much in the meeting, is the tools that will make it easy and natural to deposit data and papers.

The incentive structure remains broken – this is not a new thing – but if sufficiently high profile people start to say this should change, and act on those beliefs, and they are, then things will start to shift. It will be slow but bit by bit we can imagine getting there. Can we take shortcuts? Well, there are some options. I’ve raised in the past the idea of a prize for Open Science (or in fact two, one for an early career researcher and one for an established one). Imagine if we could make this a million dollar prize, or at least enough for someone to take a year off. High profile, significant money, and visible success for someone each year. Even without money this is still something that will help people – give them something to point to as recognition of their contribution. But money would get people’s attention.

I am sceptical about the value of ‘microcredit’ systems where a person’s diverse and perhaps diffuse contributions are aggregated together to come up with some sort of ‘contribution’ value, a number by which job candidates can be compared. Philosophically I think it’s a great idea, but in practice I can see this turning into multiple different calculations, each of which can be gamed. We already have citation counts, H-factors, publication number, integrated impact factor as ways of measuring and comparing one type of output. What will happen when there are ten or 50 different types of output being aggregated? Especially as no-one will agree on how to weight them. What I do believe is that those of us who mentor staff, or who make hiring decisions should encourage people to describe these contributions, to include them in their CVs. If we value them, then they will value them. We don’t need to compare the number of my blog posts to someone else’s – but we can ask which is the most influential – we can compare, if subjectively, the importance of a set of papers to a set of blog posts. But the bottom line is that we should actively value these contributions – let’s start asking the questions ‘Why don’t you write online? Why don’t you make your data available? Where are your protocols described? Where is your software, your workflows?’

Funding is key, and for me one of the main messages to come from the meeting was the need to think in terms of infrastructure, and in particular, to distinguish what is infrastructure and what is science or project driven. In one discussion over coffee I discussed the problem of how to fund development projects where the two are deeply intertwined and how this raises challenges for funders. We need new funding models to make this work. It was suggested in the final panel that as these tools become embedded in projects there will be less need to worry about them in infrastructure funding lines. I disagree. Coming from an infrastructure support organisation I think there is a desperate need for critical strategic oversight of the infrastructure that will support all science – both physical facilities, network and storage infrastructure, tools, and data. This could be done effectively using a federated model and need not be centralised but I think there is a need to support the assumption that the infrastructure is available and this should not be done on a project by project basis. We build central facilities for a reason – maybe the support and development of software tools doesn’t fit this model but I think it is worth considering.

This ‘infrastructure thinking’ goes wider than disk space and networks, wider than tools, and wider than the data itself. The concept of ‘law as infrastructure’ was briefly discussed. There was also a presentation looking at different legal models of a ‘commons’; the public domain, a contractually reconstructed commons, escrow systems etc. In retrospect I think there should have been more of this. We need to look critically at different models, what they are good for, how they work. ‘Open everything’ is a wonderful philosophical position but we need to be critical about where it will work, where it won’t, and where it needs contractual protection, or where such contractual protection is counterproductive. I spoke to John Wilbanks about our ideas on taking Open Source Drug Discovery into undergraduate classes and schools and he was critical of the model I was proposing, not from the standpoint of the aims or where we want to be, but because it wouldn’t be effective at drawing in pharmaceutical companies and protecting their investment. His point was, I think, that by closing off the right piece of the picture with contractual arrangements you bring in vastly more resources and give yourself greater ability to ensure positive outcomes. That sometimes to break the system you need to start by working within it – by, in this case, making it possible to patent a drug. This may not be philosophically in tune with my thinking but it is pragmatic. There will be moments, especially when we deal with the interface with commerce, where we have to make these types of decisions. There may or may not be ‘right’ answers, and if there are they will change over time, but we need to know our options and know them well so as to make informed decisions on specific issues.

But finally, as is my usual wont, I come back to the infrastructure of tools. The software that will actually allow us to record and order this data that we are supposed to be sharing. Again there was relatively little on this in the meeting itself. Several speakers recognised the need to embed the collection of data and metadata within existing workflows but there was very little discussion of good examples of this. As we have discussed before this is much easier for big science than for ‘long tail’ or ‘small science’. I stand by my somewhat provocative contention that for the well described central experiments of big science this is essentially a solved problem – it just requires the will and resources to build the language to describe the data sets, their formats, and their inputs. But the problem is that even for big science, the majority of the workflow is not easily automated. There are humans involved, making decisions moment by moment, and these need to be captured. The debate over institutional repositories and self archiving of papers is instructive here. Most academics don’t deposit because they can’t be bothered. The idea of a negative click repository – where this is a natural part of the workflow can circumvent this. And if well built it can make the conventional process of article submission easier. It is all a question of getting into the natural workflow of the scientist early enough that not only do you capture all the contextual information you want, but that you can offer assistance that makes them want to put that information in.

The same is true for capturing data. We must capture it at source. This is the point where it has the potential to add the greatest value to the scientist’s workflow by making their data and records more available, by making them more consistent, by allowing them to reformat and reanalyse data with ease, and ultimately by making it easy for them to share the full record. We can and we will argue about where best to order and describe the elements of this record. I believe that this point comes slightly later – after the experiment – but wherever it happens it will be made much easier by automatic capture systems that hold as much contextual information as possible. Metadata is context – almost all of it should be possible to catch automatically. Regardless of this we need to develop a diverse ecosystem of tools. It needs to be an open and standards based ecosystem and in my view needs to be built up of small parts, loosely coupled. We can build this – it will be tough, and it will be expensive but I think we know enough now to at least outline how it might work, and this is the agenda that I want to explore at SciFoo.

John Wilbanks had the last word, and it was a call to arms. He said ‘We are the architects of Open’. There are two messages in this. The first is we need to get on and build this thing called Open Science. The moment to grasp and guide the process is now. The second is that if you want to have a part in this process the time to join the debate is now. One thing that was very clear to me was that the attendees of the meeting were largely disconnected from the more technical community that reads this and related blogs. We need to get the communication flowing in both directions – there are things the blogosphere knows, that we are far ahead on, and we need to get the information across. There are things we don’t know much about, like the legal frameworks, the high level policy discussions that are going on. We need to understand that context. It strikes me though that if we can combine the strengths of all of these communities and their differing modes of communication then we will be a powerful force for taking forward the open agenda.

Policy for open science – the wrap up session

Today Science Commons sponsored a meeting looking at the policy issues involved in Open Access and Open Science more widely. I blogged James Boyle’s keynote earlier and there were some notes along the way via Twitter. This is a set of notes from the last session of the meeting, a panel with Alexis-Michel Mugabushaka (European Science Foundation), Javier Hernandez-Ros (European Commission), Michael Carroll (Villanova University, Creative Commons) and John Wilbanks (Science Commons). These notes were taken at speed and are my own record of what happened. They should not necessarily be taken as a transcript of what the panellists said and any inaccuracies are my own fault.

It is important to focus on the barriers to open science. Much has been said about ideals and beliefs but there remains a paucity of real evidence to support the assertions of the open science community. Changing policy, changing culture, challenging entrenched positions, and putting 50,000 jobs in the publishing industry at risk requires strong evidence of benefits. On top of this sits the consideration of efficiency, both with respect to time and money. Details are important. What is data? What is meant by data sharing? Both questions matter from a legal and a descriptive perspective.

Some issues looked at by the European Commission:

1) Who does what, and how is this to be organised? At national, institutional, discipline, or European level? A mixture of top down and bottom up will be required to make progress. Again, efficient provision is required because unnecessary duplication will be counterproductive and expensive.

2) The legal issues, copyright etc. The issue is not copyright per se but licensing and the power games associated with the economic needs of commercial entities. There is a need for a meaningful discussion across the stakeholders.

3) Technical issues. The whole area is developing and it is not clear where the funding streams will continue as these become more embedded in general scientific practice.

The key issue is to bring everyone on board as part of the discussion. What is the contribution that publishers can make? View this from a positive angle, not a combative one. What are the roles of the various stakeholders?

Things actually need to be done! Much talk about what should be done. Less on how to actually do it.

This meeting is a moment in time where the idea of Openness has become mainstream. The argument started on the fringe. There is a risk of over-confidence in the arguments. The issue of evidence is an important one. Can ROI actually be shown? It is known that there are unexpected audiences for information from the web – new innovation will emerge, but it is unclear how much or where it will come from.

In the short term there is still a fight but in the long term the world will change – as the next generation of scientists come through. But we don’t necessarily need or want to wait for them to arrive. We may want to ensure that some values from the previous generation of scientists are transmitted effectively. This moment in time is still transitional. Most of us were born into an analogue world, and we still don’t think digitally. When people design initiatives and projects with the network in mind – rather than as an afterthought or add-on – the efficiencies will be seen much more strongly. There is an argument that the EU model of funding and thinking can help to drive the network effects that will demonstrate the benefits more effectively than the US funding models. A missing part is the link to business models – what is the profit model for openness? Again, Google works (and is profitable).

The fight will go on – but, thinking forward, how would we work and plan future projects with the network built in?

The law as infrastructure. Creative Commons came out of the wish of lawyers to build a technical infrastructure that works for open networks. The legal issues of sharing data haven’t been discussed very much. How can a legal infrastructure be built to assist openness? This is the role of Creative/Science Commons. ‘No-one wants to invite lawyers to the party’ but some of them really want to help it go off well!

The purpose of the meeting was to bring people together. There is a need for people to understand they are doing the same thing in different domains. It is difficult to fund the commons – but by bringing people together it is possible to build a social network that may assist in identifying funding options. Science funders don’t do policy, and policy people don’t fund science. A policy statement taking up the current recommendations will be forthcoming.

The benefits of the open web come from the explosion of people actually using a computer network. We must think of the users of an open architected science with the same potential for explosion. Can we make it possible to do science without being a) rich and b) a member of a closed guild? What would happen if 100 million people could ask scientific questions – even if relatively few of them were really smart questions?

The tower of Babel is all over science but also all over computation. What we need is an institute of science languages to standardise. A cancer marker with 94 synonyms would make the Académie Française apoplectic. We are a long way from being able to solve either of these problems. How can the tools and standards be built?

We are the architects of Open. We need to link all the work that has been described in the meeting. We need an Open architecture to describe and deliver openness. The short term impact of technology is always overestimated but the long term is underestimated. We don’t have to worry too much about the long term impact – we can just get on and do it.

‘In theory there is no difference between theory and practice. In practice there is’ – let’s just deploy and get on with it. Let’s build the evidence and the tools and the standards that will take us forward.

Policy and technology for e-science – A forum on open science policy

I’m in Barcelona at a satellite meeting of the EuroScience Open Forum organised by Science Commons and a number of their partners. Today is when most of the meeting will be with forums on ‘Open Access Today’, ‘Moving OA to the Scientific Enterprise: Data, materials, software’, ‘Open access in the knowledge network’, and ‘Open society, open science: Principle and lessons from OA’. There is also a keynote from Carlos Morais-Pires of the European Commission and the lineup for the panels is very impressive.

Last night was an introduction and social kickoff as well. James Boyle (Duke Law School, Chair of board of directors of Creative Commons, Founder of Science Commons) gave a wonderful talk (40 minutes, no slides, barely taking breath) where his central theme was the relationship between where we are today with open science and where international computer networks were in 1992. He likened making the case for open science today with that of people suggesting in 1992 that the networks would benefit from being made freely accessible, freely useable, and based on open standards. The fears that people have today of good information being lost in a deluge of dross, of there being large quantities of nonsense, and nonsense from people with an agenda, can to a certain extent be balanced against the idea that, to put it crudely, Google works. As James put it (not quite a direct quote): ‘You need to reconcile two statements, both true. 1) 99% of all material on the web is incorrect, badly written, and partial. 2) You probably haven’t opened an encyclopedia as a reference in ten years.’

James gave two further examples, one being the availability of legal data in the US. Despite the fact that none of this is copyrightable in the US there are thriving businesses based on it. The second, which I found compelling for reasons that Peter Murray-Rust has described in some detail, concerned weather data. Weather data in the US is free. In a recent attempt to get long term weather data, a research effort was charged on the order of $1500 – the cost of the DVDs that would be needed to ship the data – for all existing US weather data. By comparison a single German state wanted millions for theirs. The consequence of this was that the European data didn’t go into the modelling. James made the point that while the European return on investment for weather data was a respectable nine-fold, that for the US (where they are giving it away, remember) was 32 times. To me though the really compelling part of this argument is that if that data is not made available we run the risk of being underwater in twenty years with nothing to eat. This particular case is not about money; it is potentially about survival.

Finally – and this you will not be surprised was the bit I most liked – he went on to issue a call to arms to get on and start building this thing that we might call the data commons. The time has come to actually sit down and start to take these things forward, to start solving the issues of reward structures, of identifying business models, and to build the tools and standards to make this happen. That, he said was the job for today. I am looking forward to it.

I will attempt to do some updates via twitter/friendfeed (cameronneylon on both) but I don’t know how well that will work. I don’t have a roaming data tariff and the charges in Europe are a killer so it may be a bit sparse.

Science in the YouTube Age – introductory screencast for a talk I’m giving at IWMW

The UKOLN Institutional Web Managers Workshop is running in Aberdeen from 22-24 July and I am giving a talk discussing the impact of Web2.0 tools on science. My main theme will be that the main cultural reasons for lack of uptake relate to the fear of losing control over data and ideas. Web2.0 tools rely absolutely on the willingness of people to make useful material available. In science this material is data, ideas, protocols, and analyses. Prior to publication most scientists are very sceptical of making their hard earned data available – but without this the benefits that we know can be achieved through network effects, re-use of data, and critical analysis of data and analysis, will not be seen. The key to benefiting from web based technologies is adopting more open practices.

The video below is a screencast of a shorter version of the talk intended to give people the opportunity to make comments, ask questions, or offer suggestions. I wanted to keep it short so there are relatively few examples in there – there will be much more in the full talk. For those who can’t make it to Aberdeen I am told that the talks are expected to be live videocast and I will provide a URL as soon as I can. If this works I am also intending to try and respond to comments and questions via FriendFeed or Twitter in real time. This may be foolhardy but we’ll see how it goes. Web2.0 is supposed to be about real-time interaction, after all!

I don’t seem to be able to embed the video but you can find it here.

What I missed on my holiday or Why I like refereeing for PLoS ONE

I was away last week having a holiday and managed to miss the whole Declan Butler/PLoS/Blogosphere dustup. Looked like fun. I don’t want to add to the noise as I think there were a lot of knee-jerk reactions and significantly more heat than light. For anyone coming here without having heard about this I will point at the original article, Bora’s summary of reactions, and Timo Hannay’s reply at Nature. What I wanted to add to the discussion was a point that I haven’t seen in my quick skimming of the whole debate (which is certainly not complete so if I missed this then please drop in a comment).

No-one as far as I can see has really twigged as to just how disruptive PLoS ONE really is. In this I agree with Timo, in that I think publishers, from BMC, to Elsevier, ACS and Nature Publishing Group itself, should be very worried about the impact that it will have and think very hard about what it means for their future business models. Where we disagree, I think, is that I find this very exciting and think that it shows the way towards a scientific publishing industry that will look very different from today’s. Differentiating on quality prior to publication was always difficult, and certainly expensive. The question for the future is whether we are prepared to pay for it, and are we getting value for money?

The criticism levelled at PLoS ONE is that it uses a ‘light touch’ refereeing process with the only criterion for publication being that a paper is methodologically sound. This, it is implied, leads to a ‘low quality journal’ or perhaps rather a journal with a large number of relatively uncited articles. However there are very strong positives to this ‘light touch’ approach. It is fast. And it is cheap. The issue here is business models and the business model of PLoS ONE is highly disruptive. And financially successful. To me this is the big news. People are flocking to PLoS ONE because it is a quick and straightforward way of getting interesting (but perhaps not career making) results out there.

From an author’s perspective PLoS ONE cuts out the crap in getting papers published. The traditional approach (send to Nature/Science/Cell, get rejected, send to Nature/Science/Cell baby journal, get rejected, send to top tier specific journal, get rejected, end up eventually going to a journal that no-one subscribes to) takes time and effort and by the time you win someone else has usually published it anyway. It also costs the authors money in staff time to re-format, re-jig, appease referees, and re-jig again to appease a different set of referees. I haven’t done the sums but in the worst case this could probably cost as much as a PLoS ONE publication charge. Save time, save money, still get indexed in PubMed. It starts to sound good, especially for all that material that you are not quite sure where to pitch.

But what about that stuff that is really hard hitting? That you know is important. Here you now have an interesting choice. You can send to Nature/Science/Cell/PLoS Biology and if you get past the initial editorial review stage and get to referees then you are probably looking at around six to nine months before publication. You will be in a high profile journal, can generate good publicity, and have a great paper on your CV. Alternatively you can send to PLoS ONE and have it on the web and in PubMed in perhaps two to four weeks. If the paper is as strong as you believe then you will still get your hundreds of citations, still have a great paper, still get good publicity. It probably doesn’t look quite so good on the traditional CV, but try putting the number of citations for each paper on your CV – that puts it in perspective. And it will be out a lot faster, you will be ahead of the game and you can apply for your next grant with ‘paper published and already cited three times’ not ‘paper submitted’ (read ‘about to be rejected’; there is an art in submitting papers just before the grant deadline). This makes for an interesting choice and one which cuts directly across the usual high impact/low impact criterion. It puts speed and convenience on the table as market differentiators in a way they haven’t been before.

As a referee PLoS ONE has a lot of appeal as well. You are being asked a very specific question. I recently refereed one paper for PLoS ONE at the same time as one for another (fairly low impact) journal. The PLoS ONE paper was a very simple case: the methodological detail was exemplary – easy to read, clear, and detailed. You get the impression the authors took care over it, possibly because they knew that was what it would be judged on (it is of course entirely possible that this group just writes good papers). The other paper was a distinct case of salami slicing – but I was left with trying to figure out whether it had been cut too thin for this specific journal. This is not just a difficult judgement to make. It is a highly subjective and probably meaningless one. The data was still useful and publishable, just probably not in that specific journal. Which one do you think took me longer? And which one left me with a warm feeling?

What about the reader? There is a lot of interesting stuff in PLoS ONE. There is also a lot of dross. But why should that matter? I don’t look at the dross; I often don’t even know that it exists. I can’t remember the last time I actually looked at a journal table of contents. It doesn’t matter to me whether a paper is in Nature, Science, PLoS ONE, or Journal of the society for some highly specific thing in some rather small place. If it is searchable, and I have access to it, then that’s all I need. If it is not both of these then for me it simply does not exist. And I don’t judge the value or reliability of an article based on where it is, I judge the article on what it contains. PLoS ONE actually wins here because its hard focus on being ‘methodologically sound’ tends to lead referees and editors (as well as authors) to focus on this aspect.

To me the truly radical thing about PLoS ONE is that it has redefined the nature of peer review and that people have bought into this model. The idea of dropping any assessment of ‘importance’ as a criterion for publication had very serious and very real risks for PLoS. It was entirely possible that the costs wouldn’t be usefully reduced. It was more than possible that authors simply wouldn’t submit to such a journal. PLoS ONE has successfully used a difference in its peer review process as the core of its appeal to its customers. The top tier journals have effectively done this for years at one end of the market. The success of PLoS ONE shows that it can be done in other market segments. What is more it suggests it can be done across existing market segments. That radical shift in the way scientific publishing works that we keep talking about? It’s starting to happen.