Talking to the next generation – NESTA Crucible Workshop

Yesterday I was privileged to be invited to give a talk at the NESTA Crucible Workshop being held in Lancaster. You can find the slides on slideshare. NESTA, the National Endowment for Science, Technology and the Arts, is an interesting organization funded via a UK government endowment to support innovation and enterprise, and more particularly the generation of a more innovative and entrepreneurial culture in the UK. Among the programmes it runs in pursuit of this is the Crucible programme, where a small group of young researchers, generally looking for or just starting in their first permanent or independent positions, attend a series of workshops to get them thinking broadly about the role of their research in the wider world and to help them build new networks for support and collaboration.

My job was to talk about “Science in Society” or “Open Science”. My main theme was the question of how we justify taxpayer expenditure on research, and my argument that this implies an obligation to maximise the efficiency of how we do that research. Research is worth doing, but we need to think hard about how and what we do. Not surprisingly, I focussed on the potential of using web based tools and open approaches to make things happen cheaper, quicker, and more effectively – to reduce waste and maximise the amount of research output for the money spent.

Also not surprisingly there was significant pushback – much of it where you would expect. Concerns over data theft, over how “non-traditional” contributions might appear (or not) on a CV, and over the costs in time were all mentioned. What surprised me most, however, was the pushback against the idea of putting material on the open web rather than in traditional journal formats. There was a real sense that the group had a respect for the authority of the printed, versus online, word, which really caught me out. I often use a gotcha moment in talks to try and illustrate how our knowledge framework has been changed by the web. It goes: “How many people have opened a physical book for information in the last five years?”, followed by “And how many haven’t used Google in the last 24 hours?” This is shamelessly stolen from Jamie Boyle, incidentally.

Usually you get three or four sheepish hands going up admitting a personal love of real physical books. Generally it is around 5-10% of the audience, and this has been pretty consistent amongst mid-career scientists in both academia and industry, and people in publishing. In this audience about 75% put their hands up. Some of these were specialist “tool” books – mathematical forms, algorithmic recipes – many were specialist texts, and many referred to the use of undergraduate textbooks. Interestingly they also brought up an issue that I’ve never had an audience raise before: how do you find a route into a new subject area that you know little about, but that you can trust?

My suspicion is that this difference comes from three places. Firstly, these researchers were already biased towards being less discipline bound by the fact that they’d applied for the workshop. They were therefore more likely to be discipline hoppers, jumping into new fields where they had little experience and needed a route in. Secondly, they were at a stage of their career where they were starting to teach, again possibly slightly outside their core expertise, and therefore looking for good, reliable material to base their teaching on. Finally, though, there was a strong sense of respect for the authority of the printed word. The printing of the German Wikipedia was brought up as evidence that printed matter was, at least perceived to be, more trustworthy. Writing this now I am reminded of the recent discussion of the hold that the PDF has over the imagination of researchers. There is a real sense that print remains authoritative in a way that online material is not. Even though the journal may never be printed, the PDF provides the impression that it could or should be. I would also guess that the group were young enough to be slightly less cynical about authority in general.

Food for thought, but it was certainly a lively discussion. We actually had to be dragged off to lunch because it went way over time (and not, I hope, just because I had too many slides!). Thanks to all involved in the workshop for such an interesting discussion, and thanks also to the twitter people who replied to my request for 140 character messages. They made for a great way of structuring the talk.

What would you say to Elsevier?

In a week or so’s time I have been invited to speak as part of a forward planning exercise at Elsevier. To some this may seem like an opportunity to go in for an all guns blazing OA rant, or perhaps to plant some incendiary device, but I see it more as an opportunity to nudge, perhaps cajole, a big player in the area of scholarly publishing in the right direction. After all, if we are right about the efficiency gains for authors and readers that will be created by Open Access publication, and we are right about the way that web based systems utterly change the rules of scholarly communication, then even an organization of the size of Elsevier has to adapt or wither away. Persuading them to move in the right direction because it is in their own interests would be an effective way of speeding up the process of positive change.

My plan is to focus less on the arguments for making more research outputs Open Access and more on what happens as a greater proportion of those outputs become freely available, something that I see as increasingly inevitable. Where that proportion may finally settle is anyone’s guess, but it is going to be much bigger than it is now. What will authors and funders want and need from their publication infrastructure, and what are the business opportunities that arise from those needs? For me these fall into four main themes:

  • Tracking via aggregation. Funders and institutions increasingly want to track the outputs of their research investment. Providing tools and functionality that enable them to automatically aggregate, slice, and dice these outputs is a big business opportunity (a minimal sketch of what such aggregation might look like follows this list). The data themselves will be free, but providing them rapidly and effectively in the form that people need will add value that they will be prepared to pay for.
  • Speed to publish as a market differentiator. Authors will want their content out, available, and being acted on fast. Speed to publication is potentially the biggest remaining area for competition between journals. This is important because there will almost certainly be fewer journals, with greater “quality” or “brand” differentiation. There is a plausible future in which there are only two journals, Nature and PLoS ONE.
  • Data publication, serving, and archival. There may be fewer journals, but there will be much greater diversity of materials being published through a larger number of mechanisms. There are massive opportunities in providing high quality infrastructure and services to funders and institutions to aggregate, publish, and archive the full set of research outputs. I intend to draw heavily on Dorothea Salo’s wonderful slideset on data publication for this part.
  • Social search. Literature searching is the main area where there are plausible efficiency gains to be made in the current scholarly publications cycle. According to the Research Information Network’s model of costs, search accounts for a very significant proportion of the non-research costs of publishing. Building the personal networks (Bill Hooker’s Distributed Wetware Online Information Filter, or DWOIF, down in the comments) that make this feasible may well be the new research skill of the 21st century. Tools that make this work effectively are going to be very popular. What will they look like?
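
To make the first of these themes a little more concrete: the kind of service I have in mind is essentially a thin aggregation layer over freely available records of research outputs. As a minimal sketch, assuming a purely hypothetical JSON feed of output records (the URL, field names, and record structure below are all invented for illustration and are not any real publisher's API), the slicing and dicing might start as simply as this:

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Hypothetical endpoint: a feed of research output records, each tagged
# with a funder, an institution, and an output type. The URL and field
# names are invented for illustration only.
FEED_URL = "https://example.org/research-outputs.json"


def aggregate_outputs(feed_url: str = FEED_URL) -> dict:
    """Count research outputs per funder, broken down by output type."""
    with urlopen(feed_url) as response:
        records = json.load(response)

    summary = defaultdict(lambda: defaultdict(int))
    for record in records:
        funder = record.get("funder", "unknown")
        output_type = record.get("type", "other")
        summary[funder][output_type] += 1
    return summary


if __name__ == "__main__":
    for funder, counts in sorted(aggregate_outputs().items()):
        print(funder, dict(counts))
```

The value added here would not be in the data, which stays free, but in maintaining reliable feeds, the mappings between funders, institutions, and outputs, and the speed with which the slices can be delivered.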

But what have I missed? What (constructive!) ideas and thoughts would you want to place in the minds of the people thinking about where to take one of the world’s largest scholarly publication companies and its online information and collaboration infrastructure?

Full disclosure: Part of the reason for writing this post is to disclose publicly that I am doing this gig. Elsevier are covering my travel and accommodation costs but are not paying any fee.

New Year’s Resolutions 2009

Sydney Harbour Bridge NYE fireworks

All good traditions require someone to make an arbitrary decision to do something again. Last year I threw up a few New Year’s resolutions in the hours before NYE in the UK. Last night I was out on the shore of Sydney Harbour. I had the laptop – I thought about writing something – and then I thought, nah, I can just lie here and look at the pretty lights. However I did want to follow up the successes and failures of last year’s resolutions and maybe make a few more for this year.

So last year’s resolutions were, roughly speaking, 1) to adopt the principles of the NIH Open Access mandate when choosing journals for publications, 2) to get more of the existing data within my group online and available, 3) to take the whole research group fully open notebook, 4) to mention Open Notebooks in every talk I gave, and 5) attempt to get explicit funding for developing open notebook approaches.

So successes – the research group at RAL is now (technically) working on an Open Notebook basis. This has taken a lot longer than we expected and the guys are still really getting a feel for what that means, both in terms of how they record things and how they feel about it. I think it will improve over time and it just reinforces the message that none of this is easy. I also made a point of talking about the Open Notebook approach in every talk I gave – mostly this was well received – often there was some scepticism but the message is getting out there.

However we didn’t do so well on picking journals – most of the papers I was on this year were driven by other people or were directed requests for special issues, or both. The papers that I had in mind I still haven’t got written, some drafts exist, but they’re definitely not finished. I also haven’t done any real work on getting older data online – it has been enough work just trying to manage the stuff we already have.

Funding is a mixed bag – the network proposal that was in last New Year was rejected. A few proposals have gone in – more haven’t gone in but exist in draft form – and a group of us came close to winning a tender to do some research into the uptake of Web 2.0 tools in science (more on that later, but Gavin Baker has written about it and our tender document itself is available). The success of the year was the funding that Jean-Claude Bradley obtained from Submeta (as well as support from Aldrich Chemicals and Nature Publishing Group) to support the Open Notebook Science Challenge. I can’t take any credit for this but I think it is a good sign that we may have more luck this coming year.

So for this year – there are some follow ons – and some new ones:

  1. I will re-write the network application (and will be asking for help) and re-submit it to a UK funder
  2. I will clean up the “Personal View of Open Science” series of blog posts and see if I can get it published as a perspectives article in a high ranking journal
  3. I will get some of those damn papers finished – and decide which ones are never going to be written and give up on them. Papers I have full control over will go by first preference to Gold OA journals.
  4. I will pull together the pieces needed to take action on the ideas that came out of the Southampton Open Science workshop, specifically the idea of a letter, signed by a wide range of scientists and interested people, to a high ranking journal, stating the importance of working towards published papers being fully supported by data and methodological detail that are fully available
  5. I will focus on doing fewer things and doing them better – or at least making sure the resources are available to do more of the things I take on…

I think five is enough things to be going on with. Hope you all have a happy new year, whenever it may start, and that it takes you further in the direction you want to go (whether you know what that is now or not) than you thought was possible.

p.s. I noticed in the comments to last year’s post a comment from one Shirley Wu suggesting the idea of running a session at the 2009 Pacific Symposium on Biocomputing – a proposal that resulted in the session we are holding in a few days (again more later on – we hope – streaming video, micro blogging etc). Just thinking about how much has changed in the way such an idea would be raised and explored in the last twelve months is food for thought.

The people you meet on the train…

Yesterday on the train I had a most remarkable experience of synchronicity. I had been at the RIN workshop on the costs of scholarly publishing (more on that later) in London and was heading off to Oxford for a group dinner. On the train I was looking for a seat with a desk and took one opposite a guy with a slightly battered looking Mac laptop. As I pulled out my new MacBook (13”, 2.4 GHz, 4 GB memory, since you ask) he leaned across to have a good look, as you do, and we struck up a conversation. He asked what I did and I talked a little about being a scientist and my role at work. He was a consultant who worked on systems integration.
At some stage he made a throwaway comment about the fact that he had been going back to learn, or re-learn, some fairly advanced statistics and that he had had a lot of trouble getting access to some academic papers – certainly he didn’t want to pay for them, but he had managed to find free versions of what he wanted online. I managed to keep my mouth somewhat shut at this point, except to say I had been at a workshop looking at these issues. However it gets better, much better. He was looking into quantitative risk issues, and this led into a discussion about the problems of how science, and particularly medicine, reporting in the media doesn’t provide links back to the original research (which is generally not accessible anyway) and that, what is worse, the original data is usually not available (and this was all unprompted by me, honestly!). To paraphrase his comment: “the trouble with science is that I can’t get at the numbers behind the headlines; what is the sample size, how was the trial run…” Well, at this point all thought of getting any work done went out the window and we had a great discussion about data availability and the challenges of recording it in the right form (his systems integration work includes efforts to deal with mining of large, badly organised data sets), drifting into identity management and trust networks. It was a great deal of fun.
What do I take from this? That there is a demand for this kind of information and data from an educated and knowledgeable public. One of the questions he asked was whether, as a scientist, I ever see much in the way of demand from the public. My response was that, aside from pushing taxpayer access to taxpayer funded research myself, I hadn’t seen much evidence of real demand. His argument was that there is a huge nascent demand from people who haven’t yet thought about their need to get into the detail of news stories that affect them. People want the detail, they just have no idea of how to go about getting it. Spread the idea that access to that detail is a right and we will see the demand for access to the outputs of research grow rapidly. The idea that “no-one out there is interested or competent to understand the details” is simply not true. The more respect we have for the people who fund our research the better, frankly.

A personal view of open science – Part III – Social issues

The third installment of the paper (first part, second part) where I discuss social issues around practicing more Open Science.

Scientists are inherently rather conservative in their adoption of new approaches and tools. A conservative approach has served the community well in the process of sifting ideas and claims; this approach is well summarised by the aphorism ‘extraordinary claims require extraordinary evidence’. New methodologies and tools often struggle to be accepted until the evidence of their superiority is overwhelming. It is therefore unreasonable to expect the rapid adoption of new web based tools, and even more unreasonable to expect scientists to change their overall approach to their research en masse. The experience of the adoption of new Open Access journals is a good example of this.

Recent studies have shown that scientists are, in principle, in favour of publishing in Open Access journals yet show marked reluctance to publish in such journals in practice [1]. The most obvious reason for this is the perceived cost. Because most Open Access publishers operate by charging a publication fee, and until recently such charges were not allowable costs for many research funders, it can be challenging for researchers to obtain the necessary funds. Although most OA publishers will waive these charges, there is anecdotally a marked reluctance to ask for such a waiver. Other reasons for not submitting papers to OA journals include the perception that most OA journals are low impact and a lack of OA journals in specific fields. Finally, simple inertia can be a factor where the traditional publication outlets for a specific field are well defined and publishing outside the set of ‘standard’ journals runs the risk of the work simply not being seen by peers. As there is no perception of a reward for publishing in open access journals, and a perception of significant risk, uptake remains relatively small.

Making data available faces similar challenges, but here they are more profound. At least when publishing in an open access journal the output can be counted as a paper. Because there is no culture of citing primary data, but rather of citing the papers it is reported in, there is no reward for making data available. If careers are measured in papers published then making data available does not contribute to career development. Data availability to date has generally been driven by strong community norms, usually backed up by journal submission requirements. Again this links data publication to paper publication without necessarily encouraging the release of data that is not explicitly linked to a peer reviewed paper. The large scale DNA sequencing and astronomy facilities stand out as cases where data is automatically made available as it is taken. In both cases this policy is driven largely by the funders, or facility providers, who are in a position to make release a condition of funding the data collection. This is not, however, a policy that has been adopted by other facilities such as synchrotrons, neutron sources, or high power photon sources.

In other fields where data is more heterogeneous, and particularly where competition to publish is fierce, the idea of data availability raises many fears. The primary one is of being ‘scooped’, or data theft, where others publish a paper before the data collector has had the chance to fully analyse the data. This is partly answered by robust data citation standards, but these do not prevent another group publishing an analysis more quickly, potentially damaging the career or graduation prospects of the data collector. A principle of ‘first right to publish’ is often suggested. Other approaches include timed embargoes for re-use or release. All of these have advantages and disadvantages which depend to a large extent on how well behaved members of a specific field are. Another significant concern is that the release of substandard, non peer-reviewed, or simply inaccurate data into the public domain will lead to further problems of media hype and public misunderstanding. This must be balanced against the potential public good of having relevant research data available.

The community, or more accurately communities, in general, are waiting for evidence of benefits before adopting either open access publication or open data policies. This actually provides the opportunity for individuals and groups to take first mover advantages. While remaining controversial [3, 4] there is some evidence that publication in open access journals leads to higher citation counts for papers [5, 6] and that papers for which the supporting data is available receive more citations [7]. This advantage is likely to be at its greatest early in the adoption curve and will clearly disappear if these approaches become widespread. There are therefore clear advantages to be had in rapidly adopting more open approaches to research which can be balanced against the risks described above.

Measuring success in the application of open approaches, and particularly quantifying success relative to traditional approaches, is a challenge, as is demonstrated by the continuing controversy over the citation advantage of open access articles. However, pointing to examples of success is relatively straightforward. In fact Open Science has a clear public relations advantage as the examples are out in the open for anyone to see. This exposure can be both good and bad but it makes publicising best practice easy. In many ways the biggest successes of open practice are the ones that we miss because they are right in front of us: the freely accessible biological databases such as the Protein Data Bank, NCBI, and many others that have driven the massive advances in biological sciences over the past 20 years. The ability to analyse and consider the implications of genome scale DNA sequence data, as it is being generated, is now taken for granted.

In the physical sciences, the arXiv has long stood as an example to other disciplines of how the research literature can be made available in an effective and rapid manner, and the availability of astronomical data from efforts such as the Sloan Digital Sky Survey makes efforts combining public outreach and the crowdsourcing of data analysis, such as Galaxy Zoo, possible. There is likely to be a massive expansion in the availability of environmental and ecological data globally as the potential to combine millions of data gatherers holding mobile phones with sophisticated data aggregation and manipulation tools is realised.

Closer to the bleeding edge of radical sharing there have been fewer high profile successes, a reflection both of the limited amount of time these approaches have been pursued and the limited financial and personnel resources that have been available. Nonetheless there are examples. Garrett Lisi’s high profile preprint on the arXiv, An exceptionally simple theory of everything [8], is supported by a comprehensive online notebook at http://deferentialgeometry.org that contains all the arguments as well as the background detail and definitions that support the paper. The announcement by Jean-Claude Bradley of the successful identification of several compounds with activity against malaria [9] is an example where the whole research process was carried out in the open, from the decision on what the research target should be, through the design and in silico testing of a library of chemicals, to the synthesis and testing of those compounds. For every step of this process the data is available online, and several of the collaborators that made the study possible made contact after finding that material online. The potential for a coordinated global synthesis and screening effort is currently being investigated.

There are both benefits and risks associated with open practice in research. Often the discussion with researchers focuses on the disadvantages and risks. In an inherently conservative pursuit it is perfectly valid to ask whether changes of this type and magnitude offer any benefits given the potential risks they pose. These are not concerns that should be dismissed or ridiculed, but ones that should be taken seriously and considered. Radical change never comes without casualties, and while some concerns may be misplaced, or overblown, there are many that have real potential consequences. In a competitive field people will necessarily make diverse decisions on the best way forward for them. What is important is providing them with as good information as possible to help them balance the risks and benefits of any approach they choose to take.

The fourth and final part of this paper can be found here.

  1. Warlick S E, Vaughan K T. Factors influencing publication choice: why faculty choose open access. Biomedical Digital Libraries. 2007;4:1-12.
  2. Bentley D R. Genomic Sequence Information Should Be Released Immediately and Freely. Science. 1996;274(October):533-534.
  3. Piwowar H A, Day R S, Fridsma D B. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE. 2007;2(3):e308.
  4. Davis P M, Lewenstein B V, Simon D H, Booth J G, Connolly M J. Open access publishing, article downloads, and citations: randomised controlled trial. BMJ. 2008;337(October):a568.
  5. Rapid responses to Davis et al., http://www.bmj.com/cgi/eletters/337/jul31_1/a568
  6. Eysenbach G. Citation Advantage of Open Access Articles. PLoS Biology. 2006;4(5):e157.
  7. Hajjem C, Harnad S, Gingras Y. Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin. 2005;28(4):39-47. http://eprints.ecs.soton.ac.uk/12906/
  8. Lisi G, An exceptionally simple theory of everything, arXiv:0711.0770v1 [hep-th], November 2007.
  9. Bradley J C, We have antimalarial activity!, UsefulChem Blog, http://usefulchem.blogspot.com/2008/01/we-have-anti-malarial-activity.html, January 25 2008.

Where does Open Access stop and ‘just doing good science’ begin?

I had been getting puzzled for a while as to why I was being characterised as an ‘Open Access’ advocate. I mean, I do advocate Open Access publication and I have opinions on the Green versus Gold debate. I am trying to get more of my publications into Open Access journals. But I’m no expert, and I’ve certainly been around this community for a much shorter time and know a lot less about the detail than many other people. The giants of the Open Access movement have been fighting the good fight for many years. Really I’m just a latecomer cheering from the sidelines.

This came to a head recently when I was being interviewed for a piece on Open Access. We kept coming round to the question of what it was that motivated me to be ‘such a strong’ advocate of open access publication. Surely I must have a very strong motivation to have such strong views? And I found myself thinking that I didn’t. I wasn’t that motivated about open access per se. It took some thinking, and going back over my own path, to realise that this was because of where I was coming from.

I guess most people come to the Open Science movement firstly through an interest in Open Access. The frustration of not being able to access papers, followed by the realisation that for many other scientists it must be much worse. Often this is followed by the sense that even when you’ve got the papers they don’t have the information you want or need, that it would be better if they were more complete, the data or software tools available, the methodology online. There is a logical progression from ‘better access to the literature helps’ to ‘access to all the information would be so much better’.

I came at the whole thing from a different angle. My Damascus moment came when I realised the potential power of making everything available: the lab book, the data, the tools, the materials, and the ideas. Once you connect the idea of the read-write web to science communication, it is clear that the underlying platform has to be open, accessible, and re-usable to get the benefits. Science is perhaps the ultimate open platform available to build on. From this perspective it is immediately self-evident that the current publishing paradigm, and subscription access publication in particular, is broken. But it is just one part of the puzzle, one of the barriers to communication that need to be attacked, broken down, and re-built. It is difficult, for these reasons, for me to separate out a bit of my motivation that relates just to Open Access.

Indeed, in some respects Open Access, at least in the form in which it is funded by author charges, can be a hindrance to effective science communication. Many of the people I would like to see more involved in the general scientific community, who would be empowered by more effective communication, cannot afford author charges. Indeed many of my colleagues in what appear to be well funded western institutions can’t afford them either. Sure, you can ask for a fee waiver, but no-one likes to ask for charity.

But I think papers are important. Some people believe that the scientific paper as it exists today is inevitably doomed. I disagree. I think it has an important place as a static document, a marker of what a particular group thought at a particular time, based on the evidence they had assembled. If we accept that the paper has a place then we need to ask how it is funded, particularly the costs of peer and editorial review, and the costs of maintaining that record into the future. If you believe, as I do, that in an ideal world this communication would be immediately available to all, then there are relatively few viable business models available. What has been exciting about the past few months, and indeed the past week, has been the evidence that these business models are starting to work through and make sense. The purchase of BioMed Central by Springer may raise concerns for the future but it also demonstrates that a publishing behemoth has faith in the future of OA as a publishing business model.

For me, this means that in many ways the discussion has moved on. Open Access, and Open Access publication in particular, has proved its viability. The challenges now lie in widening the argument to include data, to include materials, to include process; to develop the tools that will allow us to capture all of this in a meaningful way and to make sense of other people’s records. None of which should in any way belittle the achievement of those who have brought the Open Access movement to its current point. Immense amounts of blood, sweat, and tears from thousands of people have brought what was once a fringe movement to the centre of the debate on science communication. The establishing of viable publishers and repositories for pre-prints, the bringing of funders and governments to the table with mandates, and the placing of the option of OA publication at the fore of people’s minds are huge achievements, especially given the relatively short time it has taken. The debate on value for money, on quality of communication, and on business models and the best practical approaches will continue, but the debate about the value of, indeed the need for, Open Access has essentially been won.

And this is at the core of what Open Access means for me. The debate has placed, or perhaps re-placed, right at the centre of the discussion of how we should do science, the importance of the quality of communication. It has re-stated the principle of placing the claims that you make, and the evidence that supports them, in the open for criticism by anyone with the expertise to judge, regardless of where they are based or who is funding them. And it has made crystal clear where the deficiencies in that communication process lie, and exposed the creeping tendency of publication over the past few decades to become more an exercise in point scoring than in communication. There remains much work to be done across a wide range of areas, but the fact that we can now look at taking those challenges on is due in no small part to the work of those who have advocated Open Access from its difficult beginnings to today’s success. Open Access Day is a great achievement in its own right and it should be a celebration of the efforts of all those people who have contributed to making it possible, as well as an opportunity to build for the future.

High quality communication, as I and others have said, and will continue to say, is Just Good Science. The success of Open Access has shown how one aspect of that communication process can be radically improved. The message to me is a simple one. Without open communication you simply can’t do the best science. Open Access to the published literature is simply one necessary condition of doing the best possible science.

BioBarCamp – Meeting friends old and new and virtual

So BioBarCamp started yesterday with a bang and a great kick off. Not only did we somehow manage to start early, we were consistently running ahead of schedule. With several hours initially scheduled for introductions, this actually went pretty quickly, although it was quite comprehensive. During the introductions many people expressed an interest in ‘Open Science’, ‘Open Data’, or some other open stuff, yet it was already pretty clear that many people meant many different things by this. It was suggested that with the time available we have a discussion session on what ‘Open Science’ might mean. Pedro and myself live blogged this at FriendFeed and the discussion will continue this morning.

I think for me the most striking outcome of that session was that not only is this a radically new concept for many people, but many people don’t have any background understanding of open source software either, which can make the discussion totally impenetrable to them. This, in my view, strengthens the need for having some clear brands, or standards, that are easy to point to and easy to sign up to (or not). I pitched the idea, basically adapted from John Wilbanks’ pitch at the meeting in Barcelona, that our first target should be that all data and analysis associated with a published paper should be available. This seems an unarguable basic standard, but it is one that we currently fall far short of. I will pitch this again in the session I have proposed on ‘Building a data commons’.

The schedule for today is up as a googledoc spreadsheet with many difficult decisions to make. My current thinking is:

  1. Kaitlin Thaney – Open Science Session
  2. Ricardo Vidal and Vivek Murthy (OpenWetWare and Epernicus) – Using online communities to share resources efficiently.
  3. Jeremy England & Mark Kaganovich – Labmeeting, Keeping Stalin Out of Science (though I would also love to do John Cumbers on synthetic biology for space colonization, that is just so cool)
  4. Pedro Beltrao & Peter Binfield – Dealing with Noise in Science / How should scientific articles be measured.
  5. Hard choice: Andrew Hessel – Building an open source biotech company, or Nikesh Kotecha + Shirley Wu – Motivating annotation
  6. Another doozy: John Cumbers – Science Worship / Science Marketing, or Hilary Spencer & Mathias Crawford – Interests in Scientific IP – Who Owns/Controls Scientific Communication and Data? The Major Players.
  7. Better turn up to mine I guess :)
  8. Joseph Perla – Cloud computing, Robotics and the future of Science, and Joel Dudley & Charles Parrot – Open Access Scientific Computing Grids & OpenMac Grid

I am beginning to think I should have brought two laptops and two webcams. Then I could have recorded one and gone to the other. Whatever happens I will try to cover as much as I can in the BioBarCamp room at FriendFeed, and where possible and appropriate I will broadcast and record via Mogulus. The wireless was a bit tenuous yesterday so I am not absolutely sure how well this will work.

Finally, this has been a great opportunity to meet up with people I know and have met before, those who I feel I know well but have never met face to face, and indeed those whose names I vaguely know (or should know) but have never connected with before. I’m not going to say who is in which list because I will forget someone! But if I haven’t said hello yet, do come up and harass me, because I probably just haven’t connected your online persona with the person in front of me!

Policy and technology for e-science – A forum on open science policy

I’m in Barcelona at a satellite meeting of the EuroScience Open Forum organised by Science Commons and a number of their partners. Today is when most of the meeting will happen, with forums on ‘Open Access Today’, ‘Moving OA to the Scientific Enterprise: Data, materials, software’, ‘Open access in the knowledge network’, and ‘Open society, open science: Principle and lessons from OA’. There is also a keynote from Carlos Morais-Pires of the European Commission and the lineup for the panels is very impressive.

Last night was an introduction and social kickoff as well. James Boyle (Duke Law School, chair of the board of directors of Creative Commons, founder of Science Commons) gave a wonderful talk (40 minutes, no slides, barely taking breath) where his central theme was the relationship between where we are today with open science and where international computer networks were in 1992. He likened making the case for open science today to people suggesting in 1992 that the networks would benefit from being made freely accessible, freely useable, and based on open standards. The fears that people have today – of good information being lost in a deluge of dross, of there being large quantities of nonsense, and nonsense from people with an agenda – can to a certain extent be balanced against the idea that, to put it crudely, Google works. As James put it (not quite a direct quote): ‘You need to reconcile two statements, both true. 1) 99% of all material on the web is incorrect, badly written, and partial. 2) You probably haven’t opened an encyclopedia as a reference in ten years.’

James gave two further examples, one being the availability of legal data in the US: despite the fact that none of it is copyrightable there, thriving businesses are based on it. The second, which I found compelling, concerned weather data, for reasons that Peter Murray-Rust has described in some detail. Weather data in the US is free. In a recent attempt to obtain long term weather data, a research effort was charged on the order of $1500 for all existing US weather data – the cost of the DVDs that would be needed to ship it. By comparison, a single German state wanted millions for theirs. The consequence of this was that the European data didn’t go into the modelling. James made the point that while the European return on investment for weather data was a respectable nine-fold, that for the US (where they are giving it away, remember) was 32 times. To me, though, the really compelling part of this argument is that if that data is not made available we run the risk of being underwater in twenty years with nothing to eat. This particular case is not about money, it is potentially about survival.

Finally – and this, you will not be surprised, was the bit I most liked – he went on to issue a call to arms to get on and start building this thing that we might call the data commons. The time has come to actually sit down and start to take these things forward, to start solving the issues of reward structures, of identifying business models, and to build the tools and standards to make this happen. That, he said, was the job for today. I am looking forward to it.

I will attempt to do some updates via twitter/friendfeed (cameronneylon on both) but I don’t know how well that will work. I don’t have a roaming data tariff and the charges in Europe are a killer so it may be a bit sparse.

More on the science exchange – or building and capitalising a data commons

Banknotes from all around the world, donated by visitors to the British Museum, London (image from Wikipedia via Zemanta)

Following on from the discussion a few weeks back, kicked off by Shirley at One Big Lab and continued here, I’ve been thinking about how to actually turn what was a throwaway comment into reality:

What is being generated here is new science, and science isn’t paid for per se. The resources that generate science are supported by governments, charities, and industry but the actual production of science is not supported. The truly radical approach to this would be to turn the system on its head. Don’t fund the universities to do science, fund the journals to buy science; then the system would reward increased efficiency.

There is a problem at the core of this. For someone to pay for access to the results, there has to be a monetary benefit to them. This may be through increased efficiency of their research funding, but that’s a rather vague benefit. For a serious charitable or commercial funder there has to be the potential to either make money, or at least see that the enterprise could become self sufficient. But surely this means monetizing the data somehow? Which would require restrictive licences, which is not, in the end, what we’re about.

The other story of the week has been the, in the end very useful, kerfuffle caused by ChemSpider moving to a CC-BY-SA licence, and the confusion that has been revealed regarding data, licencing, and the public domain. John Wilbanks, whose comments on the ChemSpider licence sparked the discussion, has written two posts [1, 2] which I found illuminating and which have made things much clearer for me. His point is that data naturally belongs in the public domain, and that the public domain and the freedom of the data itself need to be protected from erosion, both legal and conceptual, that could be caused by our obsession with licences. What does this mean for making an effective data commons, and the Science Exchange that could arise from it, financially viable?

Protocols for Open Science

Interior detail, Stata Center, MIT, just outside the Science Commons offices

One of the strong messages that came back from the workshop we held at the BioSysBio meeting was that protocols and standards of behaviour were something that people would appreciate having available. There are many potential issues raised by the idea of a ‘charter’ or ‘protocol’ for open science, but these are definitely things that are worth talking about. I thought I would throw a few ideas out and see where they go. There are some potentially serious contradictions to be worked through.