Evidence to the European Commission Hearing on Access to Scientific Information

European Commission
Image by tiseb via Flickr

On Monday 30 May I gave evidence at a European Commission hearing on Access to Scientific Information. This is the text that I spoke from. Just to reinforce my usual disclaimer: I was not speaking on behalf of my employer, but as an independent researcher.

We live in a world where there is more information available at our fingertips than even existed 10 or 20 years ago. Much of what we use to evaluate research today was built in a world where the underlying data was difficult and expensive to collect. Companies were built, massive data sets were collected and curated, and our whole edifice of reputation building and assessment grew up based on what was available. As the systems became more sophisticated new measures were incorporated, but the fundamental basis of our systems was never questioned. Somewhere along the line we forgot that we had never actually been measuring what mattered, just what we could.

Today we can track, measure, and aggregate much more information, in much more detail. It’s not just that we can ask how often a dataset is being downloaded; we can ask who is downloading it, academics or school children, and more, we can ask who wrote the blog post, or posted it to Facebook, that led to that spike in downloads.

This is technically feasible today. And make no mistake it will happen. And this provides enormous potential benefits. But in my view it should also give us pause. It gives us a real opportunity to ask why it is that we are measuring these things. The richness of the answers available to us means we should spend some time working out what the right questions are.

There are many reasons for evaluating research and researchers. I want to touch on just three. The first is researchers evaluating themselves against their peers. While this is informed by data it will always be highly subjective and vary discipline by discipline. It is worthy of study but not I think something that is subject to policy interventions.

The second area is in attempting to make objective decisions about the distribution of research resources. This is clearly a contentious issue. Formulaic approaches can be made more transparent and less open to legal attack, but are relatively easy to game. A deeper challenge is that by their nature all metrics are backwards looking. They can only report on things that have happened. Indicators are generally lagging (true of most of the measures in wide current use), but what we need are leading indicators. It is likely that human opinion will continue to beat naive metrics in this area for some time.

Finally there is the question of using evidence to design the optimal architecture for the whole research enterprise. Evidence based policy making in research policy has historically been sadly lacking. We have an opportunity to change that through building a strong, transparent, and useful evidence base, but only if we simultaneously work to understand the social context of that evidence. How does collecting information change researcher behavior? How are these measures gamed? What outcomes are important? How does all of this differ across national and disciplinary boundaries, or amongst age groups?

It is my belief, shared with many that will speak today, that open approaches will lead to faster, more efficient, and more cost effective research. Other groups and organizations have concerns around business models, quality assurance, and sustainability of these newer approaches. We don’t need to argue about this in a vacuum. We can collect evidence, debate what the most important measures are, and come to an informed and nuanced conclusion based on real data and real understanding.

To do this we need to take action in a number of areas:

1. We need data on evaluation and we need to be able to share it.

Research organizations must be encouraged to maintain records of the downstream usage of their published artifacts. Where there is a mandate for data availability this should include mandated public access to data on usage.

The commission and national funders should clearly articulate that the provision of usage data is a key service for publishers of articles, data, and software to provide, and that where a direct payment is made for publication, provision of such data should be included. Such data must be technically and legally reusable.

The commission and national funders should support work towards standardizing vocabularies and formats for this data, as well as critiquing its quality and usefulness. This work will necessarily be diverse, with disciplinary, national, and object type differences, but there is value in coordinating actions. At a recent workshop where funders, service providers, developers, and researchers convened we made significant progress towards agreeing routes to standardization of the vocabularies used to describe research outputs.

2. We need to integrate our systems of recognition and attribution into the way the web works through identifying research objects and linking them together in standard ways.

The effectiveness of the web lies in its framework of addressable items connected by links. Researchers have a strong culture of making links and recognizing contributions through attribution and citation of scholarly articles and books, but this has only recently been surfaced in a way that consumer web tools can view and use. And practice is patchy and inconsistent for new forms of scholarly output such as data, software, and online writing.

The commission should support efforts to open up scholarly bibliography to the mechanics of the web through policy and technical actions. The recent Hargreaves report explicitly notes limitations on text mining and information retrieval as an area where the EU should act to modernize copyright law.

The commission should act to support efforts to develop and gain wide community support for unique identifiers for research outputs, and for researchers. Again these efforts are diverse and it will be community adoption which determines their usefulness but coordination and communication actions will be useful here. Where there is critical mass, such as may be the case for ORCID and DataCite, this crucial cultural infrastructure should merit direct support.

Similarly the commission should support actions to develop standardized expressions of links, through developing citation and linking standards for scholarly material. Again the work of DataCite, CoData, Dryad and other initiatives as well as technical standards development is crucial here.

3. Finally we must closely study the context in which our data collection and indicator assessment develops. Social systems cannot be measured without perturbing them and we can do no good with data or evidence if we do not understand and respect both the systems being measured and the effects of implementing any policy decision.

We need to understand the measures we might develop, what forms of evaluation they are useful for, and how change can be effected where appropriate. This will require significant work as well as an appreciation of the close coupling of the whole system.

We have a generational opportunity to make our research infrastructure better through effective evaluation and evidence based policy making and architecture development. But we will squander this opportunity if we either take a utopian view of what might be technically feasible, or fail to act for fear of a dystopian future. The way forward is a careful, timely, transparent, and thoughtful effort to understand ourselves and the system we work within.

The commission should act to ensure that current nascent efforts work efficiently towards delivering the technical, cultural, and legal infrastructure that will support an informed debate through a combination of communication, coordination, and policy actions.


Best practice in Science and Coding. Holding up a mirror.

The following is the text from which I spoke today at the .Astronomy conference. I think there is some video available on the .Astronomy UStream account and I also have audio which I will put up somewhere soon.

There’s a funny thing about the science and coding communities. Each seems to think that the other has all the answers. Maybe the grass is just greener… For many years as an experimental scientist I looked jealously at both computational scientists and coders in general. Wouldn’t it be so much less stressful, I naively thought, to have systems that would do what they were told, to be easily able to re-run experiments, and to be able to rely on getting the same answer. Above all, I thought, imagine the convenience of being able to take someone else’s work and easily and quickly apply it to my own problems.

There is something of a mythology around code, and perhaps more so around open source, that it can be relied on, that there is a toolkit out there already for every problem. That there is a Ruby gem or an R library for every problem, or, most memorably, that I can sit at a Python command line and just demand antigravity by importing it. Sometimes these things are true, but I’m guessing that everyone has experience of them not being true. Of the Python library that looks as though it is using dictionaries but is actually using some bizarre custom data type, the badly documented Ruby gem, or the Perl… well, just the Perl really. The mythology doesn’t quite live up to the hype. Or at least not as often as we might like.

But if we experimental scientists have an overoptimistic view of how repeatable and reliable computational tools are, then computer scientists have an equally unrealistic view of how experimental science works. Greg Wilson, one of the great innovators in computer science education, once said, while criticizing the documentation and testing standards of scientific code, “An experimental scientist would never get away with not providing their data, not providing their working. Experimental science is expected to be reproducible from the detailed methodology given…” Data provided… detailed methodology… reproducible… this doesn’t really sound like the experimental science literature that I know.

Ted Pedersen in an article with the wonderful title “Empiricism is not a matter of faith” excoriates computational linguistics by holding it up to what he sees as the much higher standards of reproducibility and detailed description of methodology in experimental science. Yet I’ve never been able to reproduce an experiment based only on a paper in my life.

What is interesting about both of these viewpoints is that we are projecting our very real desire to raise standards onto a mythology of someone else’s practice. There seems to be a need to view some other community’s practice as the example, rather than finding examples within our own. This is odd because it is precisely the best examples, within each community, that inspire the other. There are experimental scientists who give detailed step by step instructions to enable others to repeat their work, who make the details of their protocols available online, and who work within their groups to the highest standards of reproducibility that are possible in the physical world. Equally there are open source libraries and programmes with documentation that is both succinct and detailed, that just work when you import the library, that are fully tested and come with everything you need to make sure they will work with your systems. Or that break in an informative way, making it clear what you need to do with your own code to get them working.

Think about what makes science work: effective communication, continual testing and refinement, public criticism of claims and ideas. These are the things that make up good science; they are the reason I had a laptop to write this talk on this morning, the reason the train and taxi I caught actually ran, and, more seriously, the reason a significant portion of the people in this room did not in fact die in childhood. If we look at these things then we see a very strong correspondence with good practice in software development. High quality and useful documentation is key to good software libraries. You can be as open source as you like, but if no-one can understand your code they’re not going to use it. Controls, positive and negative, statistical and analytical, are basically unit tests. Critique of any experimental result comes down to asking whether each aspect of the experiment is behaving the way it should; whether each process has been tested to show that a standard input gives the expected output. In a very real sense experiment is an API layer we use to interact with the underlying principles of nature.
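
The controls-as-unit-tests mapping can be made concrete. Here is a minimal sketch, with every name hypothetical: `measure_signal` is a stub standing in for a real instrument, and the positive and negative controls of an assay are written as ordinary Python unit tests.

```python
import unittest

def measure_signal(sample):
    """Hypothetical assay: return a signal strength for a named sample.

    A real implementation would talk to an instrument; this stub
    returns canned values so the 'controls' below can actually run.
    """
    known_response = {"blank": 0.02, "reference": 1.0}
    return known_response.get(sample, 0.5)

class AssayControls(unittest.TestCase):
    """Controls expressed as unit tests: each checks that a standard
    input gives the expected output."""

    def test_negative_control(self):
        # A blank sample should give close to zero signal.
        self.assertLess(measure_signal("blank"), 0.05)

    def test_positive_control(self):
        # A known reference sample should give the expected response.
        self.assertAlmostEqual(measure_signal("reference"), 1.0, places=2)

# Run the controls directly, without exiting the process.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssayControls)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point of the sketch is the framing, not the code: a failed negative control and a failed unit test are the same event, a standard input that no longer gives the expected output.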

So this is a nice analogy, but I think we can take this further, in fact I think that code and experiment are actually linked at a deeper level. Both are an instantiation of process that take inputs and generate outputs. These are (to a first approximation – good enough for this discussion) deterministic in any given instance. But they are meaningless without context. Useless without the meaning that documentation and testing provide.

Let me give you an example. Ana Nelson has written a lovely documentation tool called Dexy. It builds on concepts of literate programming in a beautifully elegant and simple way. Take a look for the details, but in essence it enables you to incorporate the results of arbitrary running code directly into your documentation. As you document what your code does you provide examples, parts of the process that actively run, testing the code as you go. If you break the method you break your documentation. It is also no accident that if you are thinking about documentation as you build your code, it helps to create good modular structures that are easy to understand, and therefore both easy to use and easy to communicate. They may be a little more work to write, but the value you create by thinking about documentation from the start means you are motivated to do that work up front. Design by contract and test driven development are tough; documentation driven development can really help drive good process.
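
Dexy itself is worth exploring; as a minimal stand-in for the same idea, Python’s built-in doctest module lets examples live inside the documentation and be executed, so changing the method’s behaviour breaks the docs. A sketch, with a made-up `dilution_series` function:

```python
import doctest

def dilution_series(start, factor, steps):
    """Return the concentrations in a serial dilution.

    The examples below are not decoration: doctest executes them,
    so if the function's behaviour changes, the documentation fails.

    >>> dilution_series(100.0, 2, 3)
    [100.0, 50.0, 25.0]
    >>> dilution_series(10.0, 10, 2)
    [10.0, 1.0]
    """
    return [start / factor ** i for i in range(steps)]

# Execute the embedded examples; failing examples are reported here.
doctest.run_docstring_examples(
    dilution_series, {"dilution_series": dilution_series}
)
```

This is a much cruder tool than Dexy, but it captures the same contract: documentation that runs is documentation that cannot silently drift away from the code.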

Too often when we write a scientific paper it’s the last part of the process. We fabricate a story that makes sense so that we can fit in the bits we want to. Now there’s nothing wrong with this. Humans are narrative processing systems; we need stories to make sense of the world. But it’s not the whole story. What if, as we collect and capture the events that we ultimately use to tell our story, we also collected and structured the story of what actually happened? Of the experiments that didn’t work, of the statistical spread of good and bad results. There’s a sarcastic term in synthetic organic chemistry, the “American Yield”, in which we imagine that 20 PhD students have been tasked with making a compound and the one who manages to get the highest overall yield gets to be first author. This isn’t actually a particularly useful number. Much more useful to the chemist who wants to use the prep is the spread of values, information that is generally thrown away. It is the difference between actually incorporating the running of the code into the documentation, and just showing one log file, cut and pasted from when it worked well. You lose the information about when it doesn’t work.

Other tools from coding can also provide inspiration. Take Hudson, a tool for continuous integration. Every time the code base is changed everything gets re-built, dependencies are tested, unit tests are run, and a record is kept of what got broken: if you want to do X, you will want to use this version of that library. This isn’t a problem. In any large codebase things are going to get broken as changes are made; you change something, see what is broken, then go back and gradually fix those things until you’re ready to commit to the main branch (at which point someone else has broken something…)

Science is continuous integration. This is what we do: we make changes, we check what they break, see if the dependencies still hold, and if necessary go back and fix them. This is, after all, where the interesting science is. Or it would be if we did it properly. David Shotton and others have spoken about the question of “citation creep” or “hedging erosion” [see for example this presentation by Anita de Waard]. This is where something initially reported in one paper as a possibility, or even just a speculation, gets converted into fact by a process of citation. What starts as “…it seems possible that…” can get turned into “…as we know that X causes Y (Bloggs et al, 2009)…” within 18 months or a couple of citations. Scientists are actually not very good at checking their dependencies. And those dependencies have a tendency to come back and bite us in exactly the same way as a quick patch that wasn’t properly tested.
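
What such dependency checking might look like, as a toy sketch (the papers and the graph are entirely made up): treat each paper’s load-bearing citations as edges, and when one claim is thrown into doubt, flag everything downstream of it for re-checking.

```python
from collections import defaultdict

# Hypothetical citation graph: each paper lists the papers whose
# claims it depends on.
depends_on = {
    "smith2009": [],             # the original speculative claim
    "jones2010": ["smith2009"],  # cites it as established fact
    "patel2011": ["jones2010"],  # inherits the claim second-hand
    "zhao2011":  [],             # unrelated work
}

# Invert the graph: for each paper, who depends on it.
cited_by = defaultdict(list)
for paper, deps in depends_on.items():
    for dep in deps:
        cited_by[dep].append(paper)

def in_doubt(questioned):
    """Return every paper whose dependency chain reaches the
    questioned claim, the set a 'continuous integration for the
    literature' would flag for re-checking."""
    flagged, stack = set(), [questioned]
    while stack:
        current = stack.pop()
        for dependent in cited_by[current]:
            if dependent not in flagged:
                flagged.add(dependent)
                stack.append(dependent)
    return flagged

print(sorted(in_doubt("smith2009")))  # → ['jones2010', 'patel2011']
```

The hard part, of course, is not the graph traversal but capturing which citations are actually load-bearing; that is exactly the information current publishing practice throws away.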

Just imagine if we could do this. If every time a new paper was added to the literature we could run a test against the rest. Check all the dependencies… if this isn’t true then all of these other papers are in doubt as well… indeed, if we could unit test papers, would it be worth peer reviewing them? There is good evidence that pair programming works, and little evidence that traditional peer review does. What can we learn from this to make the QA processes in science and software development better?

I could multiply examples. What would an agile lab look like? What would be needed to make it work? What can successful library development communities tell us about sharing samples, and what can the best data repositories tell us about building the sites for sharing code? How can we apply the lessons of StackOverflow to a new generation of textbooks, and how can we best package up descriptions of experimental protocols in a way that provides the same functionality as sharing an Amazon Machine Image?

Best practice in coding mirrors best practice in science. Documentation, testing, and integration are at the core. Best practice is also a long way ahead of common practice, in both science and coding. Both, perhaps, are driven increasingly by a celebrity culture that depends more on what your outputs look like (and where they get published) than on whether anyone uses them. Testing and documentation are hardly glamorous activities.

So what can we do about it? Improving practice is an arduous task. Many people are doing good work here with training programmes, tools, standards development and calls to action online and in the scientific literature. Too many people and organizations for me to call out and none of them getting the credit they deserve.

One of the things I have been involved with is trying to provide a venue, a prestigious venue, where people can present code that has been developed to the highest standards. Open Research Computation, a new Open Access journal from BioMed Central, will publish papers that describe software for research. Our selection criteria don’t depend on how important the research problem is, but on the availability, documentation, and testing of the code. We expect the examples given in these papers to be reproducible, by which we mean that the software, the source code, the data, and the methodology are provided and described well enough that it is possible to reproduce those examples. By applying high standards, and by working with authors to help them reach those standards, we aim to provide a venue which is both useful and prestigious. Think about it: a journal that contains papers describing the most useful and useable tools and libraries is going to get a few citations and (whisper it) ought to get a pretty good impact factor. I don’t care about impact factors, but I know the reality on the ground is that those of you looking for jobs, or trying to keep them, do need to worry about them.

In the end, the problem with a journal, or with code, or with science, is that we want everyone else to provide the best documentation, the best tested code and procedures, but it’s hard to justify doing it ourselves. I mean, I just need something that works; yesterday. I don’t have time to write the tests in advance, think about the architecture, re-read all of that literature to check what it really said. Tools that make this easier will help: tools like Dexy and Hudson, or lab notebooks that capture what we’re doing and what we are creating, rather than what we said we would do, or what we imagine we did in retrospect.

But it’s motivation that is key here. How do you motivate people to do the work up front? You can tell them that they have to, of course, but really these things work best when people want to make the effort. The rewards for making your work re-usable can be enormous, but they usually come further down the road than the moment where you make the choice not to bother. And those rewards are less important to most people than getting the Nature paper, or getting mentioned in Tim O’Reilly’s tweet stream.

It has to be clear that making things re-usable is the highest contribution that you can make, and for it to be rewarded accordingly.  I don’t even really care what forms of re-use are counted, re-use in research, re-use in education, in commerce, in industry, in policy development. ORC is deliberately – very deliberately – intended to hack the impact factor system by featuring highly re-usable tools that will gain lots of citations. We need more of these hacks.

I think this shift is occurring. It’s not widely known just how close UK science funding came to being slashed in the comprehensive spending review. That it wasn’t was due to a highly coherent and well organized campaign that convinced ministers and the Treasury that the re-use of UK research outputs generated enormous economic, social, and educational value for the country, and indeed globally. That the Sloan Digital Sky Survey was available in a form that could be re-used to support the development of something like Galaxy Zoo played a part in this. The headlong rush of governments worldwide to release their data is a massive effort to realize the potential value of the re-use of that data.

This change in focus is coming. It will no longer be enough in science to just publish. As David Willetts said in [answer to a question] in his first policy speech, “I’m very much in favour of peer review, but I worry when the only form of review is for journals”. Government wants evidence of wider use. They call it impact, but it’s basically re-use. The policy changes are coming: the data sharing policies, the public engagement policies, the impact assessments. Just showing outputs will not be enough; showing that you’ve configured those outputs so that the potential for re-use is maximized will be an assumption of receiving funding.

William Gibson said the future is already here, it’s just unevenly distributed. They Might Be Giants asked, not quite in response, “but where’s my jetpack?” The jetpacks, the tools, are around us and being developed, if you know where to look. Best practice is unevenly distributed, both in science and in software development, but it’s out there if you want to go looking. The motivation to adopt it? The world around us is changing. The expectations of the people who fund us are changing. Best practices in code and in science have an awful lot in common. If you can master one you will have the tools to help you with the other. And if you have both then you’ll be well positioned to ride the wave of change as it sweeps by.



Now about that filter…

A talk given in two slightly different forms at the NFAIS annual meeting 2010 (where I followed Clay Shirky, hence the title) and at the Society for General Microbiology in Edinburgh in March. In the first case the talk was part of a panel of presentations intended to give the view of “scholars” to the information professionals. In the second it was part of a session looking at the application of web based tools to research and education.

Abstract (NFAIS meeting): More scientific information was generated in the past five years than previously existed, and scientists are simply not coping. Despite the fact that the web was built to enable the communication of data and information between scientists, the scientific community has been very slow in exploiting its full capabilities. I will discuss the development and state of the art of collaborative communication and filtering tools and their use by the scientific research community. The reasons for the lack of penetration and resistance to change will be discussed, along with the outlines of what a fully functional system would look like and, most importantly, the way this would enable more effective and efficient research.


Google Wave: Ripple or Tsunami?

Big Wave Surfing in Tahiti at Teahupoo
Image by thelastminute via Flickr

A talk given at the Edinburgh University IT Futures meeting late in 2009. The talk discusses the strengths and weaknesses of Wave as a tool for research and provides some pointers on how to think about using it in an academic setting. The talk was recorded in a Wave with members of the audience taking notes around images of the slides which I had previously uploaded.

You will only be able to see the wave if you have a Wave preview account and are logged in. If you don’t have an account the slides are included below (or will be as soon as I can get slideshare to talk to me).

[wave id=”googlewave.com!w+-c2g1ggkA”]
