
Science in the Open

The online home of Cameron Neylon


Tag: public-engagement

Posted on July 30, 2009 (updated December 30, 2009)

Telling stories…

On Tuesday I was able to sit in on a conversation that is regularly held within the Computer Science department at the University of Toronto, focused broadly on what computer science can bring, as a discipline and a skill set, to the sciences more generally. The conversation is led by Steve Easterbrook, so there is a focus on climate science, but we also roamed much more widely than that.

A key question that was raised, one which many of us have been struggling with for some time, is how to describe and publish descriptions of the progress of research projects in a way that provides a route in for non-specialists. Blogs provide a great way to do this, either as a generic journal with more or less detail, or as an overlay over a more detailed open notebook. Jean-Claude Bradley's UsefulChem blog is a great example of the latter, and the blogs of Rosie Redfield's group an example of the former.

The conversation was interesting for me in that it pinned down the idea of, and the necessity for, creating a narrative. This contrasts with the kind of (largely incomprehensible) detail found in a notebook, which is usually fragmented and often distributed. One of the things that researchers are quite poor at, in my experience, is actually recording the why of an experiment: the question of how it fits into the wider context. Again, blogs are a great format for doing this, but where is the motivation? Writing that narrative in any form is hard work, a classic example of work that "takes me away from the bench", so how can it be justified?

One reason is that it raises the profile of the research, always an important issue in today's research environment. But this is more important to some people than to others. Another very valid reason is to take personal notes, to create a personal narrative of what you are doing and have done that you can return to and use as an index to your own work. In a later discussion with Alicia Grubb, she mentioned that her supervisors insisted on her blogging about literature that she collected. Equally, taking notes in a bookmarking service could provide the same functionality, but understanding the context in which you bookmarked something is valuable. Brent Mombourquette, an undergraduate student, also demonstrated a nice Firefox plugin that he had developed which captures and displays browsing history as a directed graph, an interesting tool to think about in this context. I'll write more about some of the fabulous student demos I saw later.
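The data structure behind such a plugin is easy to sketch. The toy Python model below (all names and URLs are hypothetical illustrations, not the plugin's actual code) treats visited pages as nodes and navigations as directed edges, so a page's outgoing edges record every place you went from it:

```python
from collections import defaultdict

class BrowsingGraph:
    """Toy model of browsing history as a directed graph:
    nodes are page URLs, and an edge u -> v records that the
    user navigated from page u to page v."""

    def __init__(self):
        self.edges = defaultdict(set)  # url -> set of urls visited from it
        self.current = None            # page currently being viewed

    def visit(self, url):
        # Record a navigation edge from the current page, if any,
        # then make the new page current.
        if self.current is not None:
            self.edges[self.current].add(url)
        self.current = url

    def successors(self, url):
        # Pages reached directly from this one, in sorted order.
        return sorted(self.edges[url])

history = BrowsingGraph()
for url in ["pubmed/123", "blog/usefulchem", "pubmed/123", "wiki/solubility"]:
    history.visit(url)

print(history.successors("pubmed/123"))
# → ['blog/usefulchem', 'wiki/solubility']
```

Because revisiting a page simply adds edges to the same node, the graph naturally shows branching trails, which is what makes it more useful as context than a flat, linear history list.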

For me, though, the biggest benefit of making your research accessible is that it provides an entry route for new people to come in and help. The story of Galaxy Zoo shows how, by placing a question within an understandable narrative, you can enable people to come in and help out. No-one is going to come in from the outside and comment on my lab notebook unless they are already a specialist with a specific question. I've often thought I should start another blog to discuss more generally what is happening in my "real" research. Maybe this is the time to do that.

But you can also take this one step further. There was a debate in the comments on the RealClimate blog a few months ago about making the details of data and analysis publicly available. One real and valid concern was that denialists would dig into the detail and misrepresent problems or mistakes to advance their own agenda. Dealing with this kind of thing takes up valuable time, time the average researcher, particularly one committed to engaging with a wider community, doesn't have. My question was whether you could configure that public release of data and process in such a way that even those who are working against you are helping you. If people are searching your code for bugs, then surely there must be a way of taking advantage of that?

The argument that releasing data costs you time sounds compelling when it comes from a researcher. But equally, the same argument sounds dangerous when it is made by, for example, a government. As Steve said, tongue in cheek, perhaps channelling some recently removed political leaders, "clearly we're not going to release this data, because it would take us time to deal with public complaints, and that will cost the taxpayer. In fact, we're not even going to run the consultation, because that would cost money. It's a much cheaper and more effective use of your tax dollars if you just trust us to do the right thing." The situations clearly aren't exact parallels, and resources for communication are much more limited in a research setting, but it would be interesting to think about parallel cases in different domains, such as government and research, and how the domain affects the credibility of the argument. If you believe in the value of sunshine as a disinfectant for government data, then you need a strong case to argue that the same doesn't apply to research data.

But if you decide that you want to make that narrative public, or even better the narrative along with the underlying data, it does take work to make it comprehensible. As I've discovered recently, such posts don't translate easily into papers, so the argument that you can re-use the text doesn't really work, at least for me. To make this worthwhile you either have to be required to do it (JISC in the UK basically requires that all funded projects have blogs) or you have to believe in, and work towards, the benefits it can bring you. In a sense this is actually just recapturing the idea of the research notebooks that many historical scientists kept, and which make such rich pickings for modern historians of science. Somewhere along the line we lost that. There are lots of tools around that can help you create that narrative, from clickstreams and records to environmental capture, but these remain only an aide-mémoire. Making the story is something that will probably remain a purely human activity. It is something that we seem highly evolved to do, and it remains the most effective means of human-to-human communication. The computers can help, and they can provide the detail to dig down into if desired, but the story itself will remain ours alone for a while yet, I think.

Posted on July 17, 2009 (updated December 30, 2009)

Sci – Bar – Foo etc. Part II – SciFoo – Engaging with the world

Last Friday afternoon (was it really only a week ago?) about 200 people made their way to the Googleplex in Mountain View for the fourth SciFoo. Many people got their blog posts out well before me, so I will focus on the sessions which don't seem to have been heavily discussed and try to draw a few themes out.

For me, the overriding theme that came through was Engagement. Engaging people beyond the narrow confines of the professional research community in real research projects, making science more engaging for students, and engaging in a serious way both with the tools that are available to help us do these things and, increasingly, with data generation and dissemination processes that are not under our control.

I was involved in running two sessions. The first, with Peter Murray-Rust, was on Open Data, focussed on getting feedback on the current form of the Panton Principles, and has been blogged in detail by Peter. For me the main message from this was a lack of push-back. Many of the more technical people in the room were bemused that there was a problem at all. "Just put it on the web" was a common response. Others were concerned about where data stops and creative works begin, but the main message for me was that "for published data, just put it explicitly in the public domain" was seen as the right thing to do by the people in the room. Indeed, most were surprised it was even worth discussing.

The second session I ran was on Google Wave in research, and this will get a whole post of its own very soon, so I won't discuss it in detail here. Suffice to say that there was excitement, great ideas about what could be done, and concerns about the details of technical implementation, which seems to me like an excellent mix for making progress. Engagement for these two sessions was engagement with the data, and with the technology for generating, annotating, and sharing that data.

The other sessions I would like to draw a common theme through were more focussed on public engagement and education. The first session I attended on Saturday morning, run by Daniel Glaser, was called Doing Science in Non-Science Spaces. This was an interesting discussion on many levels, but particularly for me because it challenged my ideas about multi-disciplinary working and deploying research projects into an educational setting. Daniel described disciplinary boundaries as fractal, and multidisciplinary projects as requiring a safe common space where people can come together to share ideas, but also requiring that people then disperse again and re-interpret the outputs in the context of their own experience and discipline. In this view, disciplinary boundaries are important in enabling effective summarisation and communication of outputs. I've been kicking myself ever since for not thinking to ask whether that means these boundaries are any less arbitrary than those of us who are interdisciplinary always feel.

Another challenge to my thinking from this session was the need to give up control over the shared collaboration space. In thinking about putting research projects into educational settings, I've always looked at the process as trying to find a question within the research that can be understood and answered by students. The argument here was that to truly engage students it would be necessary to let them find and answer their own questions. I'm not sure how in practice to think about that in terms of drug discovery, or how it maps onto the success of projects like Galaxy Zoo, but it bears some thinking about.

Also focussed on interactions beyond the professional research community was Ariel Waldman's session "Open collaboration between scientists, communities, and the unknown", which followed on from a session of the same title at SciBarCamp which I somehow missed. Here the focus was on the problems of sharing research with the wider world, with similar issues to those of sharing between researchers identified, along with potential solutions. Some great projects were discussed and showcased, with contributions on a new collaboration site for research into Parkinson's, on getting the public to search for surface-exposed fossils in high-resolution ground images (Louise Leakey, Turkana Basin Institute), and on the experience of being the public conduit for a spacecraft from Veronica "Mars Phoenix" McGregor. Once again a major theme was "just get the data out there" so that people can do something with it if they want to. If it isn't available, no-one is going to do anything.

The final session was led by Joan Peckham on Computational Thinking, the idea that the principles behind good computing design should be taught as a core skill on a par with reading and writing, and that these techniques are widely applicable beyond computing per se. For more on the background to this you can check out Jon Udell interviewing Joan on his Interviews with Innovators podcast. The point for me was to try to understand how I can most effectively learn these principles and techniques, as it is clear to me that I need a better understanding of good software and system design for the work I would like to do. What was interesting to me was whether my needs mapped onto what would be required for teaching children, and whether willing and interested guinea pigs such as myself might be useful in helping to develop educational programmes. Here engagement means effective use of technology, and design of systems that will make our work and collaborations efficient.

Scifoo is always challenging, requiring that you re-think and re-examine many of the assumptions that your everyday work is built on. Many smart people with very different perspectives and experiences make a great environment to stress test your ideas, sometimes to destruction. The challenge can be actually applying those insights in the real world with limited resources and time. But it provides some goals to work towards and much food for thought.

Posted on March 15, 2009 (updated December 30, 2009)

There are crowds, and then there are crowds…

I am constantly intrigued by the way that some sort of collective web-based intelligence seems to cause specific issues to pop up in multiple places at the same time. The probable reason is that my primitive brain, while trying to create a narrative out of all the things that I am attempting to absorb, creates these specific reference points to make it easier for me to understand what is going on. One of the things I am interested in is how we take the behavioural tendencies that developed in small groups of social primates and work with those to get the kind of results we need or want. In my case I find a lot of these web discussions work quite effectively as a way of either tipping off my own subconscious, or perhaps acting as an extended subconscious, that helps me to surface ideas and to write about them.

Coincidentally, in the past week two things that got my attention were about crowd-sourcing in science. The first was a quote from a talk that John Wilbanks gave at ETech 2009 on Open Data and building a data commons. The quote that got the publicity was "there is no crowd", by which I think John meant that scientific communities are not large and homogeneous enough for the kind of crowd-sourcing and opinion markets that are beloved of some Web2 commentators to work. In John's own terms, the knowledge required to make a meaningful contribution is uncommon, making both the logistics and the economics of what is in effect a transfer of labour very different from those for problems or questions which have direct meaning for the wider population.

The second piece that got me thinking was Tim Gowers's discussion of the experience of the Polymath project (via Michael Nielsen). For those who missed this, the project aimed to solve a defined mathematical problem through a series of small contributions from a group of people; essentially building a mathematical proof, or the outlines of one, via crowd-sourcing. Tim's blog post and the accompanying comments are a gold mine for anyone who wants to understand how this type of project does and does not work. The main success of the project was in solving the problem, or in fact a more general form of the originally stated problem. The main "failure" was that the team of actively involved people was rather small and made up of people who might be seen as "the usual suspects" in this context: a group of the best mathematicians who are active on the web.

Many of the reasons for the relatively small core group of people involved should be very familiar: lack of time to invest, the pace of conversation being too fast for people to dip in and out of, fear of asking "stupid" questions, and finally a lack of awareness that the project was even occurring. Let me be clear that, from what I can see, this project was an immense success, but it is also a very clear and time-bounded example from which we can draw lessons. Any social scientists out there interested in how this type of collaboration is evolving should be piling in now and grabbing as much data as they can on the process!

Some of these problems are logistical and design-based, and some of them are technical and interface-based. There is a clear need for well-designed interfaces that let people rapidly triage a barrage of threads to determine what and where they can contribute. Equally, there is a need for an engineering approach to dividing up problems into modular sections that can be attacked independently. The theme of mismatched intrinsic time frames for people and conversations also keeps coming up, and is something I have been struggling to get my head around for some time, with little progress. Equally, the social problems (am I going to ask my silly question in the presence of several of the world's best mathematicians? what can I possibly contribute? will I get anything for my small contribution?) are familiar and require a mixture of technical, interface, and social solutions. In aggregate, though, it would appear that the experience of the Polymath project supports John's soundbite, "there is no crowd", because only a small number of people can usefully contribute.

Now, I disagree with John rarely, and I think very hard before doing so. In this case he is actually trying to make a different point entirely, to do with the reasons why sharing is less immediately popular in science, but I think the isolated soundbite is misleading in terms of what crowd-sourcing can do for science. One point is that there are at least two distinctly different types of crowd-sourcing. The "wisdom of crowds" or opinion-markets style would not at first glance appear to be applicable to science, but I will come back to how I think it can be. The second type of crowd-sourcing doesn't seem to have a good name as far as I know, so I will characterise it as "broadcast request – expert response". This is when you make a broadcast request to a community or network in the hope that someone will have the knowledge you require. A similar thing is seen when a request to work on a specific problem is broadcast, and a small group of interested people contribute. The expectation is not that thousands will appear to solve your problem in small fragments, but that a small group of interested, dedicated, and appropriately skilled people will put in the time because they think it is worth their while. In the extreme this is one person who just happens to have the correct answer to your question.

Where scientists tend to go wrong is in mixing these two approaches up. There is an expectation that by broadcasting a request you will gather a huge number of people, each contributing a small amount of time to a project. Both the recent project on putting together a dictionary of named chemical reactions and the Polymath project started with the expectation of large numbers of contributors but ended up with a smaller number of interested people. On a smaller scale the broadcast request has delivered on many occasions for many people, again generally when the request involves a small amount of work, and again where there is an existing community that tends to contain the expertise that is required.

Clearly these two approaches are not orthogonal, and there is a massive grey area between them. You can use Twitter both to ask "what do you think of Obama's performance so far?" and "does anyone know the density of D2O to four decimal places?". What is interesting is that you would probably build your network on Twitter differently depending on the type of questions you want to ask. Building these networks online, as well as in real life, is going to be an increasingly important part of being successful. It is still the case that who you know matters more than what you know; in fact, if your community becomes your knowledge base, that is going to become much more the case. You just have more options, and more ability to ensure that you know the right people.

You would also probably ask different questions in different ways. And this last point, to me, is the most important. There are successful examples of scientific crowd-sourcing in the "mass market" sense. Projects like SETI@home provided a very low barrier to entry for anyone with a computer. Two very good examples that get people directly involved are Galaxy Zoo and FoldIt. What unites these is the effort put into framing the scientific question in a way that enables a large number of people to contribute. Usually, in tackling a scientific question, an experimental scientist asks, "how many postdocs and what new piece of kit do I need to answer this question?" In today's world of rapidly dwindling government income, poor public perception of science, and a stated need for more science graduates and teachers, perhaps it is more appropriate to be asking, "how can I modularise this research to make specific parts of it accessible to students, teachers, or the general public?" And by accessible I don't mean that they can understand it, but that they can do it. It is not an accident that this notion of dividing the problem up keeps appearing.

Jean-Claude Bradley's UsefulChem project, and some of our efforts to take parts of it into classrooms, has become a great example of this approach. The project itself has been divided up into separate modules: computational screening, compound synthesis, in vitro testing of compounds, and the measurement and modelling of compound solubility, which was developed into a separate strand for the Open Notebook Science Challenge. I'm not sure to what extent Jean-Claude's original intentions involved taking the project into undergraduate or high school teaching settings, but the explicit aim was always to structure the project so that anyone could become involved. And as some of us think about how we can push that towards schools and public workshops, we need to think hard about how we can make the science, its motivation, and the experiments themselves accessible (which includes safe!) for people with no chemistry experience to carry out. It's a challenge, but I think it is worth it.

So in the end, my belief is that there is a crowd, or rather a set of crowds, all of which may be useful for your science. To pursue crowd-sourcing in its broadest sense requires a lot of work and thought, but the rewards in terms of public engagement, involvement, and the volume of science you can do are enormous. To pursue the "agile collaboration" and "broadcast request – expert answer" styles of crowd-sourcing, work is also required: to build and care for a community and network that will provide you with access to the right people at the right time. Different approaches are better for different problems and for different timescales. Both require work and investment of time. But with that initial input it is still possible for your crowd(s) to pay you back a big dividend.

Recent posts

  • Testing…testing…
  • Epistemic diversity and knowledge production
  • Thinking out loud: Tacit knowledge and deficit models
  • Against the 2.5% Commitment
  • 2560 x 1440 (except while traveling)

I am also found at...

  • Github
  • Mastodon
  • ORCID

License

CC0 To the extent possible under law, Cameron Neylon has waived all copyright and related or neighboring rights to Science in the Open. Published from the United Kingdom.



