Beyond the Impact Factor: Building a community for more diverse measurement of research


I know I’ve been a bit quiet for a few weeks. Mainly I’ve been away for work and taking a brief holiday, so it is good to be plunging back into things with some good news. I am very happy to report that the Open Society Institute has agreed to fund the proposal that was built up in response to my initial suggestion a month or so ago.

OSI, which many will know as one of the major players in bringing the Open Access movement to its current position, will fund a workshop that will identify both potential areas where the measurement and aggregation of research outputs can be improved and the barriers to achieving these improvements. This will be immediately followed by a concentrated development workshop (or hackfest) that will aim to deliver prototype examples that show what is possible. The funding also includes further development effort to take one or two of these prototypes and develop them to proof-of-principle stage, ideally with the aim of deploying them into real working environments where they might be useful.

The workshop structure will be developed by the participants over the six weeks leading up to the date itself. I aim to set that date in the next week or so, but the likelihood is early to mid-March. The workshop will be in southern England, with the venue again to be worked out over the next week or so.

There is a lot to pull together here and I will be aiming to contact everyone who has expressed an interest over the next few weeks to start talking about the details. In the meantime I’d like to thank everyone who has contributed to the effort thus far. In particular I’d like to thank Melissa Hagemann and Janet Haven at OSI and Gunner from Aspiration who have been a great help in focusing and optimizing the proposal. Too many people contributed to the proposal itself to name them all (and you can check out the GoogleDoc history if you want to pull apart their precise contributions) but I do want to thank Heather Piwowar and David Shotton in particular for their contributions.

Finally, the success of the proposal, and in particular the community response around it, has made me much more confident that some of the dreams we have for using the web to support research are becoming a reality. The details I will leave for another post, but what I found fascinating is how far the network of people spread who could be contacted, essentially through a single blog post. I’ve contacted a few people directly but most have become involved through the network of contacts that spread from the original post. The network, and the tools, are effective enough that a community can be built up rapidly around an idea from a much larger and more diffuse collection of people. The challenge of this workshop and the wider project is to see how we can make that aggregated community into a self-sustaining conversation that produces useful outputs over the longer term.

It’s a complete coincidence that Michael Nielsen posted a piece in the past few hours that forms a great document for framing the discussion. I’ll be aiming to write something in response soon but in the meantime follow the top link below.


Some notes on Open Access Week


Open Access Week kicks off for the fourth time tomorrow with events across the globe. I was honoured to be asked to contribute to the SPARC video that will be released tomorrow. The following is a transcription of my notes – not quite what I said, but similar. The video was released at 9:00am US Eastern Time on Monday 18 October.

It has been a great year for Open Access. Open Access publishers are steaming ahead, OA mandates are spreading and growing, and the quality and breadth of repositories is improving across institutions, disciplines, and nations. There have been problems and controversies as well, many involving shady publishers seeking to take advantage of the Open Access brand, but even this in its way is a measure of success.

Beyond traditional publication we’ve also seen great strides made in the publication of a wider diversity of research outputs. Open Access to data, to software, and to materials is moving up the agenda. There have been real successes. The Alzheimer’s Disease Network showed what can change when sharing becomes a part of the process. Governments and pharmaceutical companies are releasing data. Publicly funded researchers are falling behind by comparison!

For me, although these big stories are important and impressive, it is the little wins that matter. The thousands or millions of people who didn’t have to wait to read a paper, who didn’t need to write an email to get a dataset, who didn’t needlessly repeat an experiment known not to work. Every time a few minutes, a few hours, a few weeks, months, or years is saved we deliver more for the people who pay for this research. These small wins are the hardest to measure, and the hardest to explain, but they make up the bulk of the advantage that open approaches bring.

But perhaps the most important shift this year is something more subtle. Each morning I listen to the radio news, and every now and then there is a science story. These stories are increasingly prefaced with “…the research, published in the journal of…” and increasingly that journal is Open Access. A long-running excuse for not referring the wider community to the original literature has been its inaccessibility. That excuse is gradually disappearing. But more importantly, there is a whole range of research outcomes that people, where they are interested, where they care enough to dig deeper, can inform themselves about: research that people can use to reach their own conclusions about their health, the environment, technology, or society.

I find it difficult to see this as anything but a good thing, but nonetheless we need to recognize that it brings challenges. Challenges of explaining clearly, challenges in presenting the balance of evidence in a useful form, but above all challenges of how to effectively engage those members of the public who are interested in the details of the research. The web has radically changed the expectations of those who seek and interact with information. Broadcast is no longer enough. People expect to be able to talk back.

The last ten years of the Open Access movement have been about how to make it possible for people to touch, read, and interact with the outputs of research. Perhaps the challenge for the next ten years is to ask how we can create access opportunities to the research itself. This won’t be easy, but then nothing that is worthwhile ever is.

Open Access Week 2010 from SPARC on Vimeo.


It’s not information overload, nor is it filter failure: It’s a discovery deficit


Clay Shirky’s famous soundbite has helped to focus minds on the way information on the web needs to be tackled, and on a move towards managing the process of selecting and prioritising information. But in the research space I’m getting a sense that it is fuelling a focus on preventing publication, in a way that is analogous to the conventional filtering process involved in peer reviewed publication.

Most recently this surfaced at the Chronicle of Higher Education, to which there were many responses, Derek Lowe’s being one of the most thought-out. But this is not isolated.

@JISC_RSC_YH: How can we provide access to online resources and maintain quality of content? #rscrc10 [twitter via @branwenhide]

Me: @branwenhide @JISC_RSC_YH isn’t the point of the web that we can decouple the issues of access and quality from each other? [twitter]

There is a widely held assumption that putting more research onto the web makes it harder to find the research you are looking for. In fact the opposite is true: publishing more makes discovery easier.

The great strength of the web is that you can allow publication of anything at very low marginal cost without limiting the ability of people to find what they are interested in, at least in principle. Discovery mechanisms are good enough, while being a long way from perfect, to make it possible to mostly find what you’re looking for while avoiding what you’re not looking for.  Search acts as a remarkable filter over the whole web through making discovery possible for large classes of problem. And high quality search algorithms depend on having a lot of data.
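The claim that high quality search depends on having a lot of data can be made concrete with a toy ranking function. The sketch below is a minimal illustration in plain Python, not any real search engine’s algorithm; the document set and query are invented. It scores documents against a query with TF-IDF weighting, where rare, discriminating terms gain weight as the corpus grows – so adding more documents sharpens the ranking rather than burying relevant work:

```python
import math
from collections import Counter

def tfidf_rank(docs, query):
    """Return document indices ranked by summed TF-IDF score for the query.

    TF (term frequency) rewards documents that use a query term often;
    IDF (inverse document frequency) down-weights terms that appear
    everywhere, so common words do not swamp the ranking.
    """
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))

    scores = []
    for i, tokens in enumerate(tokenised):
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n / df[term])
            for term in query.lower().split()
            if df[term]  # skip terms absent from the corpus
        )
        scores.append((score, i))

    # Highest score first; ties broken by original document order
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

docs = [
    "open access to research data",
    "measuring research impact beyond the impact factor",
    "a recipe for pancakes",
]
ranking = tfidf_rank(docs, "impact factor")  # → [1, 0, 2]
```

The point of the toy example is the direction of the dependence: the IDF term is computed from the whole corpus, so the filter gets more discriminating, not less, as more material is published.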

It is very easy to say there is too much academic literature – and I do. But the solution which seems to be becoming popular is to argue for an expansion of the traditional peer review process. To prevent stuff getting onto the web in the first place. This is misguided for two important reasons. Firstly it takes the highly inefficient and expensive process of manual curation and attempts to apply it to every piece of research output created. This doesn’t work today and won’t scale as the diversity and sheer number of research outputs increases tomorrow. Secondly it doesn’t take advantage of the nature of the web. The way to do this efficiently is to publish everything at the lowest cost possible, and then enhance the discoverability of work that you think is important. We don’t need publication filters, we need enhanced discovery engines. Publishing is cheap, curation is expensive whether it is applied to filtering or to markup and search enhancement.

Filtering before publication worked, and was probably the most efficient place to apply the curation effort, when the major bottleneck was publication. Value was extracted from the curation process of peer review by using it to reduce the costs of layout, editing, and printing through simply printing less. But it created new costs, and invisible opportunity costs where a key piece of information was not made available. Today the major bottleneck is discovery. Of the 500 papers a week I could read, which ones should I read, and which ones just contain a single nugget of information which is all I need? In the Research Information Network study of the costs of scholarly communication, the largest component of the publication creation and use cycle was peer review, followed by the cost of finding the articles to read, which represented some 30% of total costs. On the web, the place to put in the curation effort is in enhancing discoverability, in providing me the tools that will identify what I need to read in detail, what I just need to scrape for data, and what I need to bookmark for my methods folder.

The problem we have in scholarly publishing is an insistence on applying this print paradigm publication filtering to the web alongside an unhealthy obsession with a publication form, the paper, which is almost designed to make discovery difficult. If I want to understand the whole argument of a paper I need to read it. But if I just want one figure, one number, the details of the methodology then I don’t need to read it, but I still need to be able to find it, and to do so efficiently, and at the right time.

Currently scholarly publishers vie for the position of biggest barrier to communication. The stronger the filter the higher the notional quality. But being a pure filter play doesn’t add value because the costs of publication are now low. The value lies in presenting, enhancing, curating the material that is published. If publishers instead vied to identify, markup, and make it easy for the right people to find the right information they would be working with the natural flow of the web. Make it easy for me to find the piece of information, feature work that is particularly interesting or important, re-interpret it so I can understand it coming from a different field, preserve it so that when a technique becomes useful in 20 years the right people can find it. The brand differentiator then becomes which articles you choose to enhance, what kind of markup you do, and how well you do it.

All of these are things that publishers already do. And they are services that authors and readers will be willing to pay for. But at the moment the whole business and marketing model is built around filtering, and selling that filter. By impressing people with how much you are throwing away. Trying to stop stuff getting onto the web is futile, inefficient, and expensive. Saving people time and money by helping them find stuff on the web is an established and successful business model both at scale, and in niche areas. Providing credible and respected quality measures is a viable business model.

We don’t need more filters or better filters in scholarly communications – we don’t need to block publication at all. Ever. What we need are tools for curation and annotation and re-integration of what is published. And a framework that enables discovery of the right thing at the right time. And the data that will help us to build these. The more data, the more research published, the better. Which is actually what Shirky was saying all along…


New Year – New me

Apologies for any weirdness in your feed readers. The following is the reason why, as I try to get things working properly again.

For the past two years on this blog I made some New Year’s resolutions, and last year I assessed my performance against the previous year’s aims. This year I will admit to simply being a bit depressed about how much I achieved in real terms and how effective I’ve been at getting ideas out and projects off the ground. This year I want to do more in terms of walking the walk, creating examples, or at least lashups of the things I think are important.

One thing that has been going around in my head for at least 12 months is the question of identity. How I control what I present, who I depend on, and in the world of a semantic web where I am represented by a URL what should actually be there when someone goes to that address. So the positive thing I did over the holiday break, rather than write a new set of resolutions was to start setting up my own presence on the web, to think about what I might want to put there and what it might look like.

This process is not as far along as I would like, but it’s far enough along that this will be the last post at this address. OpenWetWare has been an amazing resource for me over the past several years and we will continue to use the wiki for laboratory information, and I hope to work with the team in whatever way I can as the next generation of tools develops. OpenWetWare was also a safe place where I could learn about blogging without worrying about the mechanics, confident in the knowledge that Bill Flanagan was covering the backstops. Bill is the person who has kept things running through the various technical ups and downs and I’d particularly like to thank him for all his help.

However I have now learnt enough to be dangerous and want to try some more things out on my own – more than can be conveniently managed on a website that someone else has to look after. I will write a bit more about the ideas and choices I’ve made in setting up the site soon, but for the moment I just want to point you to the new site and offer you some choices about subscribing to different feeds.

If you are on the FeedBurner feed for the blog you should be automatically transferred over to the feed on the new site. If you’re reading in a feed reader you can check this by just clicking through to the item on my site. If you end up at a URL starting with https://cameronneylon.net/ then you are in the right place. If not, just change your reader to point at http://feeds.feedburner.com/ScienceInTheOpen.

This feed will include posts on things like papers and presentations as well as blog posts, so if you are already getting that content in another stream and prefer to just get the blog posts via RSS you should point your reader at http://feeds.feedburner.com/ScienceInTheOpen_blog. I can’t test this until I actually post something, so just hold tight if it doesn’t work and I will try to get it working as soon as I can. The comments feed, for all seven of you subscribed to it, should keep working. All the posts are mirrored on the new site and will continue to be available at OpenWetWare.

Once again I’d like to thank all the people at OpenWetWare that got me going in the blogging game and hope to see you over at the new site as I figure out what it means to present yourself as a scientist on the web.
