software – Science in the Open

December 13, 2010November 9, 2016

Open Research Computation: An ordinary journal with extraordinary aims.

I spend a lot of my time arguing that many of the problems in the research community are caused by journals. We have too many, they are an ineffective means of communicating the important bits of research, and as a filter they are inefficient and misleading. Today I am very happy to be publicly launching the call for papers for a new journal. How do I reconcile these two statements?

Computation lies at the heart of all modern research. Whether it is the massive scale of LHC data analysis or the use of Excel to graph a small data set. From the hundreds of thousands of web users that contribute to Galaxy Zoo to the solitary chemist reprocessing an NMR spectrum we rely absolutely on billions of lines of code that we never think to look at. Some of this code is in massive commercial applications used by hundreds of millions of people, well beyond the research community. Sometimes it is a few lines of shell script or Perl that will only ever be used by the one person who wrote it. At both extremes we rely on the code.

We also rely on the people who write, develop, design, test, and deploy this code. In the context of many research communities the rewards for focusing on software development, of becoming the domain expert, are limited. And the cost in terms of time and resource to build software of the highest quality, using the best of modern development techniques, is not repaid in ways that advance a researcherâ€™s career. The bottom line is that researchers need papers to advance, and they need papers in journals that are highly regarded, and (say it softly) have respectable impact factors. I donâ€™t like it. Many others donâ€™t like it. But that is the reality on the ground today, and we do younger researchers in particular a disservice if we pretend it is not the case.

Open Research Computation is a journal that seeks to directly address the issues that computational researchers have. It is, at its heart, a conventional peer reviewed journal dedicated to papers that discuss specific pieces of software or services. A few journals now exist in this space that either publish software articles or have a focus on software. Where ORC will differ is in its intense focus on the standards to which software is developed, the reproducibility of the results it generates, and the accessibility of the software to analysis, critique and re-use.

The submission criteria for ORC Software Articles are stringent. The source code must be available, on an appropriate public repository under an OSI compliant license. Running code, in the form of executables, or an instance of a service must be made available. Documentation of the code will be expected to a very high standard, consistent with best practice in the language and research domain, and it must cover all public methods and classes. Similarly code testing must be in place covering, by default, 100% of the code. Finally all the claims, use cases, and figures in the paper must have associated with them test data, with examples of both input data and the outputs expected.

The primary consideration for publication in ORC is that your code must be capable of being used, re-purposed, understood, and efficiently built on. You work must be reproducible. In short, we expect the computational work published in ORC to deliver at the level that is expected in experimental research.

In research we build on the work of those that have gone before. Computational research has always had the potential to deliver on these goals to a level that experimental work will always struggle to, yet to date it has not reliably delivered on that promise. The aim of ORC is to make this promise a reality by providing a venue where computational development work of the highest quality can be shared, and can be celebrated. To provide a venue that will stand for the highest standards in research computation and where developers, whether they see themselves more as software engineers or as researchers who code, will be proud to publish descriptions of their work.

These are ambitious goals and getting the technical details right will be challenging. We have assembled an outstanding editorial board, but we are all human, and we donâ€™t expect to get it all right, first time. We will be doing our testing and development out in the open as we develop the journal and will welcome comments, ideas, and criticisms to editorial@openresearchcomputation.com. If you feel your work doesnâ€™t quite fit the guidelines as Iâ€™ve described them above get in touch and we will work with you to get it there. Our aim, at the end of the day is to help the research developer to build better software and to apply better development practice. We can also learn from your experiences and wider ranging review and proposal papers are also welcome.

In the end I was persuaded to start yet another journal only because there was an opportunity to do something extraordinary within that framework. An opportunity to make a real difference to the recognition and quality of research computation. In the way it conducts peer review, manages papers, and makes them available Open Research Computation will be a very ordinary journal. We aim for its impact to be anything but.

Watching the future…student demos at University of Toronto

On Wednesday morning I had the distinct pleasure of seeing a group of students in the Computer Science department at the University of Toronto giving demos of tools and software that they have been developing over the past few months. The demos themselves were of a consistently high standard throughout, in many ways more interesting and more real than some of the demos that I saw the previous night at the “professional” DemoCamp 21. Some, and I emphasise only some, of the demos were less slick and polished but in every case the students had a firm grasp of what they had done and why, and were ready to answer criticisms or explain design choices succinctly and credibly. The interfaces and presentation of the software was consistently not just good, but beautiful to look at, and the projects generated real running code that solved real and immediate problems. Steve Easterbrook has given a run down of all the demos on his blog but here I wanted to pick out three that really spoke to problems that IÂ have experienced myself.

I mentioned Brent Mombourquette‘s work on Breadcrumbs yesterday (details of the development of all of these demos is available on the student’s linked blogs). John Pipitone demonstrated this Firefox extension that tracks your browsing history and then presents it as a graph. This appealed to me immensely for a wide range of reasons: firstly that I am very interested in trying to capture, visualise, and understand the relationships between online digital objects. The graphs displayed by breadcrumbs immediately reminded me of visualisations of thought processes with branches, starting points, and the return to central nodes all being clear. In the limited time for questions the applications in improving and enabling search, recording and sharing collections of information, and even in identifying when thinking has got into a rut and needs a swift kick were all covered. The graphs can be published from the browser and the possibilities that sharing and analysing these present are still popping up with new ideas in my head several days later. In common with the rest of the demos my immediate response was, “I want to play with that now!”

The second demo that really caught my attention was a MediaWiki extension called MyeLink written by Maria Yancheva that aimed to find similar pages on a wiki. This was particularly aimed at researchers keeping a record of their work and wanting to understand how one page, perhaps describing an experiment that didn’t work, was different to a similar page, describing and experiment that did. The extension identifies similar pages in the wiki based on either structure (based primarily on headings I think) or in the text used. Maria demonstrated comparing pages as well as faceted browsing of the structure of the pages in line with the extension. The potential here for helping people manage their existing materials is huge. Perhaps more exciting, particularly in the context of yesterday’s post about writing up stories, is the potential to assist people with preparing summaries of their work. It is possible to imagine the extension first recognising that you are writing a summary based on the structure, and then recognising that in previous summaries you’ve pulled text from a different specific class of pages, all the while helping you to maintain a consitent and clear structure.

The last demo I want to mention was from Samar Sabie of a second MediaWiki extension called VizGraph. Anyone who has used a MediaWiki or a similar framework for recording research knows the problem. Generating tables, let alone graphs, sucks big time. You have your data in a CSV or Excel file and you need to transcribe, by hand, into a fairly incomprehensible, but more importantly badly fault intolerant, syntax to generate any sort of sensible visualisation. What you want, and what VizGraph supplies is a simple Wizard that allows you to upload your data file (CSV or Excel naturally) steps you through a few simple questions that are familiar from the Excel chart wizards and then drops that back into the page as a structured text data that is then rendered via the GoogleChart API. Once it is there you can, if you wish, edit the structured markup to tweak the graph.

Again, this was a great example of just solving the problem for the average user, fitting within their existing workflow and making it happen. But that wasn’t the best bit. The best bit was almost a throwaway comment as we were taken through the Wizard; “and check this box if you want to enable people to download the data directly from a link on the chart…”. I was sitting next to Jon Udell and we both spontaneously did a big thumbs up and just grinned at each other. It was a wonderful example of “just getting it”. Understanding the flow, the need to enable data to be passed from place to place, while at the same time make the user experience comfortable and seamless.

I am sceptical about the rise of a mass “Google Generation” of tech savvy and sophisticated users of web based tools and computation. But what Wednesday’s demos showed to me in no uncertain terms was that when you provide a smart group of people, who grew up with the assumption that the web functions properly, with tools and expertise to effectively manipulate and compute on the web then amazing things happen.Â That these students make assumptions of how things should work, and most importantly that they should, that editing and sharing should be enabled by default, and that user experience needs to be good as a basic assumptionwas brought home by a conversation we had later in the day at the Science 2.0 symposium.

The question wasÂ “what does Science 2.0 mean anyway?”. A question that is usually answered by reference to Web 2.0 and collaborative web based tools. Steve Easterbrooks‘s opening gambit in response was “well you know what Web 2.0 is don’t you?” an this was met with slightly glazed stares. We realized that, at least to a certain extent, for these students there is no Web 2.0. It’s just the way that the web, and indeed the rest of the world, works. Give people with these assumptions the tools to make things and amazing stuff happens. Arguably, as Jon Udell suggested later in the day, we are failing a generation by not building this into a general education. On the other hand I think it pretty clear that these students at least are going to have a big advantage in making their way in the world of the future.

Apparently screencasts for the demoed tools will be available over the next few weeks and I will try and post links here as they come up. Many thanks to Greg Wilson for inviting me to Toronto and giving me the opportunity to be at this session and the others this week.

October 26, 2008December 30, 2009

Survey on scientists’ use of software

Greg Wilson and others are running a survey on how scientists’ perception of and use of software is changing. The wider range of people who do this the better and it will only take you about 10 minutes (five if you’re really clever).

The survey is at:
http://softwareresearch.ca/seg/SCS/scientific-computing-survey.html

Related articles