I spend a lot of my time arguing that many of the problems in the research community are caused by journals. We have too many, they are an ineffective means of communicating the important bits of research, and as a filter they are inefficient and misleading. Today I am very happy to be publicly launching the call for papers for a new journal. How do I reconcile these two statements?
Computation lies at the heart of all modern research, whether it is the massive scale of LHC data analysis or the use of Excel to graph a small data set. From the hundreds of thousands of web users who contribute to Galaxy Zoo to the solitary chemist reprocessing an NMR spectrum, we rely absolutely on billions of lines of code that we never think to look at. Some of this code is in massive commercial applications used by hundreds of millions of people, well beyond the research community. Sometimes it is a few lines of shell script or Perl that will only ever be used by the one person who wrote it. At both extremes we rely on the code.
We also rely on the people who write, develop, design, test, and deploy this code. In the context of many research communities the rewards for focusing on software development, of becoming the domain expert, are limited. And the cost in terms of time and resource to build software of the highest quality, using the best of modern development techniques, is not repaid in ways that advance a researcher’s career. The bottom line is that researchers need papers to advance, and they need papers in journals that are highly regarded, and (say it softly) have respectable impact factors. I don’t like it. Many others don’t like it. But that is the reality on the ground today, and we do younger researchers in particular a disservice if we pretend it is not the case.
Open Research Computation is a journal that seeks to directly address the issues that computational researchers have. It is, at its heart, a conventional peer reviewed journal dedicated to papers that discuss specific pieces of software or services. A few journals now exist in this space that either publish software articles or have a focus on software. Where ORC will differ is in its intense focus on the standards to which software is developed, the reproducibility of the results it generates, and the accessibility of the software to analysis, critique and re-use.
The submission criteria for ORC Software Articles are stringent. The source code must be available on an appropriate public repository under an OSI-compliant license. Running code, in the form of executables or an instance of a service, must be made available. Documentation of the code will be expected to a very high standard, consistent with best practice in the language and research domain, and it must cover all public methods and classes. Similarly, code testing must be in place covering, by default, 100% of the code. Finally, all the claims, use cases, and figures in the paper must be accompanied by test data, with examples of both the input data and the expected outputs.
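As a rough sketch of what that last requirement might look like in practice, consider a small analysis function shipped with a paired example input and expected output. The function name and data here are invented purely for illustration, not drawn from any real submission:

```python
def normalise_spectrum(intensities):
    """Scale a list of intensities so that the peak value is 1.0.

    Raises ValueError if the input is empty or all-zero.
    """
    peak = max(intensities, default=0.0)
    if peak == 0.0:
        raise ValueError("spectrum is empty or all-zero")
    return [i / peak for i in intensities]

# Example input and expected output of the kind that would
# accompany a claim or figure in the paper:
example_input = [2.0, 4.0, 1.0]
expected_output = [0.5, 1.0, 0.25]
assert normalise_spectrum(example_input) == expected_output
```

The point is that anyone reading the paper can re-run exactly this pairing and check that the code still produces the published result.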
The primary consideration for publication in ORC is that your code must be capable of being used, re-purposed, understood, and efficiently built on. Your work must be reproducible. In short, we expect the computational work published in ORC to deliver at the level that is expected in experimental research.
In research we build on the work of those who have gone before. Computational research has always had the potential to deliver on these goals to a level that experimental work will always struggle to match, yet to date it has not reliably delivered on that promise. The aim of ORC is to make this promise a reality by providing a venue where computational development work of the highest quality can be shared and celebrated, a venue that will stand for the highest standards in research computation and where developers, whether they see themselves more as software engineers or as researchers who code, will be proud to publish descriptions of their work.
These are ambitious goals and getting the technical details right will be challenging. We have assembled an outstanding editorial board, but we are all human, and we don't expect to get it all right first time. We will be doing our testing and development out in the open as we develop the journal, and will welcome comments, ideas, and criticisms at editorial@openresearchcomputation.com. If you feel your work doesn't quite fit the guidelines as I've described them above, get in touch and we will work with you to get it there. Our aim, at the end of the day, is to help the research developer build better software and apply better development practice. We can also learn from your experiences, and wider-ranging review and proposal papers are also welcome.
In the end I was persuaded to start yet another journal only because there was an opportunity to do something extraordinary within that framework. An opportunity to make a real difference to the recognition and quality of research computation. In the way it conducts peer review, manages papers, and makes them available Open Research Computation will be a very ordinary journal. We aim for its impact to be anything but.
Other related posts:
Jan Aerts: Open Research Computation: A new journal from BioMedCentral
And is the journal Open Access?
Absolutely. I wouldn’t be starting anything that wasn’t! Also, BMC are working on data licensing as well, so not only should the text be CC-BY but data, at least the data within the paper, should also be CC0.
Excellent news, Cameron!! If you’ll take me, I would love to contribute to this project at some stage (reviewer, editor).
Hi Cameron, congratulations on a long-needed journal headed by the right man for the job. I was lucky enough to see Egon W talk today and he mentioned the journal in his talk. The only point that came up was peer review: that is a lot of work for a few people. Were there no other options in terms of peer review?
Hi Mark. Yes, doing this right without taking a huge amount of time is going to be a challenge. We’ve got some ideas, and what I’d ultimately like to do is push a lot of the auditing work back onto the software project. If a good audit process (automated tests, documentation, a documented patch acceptance process) is in place, then in essence that should cover the code review that we would need to do, so projects that do that could expect a more rapid review process.
We’re clearly not going to be able to audit all the code in detail, particularly for big projects, but we do intend to run tests, check they pass, and go over documentation to check it is up to scratch. And never fear, we will be farming a significant portion of this out to peer reviewers…
“Similarly code testing must be in place covering, by default, 100% of the code.”
That is a noble goal, but how achievable is it? Do you mean line coverage, branch coverage, or something else? Is that automated tests or do you include manual ones? If there’s a GUI, how do you want those tested?
Which parts of the stack need the tests? For example, if the code I write has 100% coverage but it depends heavily on another package which has no tests, then would that be acceptable?
Hi Andrew, thanks for the comment. Yes, there’s been a lot of discussion around this. What I meant to say, and I accept it wasn’t clear, is that the aspiration should be to reach 100% coverage. We know this isn’t always possible, and it isn’t even clear what it means. In practice what we want people to do is a) do some proper testing, b) quantify the coverage in some form, and c) justify the reasons for not covering some code elements and explain what kind of checking approach has been used to make sure those parts are OK.
The question you raise is an interesting one. My initial response would be that we are dealing with the code that you are describing in your paper. As long as you describe the dependencies, what is important is the testing and documentation of your own code. We don’t expect you to test e.g. operating systems, so we don’t expect you to sort out other people’s modules and frameworks either, although we would encourage people to contribute tests back upstream in general terms. Does that answer your question?
100% coverage is not really attainable without almost superhuman effort. My favorite example is SQLite. They had to abstract out filesystem interactions so they could test for all manner of error conditions, like “disk full”. I’ve looked, and there are very few people who have 100% coverage, and they’ve only done so when they have a very clear goal to attain that coverage, and are willing to spend a lot of extra time to write those tests.
There’s a relevant discussion about this at http://stackoverflow.com/questions/90002/what-is-a-reasonable-code-coverage-for-unit-tests-and-why . “Industry standard” is 80 or 85%, mostly because someone a long time ago said that that was a good number. Otherwise it’s highly variable on the need and appropriateness of the different levels of coverage (statement, branch, decision path).
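The gap between those coverage levels is easy to see in a toy example (hypothetical, not taken from any submission): a test suite can execute every statement of a function while still never exercising one side of a branch.

```python
def absolute(x):
    """Return the absolute value of x."""
    if x < 0:
        x = -x
    return x

# A single test on the negative case runs every statement above,
# so statement coverage is already 100%:
assert absolute(-3) == 3

# But the path where the `if` condition is false was never taken,
# so branch coverage (e.g. coverage.py's `--branch` mode) would
# still report a missed branch until the positive case is added:
assert absolute(4) == 4
```

So "100% statement coverage" and "100% branch coverage" are genuinely different bars, which is why the criterion needs to say which one it means.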
I think saying “high statement coverage” would be better than saying 100%, with the same emphasis on “consistent with best practice in the language and research domain.” As people become familiar (will they ever?) with measuring statement coverage, you can then start raising the bar and/or adding branch coverage testing. Once you have a set of packages you’ll be able to do retrospective analysis to see how well they fared, and perhaps be able to correlate coverage with quality or usefulness.
You should also say something about “automated tests”. I use that phrase because the common term “unit test” specifically excludes integration, regression, performance, and stress tests. These should be automated to the same high degree as the other components. But some things simply cannot be tested automatically without a very high degree of effort, and often only with increased fragility towards change.
BTW, I notice the comment about documentation is very OO-centric. “It must cover all public methods and classes” implies that C programs (which have neither classes nor methods) need no documentation. The reference to “public” assumes the code API is paramount, but what of command-line tools, or web services, which expose other means to access the functionality?
The phrase should be something like “document the interfaces by which others should use the software.”
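For a command-line tool, for instance, the user-facing interface can be documented in the argument parser itself rather than in any class or method. A minimal sketch in Python, with the tool name and options invented for illustration:

```python
import argparse

def build_parser():
    """Build the command-line interface. For many users this help
    text, not the code API, is the documentation that matters."""
    parser = argparse.ArgumentParser(
        prog="specnorm",  # hypothetical tool name
        description="Normalise a spectrum so its peak intensity is 1.0.",
    )
    parser.add_argument("infile",
                        help="input file with one intensity value per line")
    parser.add_argument("--scale", type=float, default=1.0,
                        help="extra scale factor applied after normalisation")
    return parser

# `specnorm --help` then serves as complete usage documentation,
# even for software that exposes no public classes or methods.
```

In that spirit, “document the interfaces by which others should use the software” covers command-line help, web service endpoints, and code APIs alike.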
I agree with the initiative, but I think you’re not heading in the right direction. I think journals need to be more innovative. An idea of what I mean can be seen by watching this video from IDEO: The Future of the Book http://www.vimeo.com/15142335
It’s something that I would like to develop soon.
“Computation lies at the heart of all modern research.”
Is not the heart of research the process of observation, hypothesis, experimentation leading to further observation, rinse and repeat until hypothesis is disproved or becomes a theory?
Andrew, very happy to receive any “patches” for the author instructions. They are probably a little OO-centric, mainly because I wrote them. I have taken note of your wording and will discuss the details with the Ed Board. I like your term “high statement coverage” and definitely agree that explicitly talking about automated testing is a good thing. We’re still working on wording that really captures what we want: across the board, from validation of data to ease of checking that modifications don’t break the code, we want to encourage people to use the best tools for the job.
At the end of the day, as I say, these issues will be an editorial judgement, and we certainly accept that we have to start with what is possible and then aim to improve on it.
Not necessarily. The process you describe is a stereotype, but it’s not really a process that is followed in detail by many scientists in practice. It’s also specifically a model of scientific research, not of research in general. There are different modes of research, many of which rely on computation, that don’t necessarily follow a scientific approach. We don’t seek to exclude these from the journal as long as their code is up to our standards.
The point I was trying to make is that in almost all modern research there are computational systems that we rely on, depend on, and usually don’t think about very much. What we seek to do is ultimately to encourage people to build systems that can be understood in terms of what their limitations and capabilities are, rather than being simply black boxes that we take for granted.
James, I agree that journals, or rather the things that will hopefully replace them, need to be much more innovative. I’m involved in several projects looking at this and I’ve written a fair bit about it in the past. However, this is a separate problem to the one we’re trying to solve with ORC. This is much more a social issue – we need to convince people that high quality code is valuable – than a technical one. Sharing code is easy; we already have the tools to do this. What has been missing is a venue for talking about that code which puts a value on the quality of the code itself.
So I agree with you that there are other problems, but they’re not the problems we’re trying to solve in this particular case. We need to tackle each of these separately if we are to stand some chance of success. If we make change dependent on success in all of them we are almost certainly doomed to failure.
Glad to be of service. My concern is that someone with good code, O(20,000) lines, looks at the requirement for 100% test coverage, realizes that it’s currently 10%, and decides that it’s not worth the effort, when it’s really an editorial judgement and not a hard criterion.
“I spend a lot of my time arguing that many of the problems in the research community are caused by journals. We have too many, they are an ineffective means of communicating the important bits of research, and as a filter they are inefficient and misleading. Today I am very happy to be publicly launching the call for papers for a new journal. How do I reconcile these two statements?”
So, are you going to answer your own question? Or was it just a textual ornament?
“In the way it conducts peer review, manages papers, and makes them available Open Research Computation will be a very ordinary journal.”
Well, then it inevitably will be as f’ed up as any other journal. Congratulations on selling out.
If it’s not clear, I felt the answer to that question was in the last paragraph. I was convinced by a number of people that there was something we could usefully do within the framework of a traditional journal, and that the approach we were taking, while mechanically similar, was different enough in spirit to be worth the effort. The response we’ve got so far indicates that we’ve touched a nerve and that providing this venue can make a difference. Baby steps, yes, but steps at least. I’ll take them where I can get them.
I struggle a little to see how this is selling out, but I’m sorry you see it that way. The ed board are not doing this for money; BMC do not make a huge profit off their journals. I will receive an honorarium, but it is not enough to pay for the time I’m putting in, and it is my intention to use it to seed a fund to support APCs for people who can’t pay them.