Driving UK Research – Is copyright a help or a hindrance?
The following is my contribution to a collection prepared by the British Library and released today at the Wellcome Trust, called “Driving UK Research. Is copyright a help or a hindrance?” - Press Release – Document[pdf] – which is being released under a CC-BY-NC license. The British Library kindly allowed authors to retain copyright on their contributions so I am here releasing the text into the public domain via a CCZero waiver. I would also like to acknowledge the contribution of Chris Morrison in editing and improving the piece.
If I want to be confident that this text will be used to its full extent I am going to have to republish it separately to this collection. Not because the collection uses restrictive rights management or licences, it actually uses a relatively liberal copyright licence. No, the problem is copyright itself and the way it interacts with how we create knowledge in the 21st century.
Until recently we would use texts or data by reading, taking notes, making photocopies, and then writing down new insights. We would refer to the originals by citing them. A person making limited copies or taking notes (perhaps quoting the text) does not breach copyright because of the notion of “fair dealing”. Making copies of reasonable portions of a work is explicitly not a violation of copyright. If it were we wouldn’t be able to do any useful work at all.
Today, scholarship and research cannot effectively proceed via manual human processes. There is simply too much for us to handle. On the other hand we have excellent computer systems that can, to some extent at least, take these notes for us. Automated assistants that can read the text for us, that can do text mining, data aggregation and indexing allowing us to cope with the volume of information. As these tools improve we have an opportunity to radically increase the speed of the innovation cycle, using the human brain for what it is best at: insight and creative thinking; and using machines for what they are best at: indexing, checking, collecting.
The problem is that to do this those machines need to take a copy of the whole of the text and in doing so they trigger copyright. Even though the collection you are reading is released under a Creative Commons licence that allows non-commercial use, no-one can take a copy, find an interesting sentence, and then index it if they are going to make money. Google are not allowed to check what is here and index it for us.
Or perhaps they are. Perhaps this does come under “fair use” in the US. Or maybe it does, but not in the UK. What about Australia? Or Brazil? All with slightly different copyright law and a slightly different relationship between copyright and contract law. Even if current legal opinion says it is allowed a future court case could change that. The only way I can be sure that my text is available into the future is to give up the copyright altogether.
To build effectively on the scientific and cultural data being generated today we need computers. If a human were doing the job it would clearly be covered by fair dealing. What we need is a clear and explicit statement that machine based analysis for the purpose of indexing, mining, or collecting references is a fair dealing exception, even where a full copy is taken. There clearly need to be boundaries. The entire work should not be kept or distributed. As with existing fair dealing we could have guidelines on amounts kept or quoted: perhaps no more than 5% of a work. These could easily be developed and be compatible with existing fair dealing guidance.
We risk stifling the development of new tools, both commercial and academic, and new knowledge under the weight of a legal regime that was designed to cope with the printing press. At the same time a simple statement that this kind of analysis is fair dealing will provide certainty without damaging the interests of copyright holders or complicating copyright law. These new uses will ultimately bring more traffic, and perhaps more customers, to the primary documents. By taking the simple and easy step of making automated analysis an allowable fair dealing exception everyone wins.