Limits to openness – where is the boundary?
I’ve been fiddling with this post for a while and I’m not sure where it’s going, but I think other people’s views might make the whole thing clearer. This is, after all, why we believe in being open. So here it is in its unfinished and certainly unclarified form. All comments gratefully received.
One issue that got a lot of people talking at the Scifoo lives on session on Monday (transcript here) was the question of where the boundaries between what should and should not be open lie. At one level it seems obvious: the structure of a molecule can’t really have privacy issues whereas it is clear that a patient’s medical data should remain private. The issue came up a lot at the recent All Hands UK E-science meeting where the issues were often about census data or geographical data that could pinpoint specific people. It seems obvious that people’s personal data should be private but where do we draw the line? I am uncomfortable with a position where it is ‘obvious’ that my data should be open but ‘obvious’ that personal medical or geographical data should not be. Ideally I would like to find a clear logical distinction.
Jean-Claude Bradley made the comment in the session that molecules don’t have privacy issues and I made a flippant comment about patented genes. Troy McLuhan made the valid point that IP and privacy are different issues but I want to use this example to try and pick the problem apart. I will use a series of escalations and the question is, where should I stop making the data, in this case the structure of the molecule and associated information, open?
- I have synthesised an organic molecule.
- The molecule is a piece of DNA 20 bases long.
- The sequence of bases corresponds to a segment of a human genome.
- It corresponds to a sequence obtained from a specific person.
- The sequence is part of a gene where mutation is associated with an increased rate of heart disease.
- The sequence contains a mutation known to be associated with an increased rate of heart disease.
- The patient has no history of heart disease.
- The patient has a very specific clinical history that in association with the DNA sequence provides a deep insight into the development of heart disease.
- The specifics of the clinical history are enough to precisely identify this person.
- They are currently employed in a position where a risk of heart disease would preclude them from further employment.
In nine steps we have gone from an innocuous molecule to a serious ethical dilemma. Incidentally, I have used a DNA sequence because I think it makes the example clearer, but the argument is not limited to DNA structures. A similar example could be created using, say, a lipid-derived hormone (that would probably be a more interesting synthetic target).
So where do we draw the line? In practice medical scientists would publish all the way through to point #8. Anonymised clinical examples are common in the medical literature. I do not know what the ethical view would be on point #9 but I suspect the view might vary depending on jurisdiction and the local ethics committee, not to mention the attitude of the patient in question. So one point of view is that if you would publish the data, then you should make it open.
If we return to our data and lab books, we don’t argue that there are no risks associated with being open but that the benefits outweigh the risks. It is possible that we may be scooped, that someone may take our safety data, misuse it and attempt to sue us, or that someone may have a strong enough objection to what we are doing to attack us or our lab. It is more likely that we may embarrass ourselves by making a silly mistake or by publicly heading down a blind alley. However, the belief is that by putting the data and procedures out there we enable both ourselves and our community to do better. Making our data available lets other people do more and better science and lets us make more effective use of what we have done.
However, this applies to our patient as well. By making their clinical history available they are potentially benefiting a large number of other patients. This ‘deep insight’ may save both their own life and the lives of others. The arguments for open notebooks and open medical histories run along similar lines. The potential benefits if the full medical history of the western world could be harnessed are huge. This is a massive store of data on drug interactions, effectiveness, the effect of lifestyle, location, environment… In a sense it is a massive clinical trial, and we insist that clinical trial data should be published, so why not this data?
So the radical answer is that everything should be open. This would have benefits, unquestionably, but it is also clearly totally unacceptable. Doing so would be to actively cause harm to specific people whose privacy had been removed. So here is the first point of distinction: benefits versus risks, with the usual philosophical issues of balancing the good of ‘the community’ against the good of the individual, which is far too big an issue to go into here.
Another argument is that of who is taking the risk. If I make data open then the risk I am taking is largely for me (and my group). There is a potential risk of exposing my institution to liability or ridicule but hopefully this is a small risk that is not significantly increased beyond that already created by publication. In our example however it is the patient who is taking the risk, not the person collecting the data and making it open. The patient could choose to take that risk if they were convinced by the benefits, although I imagine again the local ethics committee would take a view on this. A similar argument can cover collaborations – I don’t have the right to take risks, however small, that could harm a collaborator unless they agree the benefits are worth the risks to them.
There are however risks associated with not making the data open. Peter Murray-Rust has argued that open data can save the world. Inaction poses its own risks. Balancing the risk of inaction against the risks of action raises again the balance of the many with the individual. So some sort of pragmatism is required.