Data Deluge: Uncertainty exists in the map, not in the territory.

Many years ago, when I was using Maximum Entropy (MaxEnt) data processing methods on a tricky problem in gas dispersion, I spent a lot of time trying to understand Bayesian inference. This was mainly for very practical reasons (I wanted to understand what the MaxEnt programmes were doing), but I also spent quite a bit of time reading the work of the American physicist Ed Jaynes (there is a short biography of him online), whom I have already referenced in an earlier blog post.[1]

Jaynes was preoccupied with what he called the "mind projection fallacy": assuming that the way we see the world reflects the way the world really is. He also describes another form of the fallacy: assuming that our own lack of knowledge about how things really are somehow means that they are indeterminate.

Jaynes illustrates this by discussing the "randomness" of a shaken die:

Shaking does not make the result “random,” because that term is basically meaningless as an attribute of the real world; it has no clear definition applicable in the real world. The belief that “randomness” is some kind of real property existing in Nature is a form of the Mind Projection Fallacy which says, in effect, “I don’t know the detailed causes—therefore—Nature does not know them.” What shaking accomplishes is very different. It does not affect Nature’s workings in any way; it only ensures that no human is able to exert any willful influence on the result. Therefore nobody can be charged with “fixing” the outcome.

This is a tricky concept, but I have recently come across an analogy that brings it to life. We all know that a map is only a representation of the territory it purports to describe. The map is not reality, and we rarely mistake the two: the map is not the territory. [2]

Furthermore, if there is any mismatch between the map and the territory, the uncertainty must exist in the map. Our man-made representation of the territory must, by definition, contain uncertainties and errors; it cannot be reality, because it is a piece of coloured paper (or a coloured computer screen). Likewise, if we change the map, for example by scribbling on it, those changes cannot in and of themselves change the territory. Changing our representations of the world (in other words, our beliefs) cannot change the real world. [3]

Which brings me to computer models used in science. Like a map, they are not reality. They are a mathematically tractable way of estimating what might happen when a particular phenomenon unfolds.

In effect, a computer-based scientific model is a complicated black box into which a number of input parameters are fed and from which a "result" is delivered. But these results must be treated with care, because they are quite unlike the results scientists obtain from experiments. In physical science we know that the laws of physics act the same always and everywhere, so when we ask nature a question in the form of an experiment, the answer tells us something about both the question we asked and the laws of nature. This is not true of a computer model: its answer tells us about the question we asked and about the assumptions built into the model, not directly about nature.

A computer-based model is created by scientists who have had to find a way of writing a program that runs in a reasonable length of time on an available computer. To do this they inevitably make a long series of assumptions and approximations that allow them to make progress. Sometimes these assumptions are good and other times they are bad.

But once a model has become a de facto industry standard, many of the original assumptions and caveats are forgotten, or deliberately ignored, and in the minds of the many scientists and engineers who use it daily the model begins to take the place of reality. In other words, it becomes an example of the mind projection fallacy: they begin to treat the map as if it were the territory.

Using such a model inevitably requires the practitioner to make a set of assumptions about the input parameters fed into the black box. Complex models may have dozens or even hundreds of input parameters. Someone has to choose their values, and those choices will be based in some cases on very firm physical insight and experimentally derived data, and in other cases on assumptions, approximations, estimates and guesses.

It is not morally reprehensible to make assumptions, estimates and guesses in science, but it is incumbent on scientists who do so to be absolutely transparent about what they have done, so that other scientists can challenge them. Best practice requires that the scientist using such a model explicitly explores how sensitive the output of the black box is to the choices made about the input parameters.
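As an illustration of what such an exploration might look like, here is a minimal one-at-a-time sensitivity analysis in Python. The model `toy_model`, its three inputs and their baseline values are hypothetical, chosen only for illustration; a real study would wrap the actual model code, and might well use more thorough global methods rather than moving one input at a time.

```python
import math

def toy_model(params):
    """Hypothetical stand-in for a black-box model: any function that maps
    a dict of input parameters to a single output number."""
    q, u, h = params["source_rate"], params["wind_speed"], params["height"]
    return q / u * math.exp(-h / 10.0)

def one_at_a_time_sensitivity(model, baseline, rel_step=0.1):
    """Move each input down and up by rel_step (a fraction of its baseline
    value), holding all other inputs fixed, and record the relative change
    in the model output for each move."""
    y0 = model(baseline)
    results = {}
    for name, value in baseline.items():
        changes = []
        for sign in (-1, +1):
            perturbed = dict(baseline)  # copy so the baseline is untouched
            perturbed[name] = value * (1 + sign * rel_step)
            changes.append((model(perturbed) - y0) / y0)
        results[name] = tuple(changes)  # (change for -step, change for +step)
    return y0, results

baseline = {"source_rate": 5.0, "wind_speed": 2.0, "height": 20.0}
y0, sens = one_at_a_time_sensitivity(toy_model, baseline)
for name, (down, up) in sens.items():
    print(f"{name:12s} -10%: {down:+.3f}   +10%: {up:+.3f}")
```

For each input, this prints the relative change in output when that input alone is moved 10% down and 10% up. In this toy case the output is most sensitive to `height`, which is exactly the kind of finding that ought to be reported alongside the result itself.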

The minimum an informed public could expect of scientists who run computer models is transparency. Instead of talking about these models as if they were reality, or as if they were based on fundamental physical laws, they should openly say:

These computer models are based on our informed assumptions, approximations and, in some cases, guesses. They will inevitably contain errors due to our programming methods. In addition, and in common with ordinary lab-based or field-based experiments, they are prone to uncertainty and error. We indicate this by always reporting the outputs of our black box model runs with error bars or confidence intervals. Over the past 400 years these have been the accepted way of telling readers how much confidence they can have in the results, and we use the same conventions.
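One common way to attach error bars to a model output is Monte Carlo propagation: draw the uncertain inputs from distributions that express what we actually know about them, run the model many times, and report a percentile interval of the outputs. The sketch below assumes a hypothetical `toy_model` and illustrative input distributions; in practice those distributions are themselves assumptions that should be stated openly, and the resulting interval is only as good as they are (an uncertainty in the map, in the language of this post).

```python
import math
import random
import statistics

def toy_model(source_rate, wind_speed, height):
    # Hypothetical stand-in for a black-box model; not any real code.
    return source_rate / wind_speed * math.exp(-height / 10.0)

def monte_carlo_interval(n_runs=10_000, seed=1):
    """Propagate assumed input uncertainties through toy_model and return
    the median output with a central 95% percentile interval."""
    rng = random.Random(seed)  # fixed seed: the "random" draws are fully reproducible
    outputs = []
    for _ in range(n_runs):
        # Assumed input distributions, chosen purely for illustration.
        q = rng.gauss(5.0, 0.5)
        u = rng.uniform(1.5, 2.5)
        h = rng.gauss(20.0, 2.0)
        outputs.append(toy_model(q, u, h))
    # n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(outputs, n=40)
    return statistics.median(outputs), cuts[0], cuts[-1]

median, lower, upper = monte_carlo_interval()
print(f"output = {median:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

Fixing the seed makes the run exactly repeatable, which is itself part of being transparent about how a reported result was produced. It also echoes Jaynes's point: the "random" draws are completely determined once the seed is known.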

Long may maps (and computer models) exist: they are a brilliant example of how humans can create new and useful ways to represent reality. But let us not mistake the map for the territory, lest we get lost.

References

[1] Ed Jaynes's major book, Probability Theory: The Logic of Science, was published posthumously by Cambridge University Press (2003). A PDF copy of its first three chapters is available online.

[2] Korzybski, A. (1931). A non-Aristotelian system and its necessity for rigour in mathematics and physics. Read at the American Association for the Advancement of Science, December 28, 1931.

[3] LessWrong Wiki, "The map is not the territory": http://wiki.lesswrong.com/wiki/The_map_is_not_the_territory

Friday, 15 June 2012
