Saturday 30 June 2012

Antediluvian Quantification

It is hard to imagine today how profoundly scarce high-quality scientific data was a hundred years ago. For example, in 1910 the American physicist Robert Millikan made a first report of a series of experimental measurements that he had made with his graduate student Harvey Fletcher, using equipment of their own design. These measurements involved timing how long it took small drops of watch oil to move up and down in an electric field. The timings were then used to estimate the tiny electrical charge of an individual electron. Millikan’s famous series of ‘oil drop’ experiments allowed him to make an estimate of the electron charge that is within about 0.5% of the currently accepted value. The experiments and the estimate they produced are celebrated to this day: when Millikan’s value for the electronic charge was inserted into Bohr's formula for the hydrogen spectrum, it accurately gave the Rydberg constant. This confluence of an experimentally derived estimate and a new theory was impressive, and the experiment is still seen by many scientists as one of the first and most convincing proofs of the quantum theory of the atom proposed by Bohr.

Millikan’s oil drop data were first reported in 1910, which prompted some controversy, particularly with the physicist Felix Ehrenhaft. After improving his experimental setup, Millikan published a full report of his work in 1913 in the Physical Review (Millikan 1913). In this paper Millikan reports the culmination of four years of hard experimental effort: designing and mastering a new technique and assessing the sources and magnitudes of the errors in his approach. The conclusions of the paper are based on the detailed analysis of 58 individual oil droplets, with measurements on these droplets made over a period of 60 consecutive days. About 40 individual timing measurements were made on each drop. The data set used for Millikan’s analysis is presented as Table XX in his 1913 paper. Entered into a modern spreadsheet file, these numbers make a dataset of about 28 kilobytes.

Notwithstanding the small volume of data reported in Millikan's paper, this was a fantastic piece of science. The oil-drop experiment showed that an elegant experimental method could not only provide an accurate determination of the fundamental unit of charge but also provide evidence that charge is quantized. In 1923 Robert Millikan was awarded the Nobel Prize in Physics “for his work on the elementary charge of electricity and on the photoelectric effect”. In his Nobel Prize acceptance speech he first celebrated how science at that time was a close partnership of theory and experiment:
The fact that Science walks forward on two feet, namely theory and experiment, is nowhere better illustrated than in the two fields for slight contributions to which you have done me the great honour of awarding me the Nobel Prize in Physics for the year 1923.
Sometimes it is one foot which is put forward first, sometimes the other, but continuous progress is only made by the use of both - by theorizing and then testing, or by finding new relations in the process of experimenting and then bringing the theoretical foot up and pushing it on beyond, and so on in unending alternations.

He then described the importance of the oil drop experiments he had completed a decade earlier: “...the electron itself, which man has measured ... is neither an uncertainty or an hypothesis. It is a new experimental fact...” (Millikan 1924).

The sheer scarcity of high-quality data was a major barrier to scientific progress, and many of the leading research scientists of the day spent an enormous amount of effort designing and constructing physical instruments capable of providing high-quality, reproducible data. These scientists spent so much effort on creating devices to generate data for their studies because such devices simply did not exist. Compared with today, that era of science was antediluvian; literally "before the deluge". Each and every data point had a real value to a practising scientist because they knew exactly how much effort had been expended in obtaining it.

Those days are long gone. We have never had so much “data”, or so much capacity to store this data, manipulate it and analyse it both mathematically and visually. Yet using data in science has never been more difficult. The problems are not set by the technical limitations of instruments, computers, memory or even mathematical and statistical techniques – though all of these will continue to develop and these developments can help scientists.

The real challenge is how best to use, in our era of data deluge, the tried and trusted intellectual frameworks that have been the basis of scientific research over the past 400 years. Many of our current archetypes of science and our scientific heroes are products of the data-scarce era of science: Galileo, Newton, Kelvin, Einstein, Curie, Feynman, Watson & Crick. These are all antediluvian heroes; they lived, worked and excelled prior to the data deluge.

How then can there be a problem, if the core of science is measurement and quantification and our capacity for both is increasing at such a great rate?

The answer is that science is not just the accretion of raw data or even analysed data. It is a fundamentally creative act. It requires a human intelligence to combine what was already known (or thought to be known) with the new data from experiment and observation, into a new level of knowledge. Paradoxically, this process has historically been aided by the fact that it always took time to collect data. A scientist may well have had to design and build the apparatus before he or she could even begin to make the measurements of interest. Scientists had to first conquer the experiment before really understanding the object of attention. This is classical science. And it is not so long ago. One of the key weaknesses in today's science is that modern scientists have become personally disconnected from the measurement process, in the sense that Millikan understood it: they have never designed or built a piece of apparatus. This has weakened their connection with their data and reduced the amount of downtime in their research.

I remember having to construct an apparatus to image and measure droplets for my own BSc Chemistry project. Gaining a deep understanding of the shortcomings of an experiment by designing and building the kit has aided the development of science over the past three centuries. Today much of this forced downtime has been driven out of modern research labs. A high-quality piece of equipment can be purchased and deployed in a matter of weeks. And because modern analytical instruments have high reliability and high data storage capacity, they can be set up to run virtually unattended all day, every day.

In previous generations one of the key bottlenecks was the need for experimental quantitation. This in turn required scientists to be able to develop and deploy scientific instrumentation and to understand what each measured data point meant. Older science teachers hark back to this when they still stress the need for “good lab practice”: using paper notebooks, keeping observational records of what is going on, and so on. Now most scientists do not have a clue about what is going on in their instruments. They have no idea how much data processing takes place before they get hold of the “raw” data.


Figure 1 from Millikan's 1913 paper.


Millikan, R.A. (1913). On the elementary electrical charge and the Avogadro constant. The Physical Review, Series II, Vol. II.