Thursday, 22 August 2024

Quantum Mechanics and Multiple Measurements

I'm going to talk about quantum physics again. As readers know I studied a little physics at university level in 2005 and, over the last little while, I have been trying to fill in some gaps in my knowledge about quantum mechanics by watching sometimes slightly misleading educational videos on Youtube, reading Wikipedia, and referring to the Stanford Encyclopaedia of Philosophy entry on Heisenberg's Uncertainty Principle. The Encyclopaedia entry is excellent but I suspect that the Wikipedia entry I looked at, concerning the Particle in a Box thought experiment, is at best obfuscatory and might even be just plain wrong. In talking about quantum physics I am going to attempt to describe it in a way that is reasonably clear to people who only have a high school understanding of physics but I also have a reasonably innovative idea, the seed of which occurred to me a long time ago and which I have been developing in this blog for some time, that may even interest people who have done proper degrees in physics. In this respect this essay will resemble an essay I wrote last year, "Quantum Physics for Dummies and a New Idea". It may be helpful to read this earlier essay first. What I intend to do in my main argument is to show that there is an apparent contradiction in the laws of physics which forces us to choose between the principle that indeterminacy is a necessary feature of quantum mechanics and the Law of Conservation of Momentum. I do not pretend to be an expert on physics and there is much I don't know about the mathematics and proposed interpretations of quantum mechanics but, as I said in the essay "Evolution, Ideas, and Hiveminds", what I can do is take what I know and think through it with some semblance of rationality. In the first part of the essay I will describe what I know about the Particle in the Box experiment and in the second part, the most important part, I will present my main argument.

Let's start with the core idea. A fundamental principle in quantum mechanics is that sometimes it is better to describe things like electrons and photons as particles and sometimes it is better to describe them in terms of waves. The simplest waves are sine waves and cosine waves but a great many phenomena can be described as wave-like so long as they are periodic. We can get these other types of waves by adding together a lot of sine waves with different wavelengths, frequencies, and phase shifts. It is exactly the same as when we talk about musical notes not only having 'pitch' but also 'timbre'. If we add together an infinite number of waves that all have crests at or near a single point we can get something very localised in space – we call this a 'wave packet'. A wave packet is not periodic. Wave packets are very close to being particles, although, importantly, the waves that together contribute to the wave packet exist throughout all space and time and the speeds of these individual waves do not have to be the same as the speed of the wave packet. A great deal of quantum physics follows from this wave-particle duality.
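
For readers who like to experiment, here is a minimal Python sketch of this idea. The Gaussian spread of wavenumbers and all the specific numbers in it are my own illustrative assumptions, not anything taken from a textbook; the point is simply that adding together many waves that agree in phase near one point produces something localised.

```python
import numpy as np

# Superpose many cosine waves whose wavenumbers cluster around k0.
# The Gaussian weighting of the wavenumbers is an assumption made purely
# for illustration; any collection of waves that are in phase near x = 0
# would produce a similarly localised packet.
x = np.linspace(-50, 50, 2001)
k0, sigma_k = 2.0, 0.2                      # central wavenumber and spread
ks = np.linspace(k0 - 4*sigma_k, k0 + 4*sigma_k, 400)
weights = np.exp(-(ks - k0)**2 / (2*sigma_k**2))

# Each component wave exists everywhere; only their sum is localised.
packet = sum(w * np.cos(k*x) for w, k in zip(weights, ks))
packet /= np.abs(packet).max()              # normalise for comparison

# The envelope width is roughly 1/sigma_k: a narrow spread of wavenumbers
# gives a wide packet, and vice versa -- the seed of the uncertainty relation.
print("approximate packet width:", 1.0 / sigma_k)
print("packet amplitude at x=0 :", packet[np.argmin(np.abs(x))])
print("packet amplitude at x=30:", packet[np.argmin(np.abs(x - 30))])
```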

What we want to do now is to imagine situations in which an electron or photon is subject to boundary conditions. If these boundaries do not change with respect to time, we can use an equation known as the time-independent Schrodinger equation to find the wave function – I will describe what I mean by the wave function as we go along. The simplest such case is the Particle in the Box thought experiment, a thought experiment often pulled out in introductory courses in quantum mechanics. Imagine an electron in a one-dimensional box with impenetrable walls on either side. We make the walls impenetrable by stipulating that the electron could only escape the box if it had infinite energy, something that is impossible; we also assume that there are no varying electromagnetic fields inside the box, that the potential energy associated with the electron is always constant inside the box. To find the wave function, we try to find solutions to the Schrodinger equation that are consistent with these boundary conditions. Because the electron cannot be found outside the box, the wave function must be zero at both walls; this condition and the fact that the potential energy inside the box is constant lead us to conclude that the wave function must be a simple sinusoidal wave. (I won't go into the mathematical proof here but readers can find proofs on the Internet.) I can't draw pictures in this blog but visualise the first half of a sine wave going up from zero at the left hand wall, peaking in the middle of the box, and then descending back to zero at the right hand wall. In this solution the wavelength is exactly 2L, twice the width of the box. However this is not the only solution. We could imagine a sine wave that has exactly half the wavelength of the first we considered or a third or a quarter and so on; these solutions will all also work. There is an infinite family of solutions but all these solutions are discrete in that their wavenumbers are all n times the wavenumber of the simplest solution where n is any natural number. (Wavenumber is simply 2𝝿 divided by the wavelength and the term we use for the solutions is eigenstates.) Mathematically, when the width of the box is L, the equation for the wave function associated with a given eigenstate is √(2/L) multiplied by sin(n𝝿x/L) where x is the coordinate on the x axis and has the value 0 at the left hand wall.
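
Here is a small Python check of these eigenstates, assuming the formula √(2/L) sin(n𝝿x/L) and an arbitrary box width of L = 1. It simply confirms that each eigenstate vanishes at the walls, that its square integrates to one, and that its wavelength is 2L/n.

```python
import numpy as np

# Particle-in-a-box eigenstates: psi_n(x) = sqrt(2/L) * sin(n*pi*x/L).
# L = 1.0 is an arbitrary illustrative box width.
L = 1.0
x = np.linspace(0.0, L, 10001)

def psi(n, x, L=L):
    return np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)

for n in (1, 2, 3):
    wave = psi(n, x)
    norm = np.trapz(wave**2, x)          # should be 1 for every eigenstate
    print(f"n={n}: psi(0)={wave[0]:.1e}, psi(L)={wave[-1]:.1e}, "
          f"integral of psi^2 = {norm:.4f}, wavelength = {2*L/n}")
```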

What is the physical significance of the wave function? The wave function by itself does not represent anything measurable but if we apply the right operations to it we can find all of the measurable properties of the associated particle such as its average expected position, momentum, and energy. Perhaps the most helpful way to make sense of what the wave function does is to talk about probabilities. If we want to find the probability that the electron is between two points in the box, a and b, we first take the absolute square of the wave function (for these real-valued eigenstates this is just the square) and then find the area under this new curve between the two points a and b. This area equals the probability of finding it there. The total area under the whole curve must equal 1 because we are assuming that exactly one electron exists in the box. What this means is that for the n = 1 case, the probability of finding the electron near the middle of the box is much greater than the probability of finding it near the walls. However there is a surprising twist. If n = 2, the wave function is zero in the middle of the box: this point is called a node. In general, for any eigenstate associated with the Particle in the Box thought experiment, there are always n – 1 nodes. The fact that the wave function can sometimes be zero means that if we choose a region arbitrarily small around a node, the probability of finding the electron there gets arbitrarily close to zero.
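
To make the probability talk concrete, here is a small sketch, again with an arbitrary box width of L = 1, that integrates the square of the wave function between two points. The particular intervals I have chosen are just illustrations.

```python
import numpy as np

L = 1.0

def psi(n, x, L=L):
    return np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)

def prob_between(n, a, b, num=10001):
    """Probability of finding the electron between a and b in eigenstate n:
    the area under psi_n squared over [a, b]."""
    xs = np.linspace(a, b, num)
    return np.trapz(psi(n, xs)**2, xs)

# n = 1: the electron is most likely to be found near the middle of the box.
print("n=1, middle fifth of the box:", prob_between(1, 0.4*L, 0.6*L))
print("n=1, left fifth of the box  :", prob_between(1, 0.0, 0.2*L))

# n = 2: a node sits at L/2, so a narrow band around it has almost zero probability.
print("n=2, narrow band around node:", prob_between(2, 0.49*L, 0.51*L))
```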

This raises a question that has often occupied me in the past. If the probability of finding an electron in the middle of the box is zero, how does the electron move from one side of the box to another? This puzzle vexed me because it is tempting to think that behind the mathematics there is a real world in which the electron is flying from one side of the box to the other and back again, bouncing off the walls. This puzzle appears if we assume that the electron is only in one eigenstate. However what I didn't fully understand in the past but have a better understanding of now is the notion of superposition. By far the most helpful Youtube video I watched is "Superposition for the particle in a box – David Miller", a video which has criminally few views but which I recommend to readers particularly because of the visual animations he includes in the second part of the video. The idea here is that the total wave function can be a sum of different eigenstates properly 'normalised' so that the area under the absolute square of the total wave function still equals one. We might suppose that the wave function is composed of the n=1 eigenstate and the n=2 eigenstate and then assign coefficients to each eigenstate, c1 and c2, such that c1 squared plus c2 squared equals one. If we suppose these two coefficients are equal, this is like saying that each of the two eigenstates is equally probable. These eigenstates need time components because this superposition changes with time and so we have to use the time-dependent Schrodinger equation. What we find now is that if we take the absolute square of the wave function to find the probabilities, the resulting probability density no longer has a node and alternates between swelling in the left side of the box and the right side of the box as time progresses. Although Miller does not say this, this leads me to make the following conjecture which I hope experts might agree with. Let us suppose that n is very very large as we might expect with a macroscopic system and also that there is uncertainty about the value of n, that there are very very many other possible values of n. We might suppose that all the possible values of n lie on a bell curve with our preferred value at the crest and choose our coefficients accordingly. Our wave function is then the superposition of all of these very many eigenstates. We might then find that the wave function is now characterised by a very sharp spike or pulse, a wave packet, travelling from left to right and back again with constant velocity given by p/m as would be the case in Newtonian mechanics. If this conjecture is correct, it would suggest that it is waves that are really fundamental and that particles increasingly emerge from these waves as we increase the size of the system, increase the quantum numbers of the eigenstates and the number of eigenstates we are superposing.
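
Readers with a little Python can check the sloshing behaviour for themselves. This sketch uses so-called natural units (ℏ = m = 1, L = 1), which is purely a convenience of mine, and an equal superposition of the n = 1 and n = 2 eigenstates; it tracks the probability of finding the electron in the left half of the box as time passes.

```python
import numpy as np

# Equal superposition of the n=1 and n=2 box eigenstates, evolved in time.
# Natural units (hbar = m = 1, L = 1) are assumed purely for illustration.
hbar, m, L = 1.0, 1.0, 1.0
x = np.linspace(0.0, L, 2001)

def psi(n):
    return np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)

def energy(n):
    return (n * np.pi * hbar)**2 / (2.0 * m * L**2)

def prob_left_half(t):
    """Probability of finding the electron in the left half of the box at time t."""
    Psi = (psi(1) * np.exp(-1j * energy(1) * t / hbar)
           + psi(2) * np.exp(-1j * energy(2) * t / hbar)) / np.sqrt(2.0)
    density = np.abs(Psi)**2
    half = x <= L / 2
    return np.trapz(density[half], x[half])

# The probability sloshes back and forth between the two halves of the box.
period = 2 * np.pi * hbar / (energy(2) - energy(1))
for frac in (0.0, 0.25, 0.5, 0.75):
    print(f"t = {frac:.2f} * period: P(left half) = {prob_left_half(frac*period):.3f}")
```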

I want now to talk about the momentum of the electron in the box. At high school we learn that momentum is mass times velocity, mv. This enables us to define kinetic energy as p²/2m where p is momentum. Then in the nineteenth century physicists discovered that light, which has no mass, can transport momentum through space as well, and with Einstein's special theory of relativity in 1905, we arrived at a new definition relating momentum to energy, E² = p²c² + (mc²)². Photons, the elementary particles Einstein postulated, are massless and so simply have a momentum p equal to E/c. In 1923, Louis de Broglie proposed yet another definition for momentum by arguing that particles have characteristic wavelengths and that the momentum of a particle is related to its wavelength by the simple equation p = h/λ, where p is the momentum, h is a very important constant called Planck's constant, and λ is its wavelength. Note that even though momentum is a vector, wavelength is a scalar. If we apply the De Broglie equation to the Particle in the Box thought experiment, we find that the magnitude of the momentum inside the box must be nh/2L because for each eigenstate there is a single sine wave of wavelength 2L/n (outside the box the wave function is simply zero). It is reasonable to suppose that the momentum of the particle, insofar as we can speak of a particle at all, if in a single eigenstate, is either nh/2L going left or nh/2L going right with equal probability and so the average expected momentum must be zero. In fact this is fairly easy to prove mathematically. We can then use the Newtonian definition relating energy to momentum to find the allowed values for the energy of the electron. (Although the Einsteinian definition is superior, it is the Newtonian definition that is typically used.) If we have a superposition, time becomes involved again and so it is more complicated to work out the momentum.
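
As a concrete illustration, here are the allowed momenta and energies for an electron in a box one nanometre wide, a width I have chosen arbitrarily, computed from p = nh/2L and the Newtonian E = p²/2m.

```python
# Allowed energies for an electron in a 1 nm box, assuming the de Broglie
# relation p = h/lambda with lambda = 2L/n and the Newtonian E = p^2 / (2m).
# The 1 nm box width is an arbitrary illustrative choice.
h = 6.626e-34           # Planck's constant, J*s
m_e = 9.109e-31         # electron mass, kg
eV = 1.602e-19          # joules per electronvolt
L = 1.0e-9              # box width, m

for n in (1, 2, 3):
    p = n * h / (2 * L)             # magnitude of momentum, kg*m/s
    E = p**2 / (2 * m_e)            # kinetic energy, J
    print(f"n={n}: |p| = {p:.2e} kg m/s, E = {E/eV:.3f} eV")
```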

However we seem to have a problem with this simple formula. The problem relates to another key idea in quantum physics: the Heisenberg Uncertainty Principle, a Principle that is very important to my main argument and which I will come back to later. This principle says that the uncertainty in the momentum of a particle when measured times the uncertainty in the position of a particle when measured can never be less than ℏ/2, where ℏ is the reduced Planck constant or h/2𝝿. Schematically, ΔpΔx ≥ ℏ/2. Uncertainty here can be defined to be the standard deviation in momentum or position if we could somehow carry out the same measurement on the same particle prepared in identical situations many times; this is not the only way to define uncertainty (it works best with bell curves) but it is the one physicists usually accept. The problem is that if the momentum is known precisely then the uncertainty in position must be infinite but the uncertainty in position cannot be infinite because the particle must be somewhere inside the box. If the uncertainty in position is not infinite, then there must be uncertainty related to the momentum and so this seems to suggest that momentum cannot be the simple expression we arrived at above, the one given by the De Broglie equation.
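
Here is a small numerical check, in illustrative units with ℏ = 1 and L = 1, that the box eigenstates respect this inequality. The momentum spread is taken to be nh/2L (equivalently n𝝿ℏ/L), the value discussed later in this essay, while the position spread is computed directly from the probability density.

```python
import numpy as np

# Uncertainty product for particle-in-a-box eigenstates, with hbar = 1 and
# L = 1 assumed purely for illustration. Delta_p is taken from <p^2> = (n*pi*hbar/L)^2
# with <p> = 0, which is the nh/2L value discussed elsewhere in this essay.
hbar, L = 1.0, 1.0
x = np.linspace(0.0, L, 20001)

for n in (1, 2, 5):
    psi = np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)
    density = psi**2
    mean_x = np.trapz(x * density, x)
    mean_x2 = np.trapz(x**2 * density, x)
    dx = np.sqrt(mean_x2 - mean_x**2)
    dp = n * np.pi * hbar / L
    print(f"n={n}: dx*dp = {dx*dp:.3f} * hbar  (the bound is 0.5 * hbar)")
```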

Faced with problems such as this, quantum physicists have come up with yet another way of working out momentum. It involves something known as the momentum operator, which is −iℏ times the partial derivative with respect to space. Alongside the ordinary wave function there is a second function, a kind of momentum wave function, obtained from the ordinary one by what is called a Fourier transform. This second function does not give an exact value for the momentum but, similar to the operation we can carry out on the ordinary wave function, it seems that we should be able to find the probability that the momentum lies between two fixed values, p1 and p2, by taking the absolute square of this second wave function and finding the area under the curve between these values. A difficulty I have faced when thinking about the momentum wave function is that the variable in the first wave function is x (we are assuming time invariance) whereas the variable in the second function is p and I am unsure exactly how we exchange variables or how we change the bounds of integration. I have gone through innumerable videos on Youtube and read a few conversations on the Physics Stack Exchange website and have yet to find a satisfactory answer to this question. It seems many people are as confused about this as I am.
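
For what it is worth, here is a numerical sketch of that procedure, assuming (as the standard treatments do) that the momentum-space wave function is the Fourier transform of the ordinary one; the units (ℏ = 1, L = 1) and the choice of n = 3 are arbitrary. The resulting probability density is spread over a range of momenta but peaks near ±nh/2L, the de Broglie value.

```python
import numpy as np

# A numerical sketch of the momentum-space wave function for a box eigenstate,
# assuming (as in standard treatments) that phi(p) is the Fourier transform of
# psi(x): phi(p) = (1/sqrt(2*pi*hbar)) * integral of psi(x) * exp(-i*p*x/hbar) dx.
# hbar = 1 and L = 1 are illustrative choices, as is n = 3.
hbar, L, n = 1.0, 1.0, 3
x = np.linspace(0.0, L, 4001)
psi = np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)   # zero outside [0, L]

p_grid = np.linspace(-40, 40, 801)
phi = np.array([np.trapz(psi * np.exp(-1j * p * x / hbar), x)
                for p in p_grid]) / np.sqrt(2 * np.pi * hbar)
density = np.abs(phi)**2

# The density is spread out but peaks near p = +/- n*pi*hbar/L (i.e. +/- nh/2L),
# the de Broglie value for this eigenstate.
print("expected peaks near +/-", n * np.pi * hbar / L)
print("location of the largest peak found numerically:", p_grid[np.argmax(density)])
print("total probability (should be close to 1):", np.trapz(density, p_grid))
```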

It may be useful to consider the Wikipedia entry on the Particle in a Box. The equation for the momentum function given in it, presumably determined using the momentum operator, is complex, perhaps unnecessarily complex. I'll make some important points about it. Everyone agrees that for any eigenstate there is a fixed energy associated with the particle which, if we define it as p²/2m, would lead us to suppose that the momentum is also fixed. But the Wikipedia entry quite clearly says the momentum can take any value at all before measurement. The writers argue that the Newtonian relation does not hold with respect to the Particle in a Box, even though it is supposed to hold in many other quantum experiments. It is important to also note that the Einsteinian definition of momentum is not used either – physicists often analyse quantum situations from a kind of Newtonian perspective even though Dirac uncovered a better version of the Schrodinger equation that is relativistic, perhaps because using the Dirac equation is just too unwieldy in most situations and so it is easier to use the older Newtonian way of defining momentum. Finally it is also worth noting that the Wikipedia article does not use the De Broglie relation either – although momentum is still defined in terms of a wavelength, this wavelength is not the same as the one we worked out when we applied the Schrodinger equation to the particle in the box originally. Our original treatment suggested that the wavelength of a particle in an n eigenstate is 2L/n but this is not the wavelength used in the article's momentum calculation. It is not clear to me how the physicists who contributed to this entry arrived at another value for the wavelength of the electron, a wavelength that does not seem to clearly follow from an analysis of the original situation, although, if the treatment is correct, it presumably follows from a Fourier transform of the original wave function. I admit I currently don't fully understand Fourier transforms.

There is one final point worth making about the Wikipedia entry. There is a discussion of the Uncertainty Principle and what I want to focus on is that the uncertainty in momentum, once we take the square root of the variance and swap the reduced Planck constant for the normal Planck constant, is nh/2L. If this is the uncertainty, we could suppose from it that the momentum, although zero on average, is indeed either nh/2L to the left or nh/2L to the right as we seemed to conclude based on our original calculations, calculations we made assuming the De Broglie relation did indeed hold, and that the Newtonian definition E = p²/2m also did indeed hold. Although the Wikipedia writers claim that the momentum is continuous for any eigenstate, it seems worth noting that the uncertainty calculated is consistent with this other treatment of the thought experiment, the treatment I originally learned way back in 2005.

So I admit I am somewhat confused about quantum physics, even in the simple case of a Particle in a Box. However the main argument I wish to present does not rely on a sophisticated understanding of quantum physics. Rather it follows from some simple physical premises that physicists usually accept.

My main argument concerns measurements, the Uncertainty Principle, and Conservation of Momentum. Suppose we carry out the particle in a box experiment in the real world. The box we are considering is three-dimensional rather than one-dimensional and it may not be reasonable to suppose that the potential energy is infinite outside the box. Nevertheless the properties of the system should still be calculable in principle. What we do first is perform measurements on the box that will provide us with information about the electron. That is, we measure the width, height, and length of the box and, based on empirical data, we work out the potential energies both inside and outside the box. We also know that there is exactly one electron in the box and we know, of course, the mass of an electron. Although we do not know the eigenstate of the electron, or if the wave function is a superposition of multiple eigenstates, I claim that by applying the appropriate equations to the results of these measurements, we can set limits on the potential wave functions the electron must have and on its properties: its possible positions, momenta, and energies. Although the measurements of the box's dimensions occur at a particular moment, we assume that these dimensions are completely invariant with respect to time: this means that because the uncertainty in time is infinite, we can know with absolute precision the possible energy levels associated with the electron. (This is because the Uncertainty Principle applies to time and energy in the same way that it applies to spatial position and momentum.) Then, at some particular later time, t1, we perform another measurement: we irradiate the box with photons and from this determine where the electron is at that moment. Then a little later, at t2, we perform yet another measurement: we again irradiate the box and work out very precisely where the electron is.

What I am claiming is that with every measurement we must update our model of the wave function inside the box. The wave function must change with every additional measurement. Whereas the probabilities associated with position, momentum, and energy were originally based on our measurements of the box, probabilities given by a wave function somewhat comparable to the wave function of the Particle in a One-Dimensional Box, when we perform the measurement at t1, we must suppose now that the wave function has a very sharp spike around the point where we find the electron at this time and is almost zero everywhere else. This is because the probability of finding it very near this point must be almost a certainty. If there is great certainty about the particle's position, there must be great uncertainty in its momentum. As a result of this measurement the wave function has changed, our model has changed. This has consequences for our predictions not only concerning where the particle will be in the future but also where it was in the past. Because this measurement occurs at a particular time, we can no longer employ the time-independent Schrodinger equation; whatever equation we use to determine the particle's future and past positions must involve time as well as space. Then when we perform the next measurement at t2 we must change the model once again.

This proposal, a proposal I have hinted at in earlier essays, is that each new measurement changes the wave function. It raises some curly questions. It involves a view of wave functions, and probability distributions more generally, as being incompletely described models that can be improved upon by more measurements, more evidence; this might seem to suggest that if we performed enough measurements on a system, if we could gather all of the information about a given system or situation, all of our predictions concerning it would be certain. This seems to rub up against the idea that quantum physics is fundamentally probabilistic, indeterminate. To fully work out this proposal you would need a different theory of probability than the one often used by physicists. This is something I have thought about for a while; in the previous essay when I talked about Bayes' Theorem I should have been clearer about my own view of probability but I have still not worked out my own theory of it sufficiently well to clearly articulate here. The most important issue raised by this imagined experiment that I want to talk about in this essay however concerns Conservation of Momentum. Let us assume that we know that the particle's momentum has some certain range of values before the first irradiation; it seems that as a result of the measurement its momentum 'randomly' changes. There are two possible explanations. Either momentum is not absolutely conserved or the photons with which we have irradiated the electron have imparted momentum to it. Furthermore, by finding out very precisely where the electron is we have supposedly lost information about its momentum, but, supposing we found the electron to be at point A at t1 and at point B at t2, it seems that, if momentum is conserved, its average velocity between these two measurements must be exactly (B – A)/(t2 – t1) and can thus be calculated precisely. Either momentum can randomly change between measurements, violating Conservation of Momentum, or, although the Heisenberg Uncertainty Principle might indeed apply to particular measurements, we can in a sense violate this principle, hack it, by taking into account multiple measurements.
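
To give a sense of the numbers involved, here is a rough sketch of the retrodiction argument. The position precision and the time between measurements are arbitrary figures I have made up for illustration; the comparison is between the momentum spread one would retrodict assuming Conservation of Momentum and the Heisenberg bound for a single position measurement of that precision.

```python
# A rough numerical sketch of the retrodiction argument above. The numbers
# (1 nm position precision, 1 microsecond between measurements) are arbitrary
# illustrative assumptions, not taken from any real experiment.
hbar = 1.055e-34        # reduced Planck constant, J*s
m_e = 9.109e-31         # electron mass, kg

delta_x = 1.0e-9        # precision of each position measurement, m
delta_t = 1.0e-6        # time between the two measurements, s

# If momentum were conserved between the two measurements, the velocity
# (B - A)/(t2 - t1) would be known to roughly sqrt(2)*delta_x/delta_t,
# so the retrodicted momentum spread would be:
dp_inferred = m_e * (2**0.5) * delta_x / delta_t

# The Heisenberg bound for a single position measurement of precision delta_x:
dp_heisenberg = hbar / (2 * delta_x)

print(f"retrodicted momentum spread : {dp_inferred:.2e} kg m/s")
print(f"Heisenberg lower bound      : {dp_heisenberg:.2e} kg m/s")
print(f"ratio (bound / retrodicted) : {dp_heisenberg / dp_inferred:.1e}")
```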

This issue concerning multiple measurements is the reason I have named this essay "Quantum Mechanics and Multiple Measurements". If you have a look at the Stanford Encyclopaedia entry on the Uncertainty Principle, you'll find that Heisenberg himself conceded that it was possible his principle could theoretically be violated by, as it were, bringing multiple measurements of a particle or system made at different times into the calculations. Unfortunately the Encyclopaedia entry does not explain how physicists since Heisenberg have solved this apparent paradox or even if they have.

I want now to discuss these ideas in relation to another quantum experiment, the diffraction experiment, the example I used in "Quantum Physics for Dummies". In this experiment, going from left to right, we have an emitter, a screen with a very small aperture, and then some distance away another screen. We fire electrons from the emitter through the aperture and then record where these electrons land on the second screen. There is, as readers will remember, a very simple equation that can be used to work out the probabilities concerning where each electron will arrive at the second screen. Some electrons may go in a straight line from the emitter through the aperture to the second screen but some will be deflected upwards and others downwards. If we treat the electrons as particles it seems that they usually pick up either positive or negative vertical momentum either at the aperture or somewhere else along the way. Where does this momentum come from? It seems, again, either that momentum is not absolutely conserved or that momentum has been imparted to each electron somehow. The electron is, from a quantum mechanical perspective, not really localised in space and so one might reasonably suppose that random vibrations in the molecules surrounding the aperture might communicate momentum to it in an unpredictable manner. But it is difficult to reconcile this hypothesis with the simplicity of the equation that describes any diffraction experiment. The other worry is also relevant here. When the electron passes through the aperture this event can be considered a kind of measurement because we know with some degree of precision the particle's vertical position; consequently there is uncertainty about its vertical momentum. The diffraction pattern may be partly explainable in terms of the Uncertainty Principle. However if we observe, measure, where and when any individual particle lands on the second screen with great precision and assume that momentum is conserved, we can in the same way as discussed earlier calculate the momentum the particle must have had between the aperture and the screen – unless its momentum has randomly changed en route. It seems, again, either that momentum is not always conserved or that we can violate the Uncertainty Principle by combining the results of multiple measurements.
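
To put rough numbers on this, here is a back-of-the-envelope sketch for electrons of an arbitrarily chosen energy passing through an arbitrarily chosen aperture. The angular spread implied by the Uncertainty Principle and the angle of the first diffraction minimum differ only by a dimensionless factor of about 4𝝿, so the spread of the pattern is of the size the Uncertainty Principle suggests.

```python
import numpy as np

# A rough estimate of the vertical spread in a single-aperture electron
# diffraction experiment, using only the uncertainty principle. The electron
# energy (100 eV) and aperture width (100 nm) are illustrative assumptions.
h = 6.626e-34
hbar = 1.055e-34
m_e = 9.109e-31
eV = 1.602e-19

E = 100 * eV                     # kinetic energy of each electron
a = 100e-9                       # aperture width, m

p = np.sqrt(2 * m_e * E)         # forward momentum (Newtonian E = p^2/2m)
lam = h / p                      # de Broglie wavelength

dp_y = hbar / (2 * a)            # minimum vertical momentum spread at the aperture
theta_unc = dp_y / p             # angular spread implied by the uncertainty principle
theta_diff = lam / a             # angle of the first diffraction minimum

print(f"de Broglie wavelength       : {lam:.3e} m")
print(f"angular spread (uncertainty): {theta_unc:.2e} rad")
print(f"first diffraction minimum   : {theta_diff:.2e} rad")
```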

I want now to elaborate on my conception of what the wave function is in its most general sense. The picture I want to draw may seem radical but I would argue it follows directly from a clear conception of quantum mechanics. Ordinarily we are supposed to talk about the wave function being associated with both a particle and its system but I shall talk simply about a particle's wave function. For every point in space-time, there is a value associated with that point which we can also associate with the particle; the totality of these values is the particle's wave function. If we integrate the absolute square of these values over some region in space-time, we find the probability of locating the particle in that region. This wave function conforms to some appropriate equation – probably not the Schrodinger equation, which is not relativistic, but perhaps the Dirac equation, which is, or perhaps some even better set of equations yet to be discovered that are compatible with General Relativity. At any specific time, the integral of the absolute square of the wave function over all space equals one, meaning that this one particle must exist somewhere at this time. However to know any of the values associated with any of these points, we need information first, measurements; although this information will necessarily be incomplete it enables us to build a model of the wave function. In fact the wave function simply is this model. Every subsequent measurement may enable us to 'improve' the model – a new measurement may force us to change it, make a somewhat different model (although it may be possible for subsequent measurements to simply confirm the original model). Because the wave function extends throughout all space and time, we don't need to measure the particle directly to acquire knowledge about it. This view is compatible with a phenomenon that students weren't typically taught about in 2005 – quantum entanglement. The basic idea behind quantum entanglement is that if two particles are 'entangled', that is, if they interacted in the past in such a way that some of their information is shared, then a measurement on one at some later time will immediately affect the properties of the other even if the two particles are very far apart. This makes sense if measurements on one particle affect the wave function associated with the other particle. However it goes further than just entanglement. If the wave function extends through all space-time, then any measurement carried out anywhere or anytime can affect the wave function associated with the particle. We might be uncertain about the position of a particular particle and then carry out a measurement somewhere else that finds quite conclusively that the particle isn't there – this measurement will also affect the wave function associated with the particle. My solution to the measurement problem is not to say, as proponents of the Many-Worlds interpretation claim, that the wave function is real and that measurements aren't, that they never really happen (a view that ignores the fact that at least some measurements are necessary to establish what the wave function is at all), but rather that measurements are real and that the wave function is, in a sense, unreal, a model.

This is not the most radical part of my conception of quantum physics. My most radical insight involves also recognising that measurements are subjective – where by 'subjective' I mean that measurements are carried out by particular individuals. Because measurements are subjective, the wave function can be different for different people. I don't know if readers remember but I first started writing about quantum physics in, I think, 2018, in the post "Probability and Schrodinger's Cat" and its sequel, "Probability and Schrodinger's Cat Part 2". The basic idea behind these posts, which I didn't express clearly back then because I hadn't thought it all the way through, is that two different observers might have made or be aware of two different sets of measurements associated with a system; they will consequently be led to construct two different models of the wave function associated with it. This is why 'wave-function collapse' can occur in the world of one observer but not in the world of another, why a cat in a box can be both alive and dead for one observer and definitely either alive or dead for another. This raises two important questions. First, if we suppose that the wave function is indeed a model, does that mean it exists in the mind of a person? Second, do we all live in the same world or does each person live in a different world? I approached these questions obliquely in "The Meaning of Meaning" and may come back to them in a later essay.

The proposal I have just made may seem extraordinary but the point of this essay is not to present this proposal but to go one step further. There are two different directions we can go. The first is to take a perspective we can call epistemological realism. In the examples I gave earlier, it seemed that, although the Uncertainty Principle applies with respect to any particular measurement, if we take multiple measurements and assume that the Law of Conservation of Momentum is a hard law, we can reduce the uncertainty associated with a system below the limit imposed by the Uncertainty Principle. This would imply that the more measurements we perform at different times on a system the more confidence we can have that we know exactly where all the particles in it are and how fast they are moving at any particular time. In the limit, as the number of measurements approaches infinity, if we could have perfectly complete information about a system, our model would conform exactly with reality. The map will have become the territory. This perspective aligns with ideas of determinism and reductionism because we are assuming that everything, even complex human-level phenomena like Hollywood films and Jordan Peterson, can ultimately be explained by simple laws acting on fundamental particles, one of these laws being Conservation of Momentum.

The second perspective is this. We could suppose that no matter how many measurements a person makes on a system, there is always residual uncertainty. Something like the Uncertainty Principle is indeed a hard law. This seems to me to imply that the Law of Conservation of Momentum is not a hard law: it would be emergent like the Second Law of Thermodynamics. The Law of Conservation of Momentum would apply almost absolutely to macroscopic phenomena but would apply much less strictly at the level of particles. This is because the motion of particles can change randomly between measurements. The implication of this is that 'randomness' or 'non-determinism' is a fundamental feature of the universe, where by 'random' and 'non-deterministic' I mean that we cannot explain such changes of momentum in terms of the reductive physical laws discovered by physicists.

Almost from the time I learned about quantum mechanics, I simply assumed that it involved elements of irreducible chance, non-determinism, without realising that this would then imply violations of the Law of Conservation of Momentum. I am not alone. In Determined, Robert Sapolsky, himself a determinist and a reductionist, seems to concede when discussing quantum physics that quantum phenomena can be random. His argument, if you recall from my essay discussing his book, is that these random quantum fluctuations, random changes of motion, do not percolate up to the level of human behaviour. The notion that randomness and non-determinism are necessary features of quantum mechanics seems to be the mainstream view even among physicists. I did not fully realise that true randomness and the Law of Conservation of Momentum are irreconcilable until I watched Sabine Hossenfelder's video "So You Think You Understand Quantum Physics?" a little while ago, a video which greatly influenced this essay although I am not entirely sure if her analysis in this video is totally correct because she assumes both locality and Conservation of Momentum. We seem to need to make a choice between the two laws, between the two perspectives, the first involving the idea that Conservation of Momentum is absolute and the second involving the idea that there is ineliminable uncertainty involved in subatomic processes. Of these two perspectives I prefer the second because it seems to fit more neatly with the view of the world I have developed over the course of my life.

In opting for the second interpretation, I am endorsing a view that readers may think is woo-woo or new agey. I'll explain why I say this. My use of the word 'randomness' is idiosyncratic because what I mean by it is that we cannot explain such changes in momentum naturalistically but rather must invoke something supernatural. (I am aware that to make my position clearer I would need to define what I mean by 'natural' and 'supernatural' but it would take me too far afield to do so here in this essay.) People today assume that if something is 'random' it is 'causeless' but I am suggesting that phenomena can be construed as 'random' within a materialist framework but as 'deterministic' when interpreted within a more mystical paradigm, that they might have causes that cannot be explained in terms of simple reductive physical laws but might be explainable in other ways. This second way of understanding the world might provide some comfort to believers in free will, although I do not believe in free will myself because I find the concept incoherent. It might also give comfort to some religious or spiritual people although I do not want to endorse the worldview of Evangelicals who reject Evolution entirely, claim climate change is a hoax, endorse Old Testament attitudes towards homosexuals and women's rights, or support Israel's indiscriminate bombing of civilians in Gaza. My view, which I presented in the essay about Sapolsky's book, "Determinism, Quantum Physics, and Free Will", and in the essay "Evolution, Ideas, and Hiveminds", is that there is top-down causation and spooky action at a distance. I also believe that living creatures have minds or souls in some sense separate from their bodies and that psychic phenomena such as clairvoyance, synchronicity, and precognition, although probably not direct telepathy, can genuinely occur. I am permitted to take this position based on reflection on my own life and on the world.

In wading so deeply into quantum physics, as readers will appreciate, I am perhaps moving outside my area of competence. Nevertheless my main argument is based on premises that physicists generally accept. One such premise is that wave functions extend throughout all space and time (a premise that quantum physicists have yet to fully reconcile with either Special or General Relativity). Another is that quantum phenomena are 'random' or 'non-deterministic', although as I said I am using these terms in a different way than physicists often do. Physicists seldom publicise the fact that this second premise implies that Conservation of Momentum must sometimes be violated. Where I depart from the mainstream view of physicists is that I regard measurements as fundamental and wave functions as models that somehow exist in the minds of conscious beings, although I admit that I am unsure what this would mean; physicists tend to regard the wave function as real and some believe that it is measurements that are unreal. In taking a kind of mystical position, I am aware that I am vulnerable to charges of being anti-science because my view might seem to suggest that some phenomena cannot be scientifically explained and I suppose I should just bite the bullet and accept that this criticism would be fair. It may be that scientific endeavour, in order to continue, requires its practitioners to at least pretend that the natural world can be explained by simple reductive laws because the alternative would be to 'explain' phenomena by simply saying something like "God did it", which is of course no real explanation at all.

I'll finish this essay with an addendum to the previous essay's discussion of Bayes' Theorem, not because I said anything incorrect in it (I think) but because readers may want a better understanding of how Bayes' Theorem should actually be used. Readers may recall that I pointed out that in order to use Bayes' Theorem, when working out P(E | ~H) you still need a hypothesis related to it. Suppose now that Jones, the visitor I described in the previous essay, knows that Smith has fifty animals on his farm. If Jones's main hypothesis is that every animal on Smith's farm is a sheep, it may be that his best alternative hypothesis is to suppose that Smith has forty-nine sheep and one cow. This would make P(E | ~H), given that he sees a sheep, equal to 49/50. This is not the only alternative hypothesis (the probability will be higher if more alternative hypotheses are included in the calculation) but, if Jones treats this alternative hypothesis as a basis for his calculation, it will enable him to set a lower bound on P(H | E). The more sheep Jones sees without ever seeing a cow, the more P(H | E) will approach one, but it will approach one much more slowly than in the example I gave in the previous essay. It is this kind of reasoning that was actually employed by the mathematicians who initially embraced Bayes' Theorem in the eighteenth and nineteenth centuries. However, to reiterate the point I made at the end of the previous essay, statisticians do not use this kind of method when working out the probabilities associated with the null hypothesis and many statistical errors may spring from false assumptions baked into what versions of the null hypothesis researchers use. I am skeptical of much population statistics but my skepticism should be the topic of another essay.
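
For readers who want to see the numbers, here is a tiny sketch of the sheep example. The equal prior probabilities and the assumption that each sighting is an independent random draw from the fifty animals are simplifications of my own, but they show how slowly P(H | E) creeps towards one under this alternative hypothesis.

```python
# A small worked version of the sheep example, assuming (hypothetically) that
# Jones gives equal prior probability to the two hypotheses and that each
# sighting is an independent, uniformly random draw from the fifty animals.
prior_H = 0.5          # H : all fifty animals are sheep
prior_alt = 0.5        # H': forty-nine sheep and one cow

def posterior_after_k_sheep(k):
    """P(H | k sightings, every one of them a sheep)."""
    likelihood_H = 1.0            # under H a sighting is always a sheep
    likelihood_alt = (49/50)**k   # under H' each sighting is a sheep with prob 49/50
    numerator = prior_H * likelihood_H
    return numerator / (numerator + prior_alt * likelihood_alt)

for k in (1, 10, 50, 200, 500):
    print(f"after {k:3d} sheep sightings: P(H | E) = {posterior_after_k_sheep(k):.4f}")
```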

Friday, 2 August 2024

Concerning Evolution Again; Concerning Bayes' Theorem

Sometimes in this blog I have felt impelled to revisit earlier essays to improve the arguments I presented in them. My most recent essay, "Evolution, Ideas, and Hiveminds", an essay I only published a couple of weeks ago, was not as strong as I would have liked and, in thinking about it over the last couple of weeks, I have decided to go back to the arguments I made in it and discuss some of its claims from a different perspective. When one is engaged in debate one should at least try to steel-man the arguments made by one's opponents and so, in an effort to cultivate at least a little intellectual integrity, I feel I should present Neo-Darwinism in a slightly more favourable light. This essay will make most sense to readers who have read the preceding essay. I want to discuss the notion of "Survival of the Luckiest" again and then the claim that genotype does not fully determine phenotype, two notions I introduced in the previous essay; this time though I want to consider them from the point of view of someone who believes in the Modern Synthesis. I also want to talk about an argument I set out a long time ago in my first essay about evolution: "Concerning Evolution". Although there were weaknesses in "Evolution, Ideas, and Hiveminds", I stand by my general thesis and so will go on to talk about an idea that does support it: sexual selection. In the second part of the essay I want to talk about probability again. First I want to discuss randomness and then Bayes' Theorem.

Readers may recall that in the previous essay I told a little story concerning a pigeon. I asked the reader to imagine a pigeon born with a small mutation that makes it very slightly better at flying than other pigeons. I claimed that, according to Neo-Darwinists, Evolution will inevitably lead that pigeon to survive longer and have more offspring than its peers and that, because this mutation should gradually propagate throughout the whole population, the whole pigeon species should evolve. I objected to this proposal by describing all the potential misfortunes that could befall the pigeon and prevent it from mating. I also asked the reader to imagine a slightly inferior pigeon who just happens to have some good luck and thus introduces suboptimal genes into the pigeon gene pool. I argued that when we consider the sheer number of chance events that can occur in the world, the very randomness Neo-Darwinists believe in, we should replace the motto "Survival of the Fittest" with the motto "Survival of the Luckiest".

However Neo-Darwinists have a rejoinder to this argument. Neo-Darwinists can argue that a mutation conferring on a pigeon slightly better flight actually occurs quite frequently: today and throughout history many pigeons have been born with this mutation. Yes, some pigeons born with this mutation are less lucky than others born with this mutation but if thousands of pigeons have been born with this mutation throughout history, on average, overall, they are slightly more likely to reproduce than those born without it. The notion in statistics that is relevant here is "Regression to the Mean". Neo-Darwinists can argue that the random disasters and windfalls that might befall individual pigeons born with this mutation cancel each other out. If many pigeons are or have been born with this mutation, more predictable and consistent selective pressures, long term pressures, should produce a slight bias towards fitter pigeons and bring about gradual evolution. (This is to assume that pigeons are not already so well adapted to their ecological niche that they cannot improve, that this niche remains fairly stable, and that the Law of Large Numbers applies to evolution, rather than Chaos Theory.)
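
Out of curiosity I wrote a little toy simulation of this rejoinder. The population size, the one-percent advantage, and the starting frequency are all numbers I made up, and the model is a crude caricature of real population genetics, but it does show how a small average advantage can win out over the bad luck of individual birds.

```python
import numpy as np

# A toy simulation of the "regression to the mean" rejoinder: a mutation giving
# a small reproductive advantage, present in many individuals, tends to spread
# despite the random luck of individual birds. Population size, advantage, and
# starting frequency are all arbitrary illustrative assumptions.
rng = np.random.default_rng(0)

pop_size = 10_000          # breeding pigeons per generation
advantage = 0.01           # 1% higher expected number of offspring for carriers
start_freq = 0.01          # mutation initially present in 1% of birds
generations = 1_000

def run_once():
    freq = start_freq
    for _ in range(generations):
        # Expected frequency after selection, then binomial sampling ("luck").
        expected = freq * (1 + advantage) / (1 + freq * advantage)
        freq = rng.binomial(pop_size, expected) / pop_size
        if freq in (0.0, 1.0):
            break
    return freq

results = [run_once() for _ in range(20)]
print("final frequencies over 20 runs:", [round(f, 2) for f in results])
print("runs in which the mutation spread widely:", sum(f > 0.9 for f in results))
```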

This is a powerful rejoinder but it has some implications that evolutionary biologists might not like. It seems that this counterargument involves assuming that beneficial mutations occur more frequently than I suggested in the previous essay. How often do random mutations occur? If they do occur frequently then we might expect the genetic variation within a species to become very large. Consider dogs. There is huge phenotypic variation in types of dog, from chihuahuas to St Bernards, although I think that all dogs can interbreed with each other and we tend to use the same word, "dog", for all types of dog. Neo-Darwinists might argue that for many thousands of years there has been very little selective pressure on the dog species enabling genetic variation to increase enormously, an increase exploited by dog breeders when breeding types of dog. Dogs seem to be evidence for the claim that mutations can occur frequently. However other species, like leopards and kiwis and elephants for instance, are very uniform. If we assume Neo-Darwinism is correct and if we also assume that mutations are frequent, this uniformity can only be explained by also assuming very strong selective pressures on leopards, kiwis, and elephants, pressures continually weeding out nonconformists. There are puzzles here which I don't think either evolutionary biologists or their opponents have solved.

This argument bears on modern humans. It is tempting to think that selective pressures in the contemporary world might be having an effect on the evolution of humankind today. Many people used to believe that intelligence is largely genetic, that 5 point increments in IQ could somehow be linked to the presence or absence of particular alleles; if this were the case we might worry that, because so many intelligent women are choosing not to have children these days, humans might be becoming gradually stupider. In 2004, Dean Hamer proposed in a book called The God Gene that people who are more inclined to mystical experiences carry a variant of the gene that codes for a protein called VMAT2 that transports certain neurotransmitters within neurones. (Even though Hamer might be wrong, this still qualifies as good science – unlike most other evolutionary psychologists Hamer has at least gone to the trouble of trying to identify an actual gene, variants of which seem to correlate with the presence or absence of a particular psychological trait.) If we assume that people who are more likely to be mystical are also more likely to be religious and also consider the apparent fact that religious people tend to have more children than atheists, this should make atheists like Richard Dawkins very concerned that evolution might favour theists, that it might be weeding atheists out of the gene pool. Humans would therefore be becoming gradually more religious. Of course this is to assume that intelligence and mysticism are genetic. We should also remember that atheism and women's rights only appeared very recently in our evolutionary history; it would probably take many thousands of years of social conditions very much like what we have today for us all to become stupider and more inclined to believe in God, even assuming that these traits are indeed genetic and are being selected for. Nevertheless anxieties of this sort, anxieties about the future of humanity, can plague a person whenever he or she worries that evolutionary psychologists might be right.

In the previous essay, I argued that genotype does not wholly determine phenotype. First I want to emphasise again the extraordinary complexity of humans: the human eyeball, with its cornea, pupil, iris, retina, two types of fluid, and other features, is enormously complicated, and it is only one organ among the many that make up a person. In the previous essay I claimed, effectively, that given the complexity of a human, the DNA housed in each cell nucleus could not carry enough information for it to be considered the blueprint for the whole organism. I based this claim on the finding by the Human Genome Project that the human genome carries fewer than 20,000 protein-coding genes. However evolutionary biologists could push back against this claim in the following way: they could argue that in order to build and maintain cells, organs, the human's whole body, a whole lot of other genes aside from protein-coding genes are involved, genes that taken together influence when and how often the protein-coding genes are expressed. They could then argue that if we take into account all these other genes we can consequently account for the complexity of a human. But in order for them to make this case convincingly we would need to know how many active genes other than protein-coding genes there are in the human genome in total and I do not believe the geneticists know this yet.

Another way evolutionary biologists could fight back against the arguments I presented in the previous essay is by invoking the idea of polygenic inheritance. The Human Genome Project was a tremendous achievement but after it was completed and more and more genetic surveys on different groups of people were carried out, geneticists realised that they had a serious problem. This problem is known as the Missing Heritability Problem, something I have talked about before. It had been assumed that many supposedly congenital conditions and diseases are caused by particular gene variants, but although this was found to be true of some conditions, many other conditions could not be correlated with specific genes. For instance, it was thought that there must be a 'gay gene' and a 'schizophrenia gene', genes that could be identified, but neither has been found despite the best efforts of geneticists to find them. If you think either gene has been discovered, you've been misled. Last year, here in New Zealand, The Listener published an interview with some American fascist in which he argued that mental asylums should be brought back – this article contained the false claim that the schizophrenia gene had been discovered. (Genetic studies of this kind, known as candidate-gene studies, often throw up false positive results and this was probably the basis for this egregious claim.) This article upset me profoundly and, combined with other things happening in my life at the time, was what led me to drop out of the philosophy degree I was studying for last year.

Faced with the Missing Heritability Problem, evolutionary biologists have retreated to the position that supposedly genetic conditions are caused by very many genes all working in concert, something known as polygenic inheritance. This concept seems to me extremely ad hoc. Because evolutionary biologists still desperately want to claim that many conditions are congenital and genetic they have decided to believe that conditions like 'schizophrenia' result from an interplay of very many genes. Some of them may even think that some conditions somehow can be attributed to the whole genome, a position that Munecat herself took in her video (in a way that was somewhat inconsistent with other claims she made in the video concerning human psychology). Evolutionary biologists had assumed certain genes could be linked to certain conditions but empirical science had shown this to be wrong and so they have adopted a position which, to be proved, would require geneticists to show how lots of genes working together make people gay or nuts, something very difficult to prove – I would like to wager that empirical science will eventually show that polygenic inheritance also doesn't make sense, forcing evolutionary biologists to retreat to an even less falsifiable position if they still want to cling to the idea that the genome is the blueprint of the adult organism. However I have to concede that these two notions, the notion that genes other than protein-coding genes could determine the phenotype of an organism and the notion that characteristics could result from polygenic inheritance, are not unreasonable ways evolutionary biologists could push back against some of the claims I made in the previous essay.

I have been thinking about evolutionary biology for a long time – I published my first post concerning it way back in 2016 I think. I want briefly to rehash the argument I made then. Humans have 46 chromosomes. Horses have 64 and donkeys have 62. If a donkey and a horse get intimate with each other, they beget a mule with 63 chromosomes, offspring that is sterile because an animal needs to have an even number of chromosomes in order to create viable gametes. One way that you can distinguish a particular species from another is by counting the number of chromosomes it has. It seems reasonable to suppose that horses might have descended from donkeys or the other way round or that they both might have shared a common ancestor in relatively recent evolutionary history. But, and this is where we arrive at the objection, how can a species with, say, 62 chromosomes evolve into one with 64? This seems to involve an individual being born with an enormous and abrupt mutation. This mutation must also help rather than hinder the individual's chances of surviving and reproducing. Furthermore if such a mutation were to occur, if a donkey were to give birth to the world's first horse, with whom would the horse mate and produce viable offspring? It seems we need at least one more horse, perhaps many, to suddenly appear all at the same time. It seemed to me then that such an enormous leap in equine evolution was too unlikely to occur by chance.

This argument occurred to me in 2013. Up until then I had never questioned the Neo-Darwinian orthodoxy. When it occurred to me it stunned me so much that, while sitting on my deck at night looking at the sky, it suddenly seemed to me that the moon was projected onto the clouds rather than being behind them. When I discussed this argument in 2016, some of my readers may have concluded that I was a Young Earth Creationist trying to debunk Evolution on religious grounds but, of course, it wasn't like I had suddenly decided that the Garden of Eden and Noah's Ark and all that had really existed; rather I had realised that Neo-Darwinism was wrong or incomplete. I didn't have an alternative. It wasn't until early 2018 that a solution occurred to me: when a mutation occurs it must occur to a number of individuals all at the same time. This solution seemed to me to involve something both supernatural and purposeful, a view I have not disavowed: this is why in the previous essay I argued that we should explain evolution in terms of top-down causation and action-at-a-distance.

The purpose of this essay however is to consider arguments that evolutionary biologists might make to defend their theory from attacks such as this. Recently I was thinking about Down syndrome. People with Down syndrome are born with an extra copy of chromosome 21 – this occurs as a result of a glitch during meiosis in the gonads of one of the parents. People with Down syndrome very rarely have children. An argument could be made that people with Down syndrome are so genetically unlike regular people that they are another species entirely; such an argument would have to contend with the apparent fact that people with Down syndrome don't reproduce. Now consider the following little story. We have a population of donkeys and, for some reason, many of them start being born with a chromosomal abnormality somewhat similar to Down syndrome – they are born with an extra copy of a chromosome. We assume furthermore that donkeys born with this chromosomal abnormality can breed with each other and with regular donkeys. In this population we will therefore presumably end up with some donkeys having 62 chromosomes, some having 63, and some having 64. We now suppose that this population is divided into two groups by some geographic change such as a glacier or continental drift; we then suppose that natural selection does its work and, because of different selective pressures associated with the two different environments, one of the two sibling populations entirely assumes a chromosome number of 64 and the other 62. We now have two distinct species: horses and donkeys. The process I am describing is very close to allopatric speciation except that I am conjecturing that the genetic variation might exist prior to the geographic schism. Little stories like this are ones evolutionary biologists could postulate when fighting against arguments of the type I presented above.

What I have been trying to do in this essay so far is to set out some of the arguments evolutionary biologists could employ when contesting the claims I made in the previous essay and in earlier posts. They could argue against my claim that we should replace the motto "Survival of the Fittest" with "Survival of the Luckiest" by proposing that beneficial mutations occur often within a species and that the Law of Large Numbers means that fitter individuals are, on average, more likely to survive and reproduce. They could argue against my claim that the genotype of a human is not information-rich enough to explain the complexity of the human phenotype by saying that genes other than protein-coding genes affect the phenotype and that polygenic inheritance also plays a role. They could argue that it is possible for a species with one number of chromosomes to evolve into a species with another number of chromosomes by inventing a just-so story like the one I told above. The important thing though is that, although evolutionary biologists might indeed deploy such arguments, they are all very speculative, all very unlikely, and all very difficult to falsify, and so we are entitled to suspect that there is a problem with Neo-Darwinism and that processes other than chance and natural selection might be involved in evolution.

Although I have conceded in this essay that the arguments I made in the previous essay were not watertight, I still believe that Neo-Darwinism is wrong. I want now to present another argument against it. It concerns sexual selection. Readers may be aware that evolutionary biologists have an issue with peacocks. The problem they have is that peahens really like males with enormous multicoloured plumage even though peacocks have to invest a great deal of resources into growing this plumage and even though their tails make them much more vulnerable to predation. Presumably hundreds of thousands of years of peahens selecting mates with more and more brilliant trains has resulted in such unwieldy, brilliant trains becoming the norm among peacocks. Even though Darwin proposed this hypothesis, he so disliked it that, in the later part of his life, he is reported to have said that the mere sight of a peacock's tail made him nauseous. (If you don't believe me, see The Philosopher's Secret Fire by Patrick Harpur for a discussion of this.) Evolutionary biologists find the evolution of peacocks' trains so vexing because they cannot find a satisfactory just-so story to explain why peahens find enormous brilliant tail feathers so sexy. For instance, it has been proposed that males grow these trains to show that they are so fit that they can waste resources by encumbering themselves with enormous tails and still win the Game of Life. Obviously this is a fairly unsatisfactory resolution to the paradox. Years ago, during a public discussion about evolution with Richard Dawkins which can be found on Youtube, Bret Weinstein – this was before he self-destructed as a result of his anti-vax stance – brought up the fact that no one has provided a satisfactory explanation as to why peacocks have the tails they do; Dawkins, with his fundamentalist faith that Neo-Darwinism can explain everything, couldn't seem to perceive that it was even a problem.

I want to describe what I believe is another example of sexual selection at work. When people, experts and laypeople alike, think about evolution, one of their favourite go-to examples is giraffes. It is supposed that giraffes have evolved such long necks so that they can reach leaves growing very high up on trees. In the nineteenth century, some thinkers imagined that giraffes in the past would stretch their necks to reach leaves high in the canopy and had then passed on this acquired characteristic to their offspring. Neo-Darwinists today though would say that giraffes randomly born with genes that endowed them with longer necks, because they could reach leaves higher up, survived longer and reproduced more than shorter-necked giraffes, bringing about a gradual evolution of the whole giraffe species. However again there is a problem. On QI many years ago Stephen Fry reported that zoologists had found that giraffes often bend down to eat leaves. An alternative explanation is required. Some evolutionary biologists, spurred by observations made by field researchers, have proposed that giraffes use their necks somehow when fighting with each other and that longer-necked giraffes were more successful when engaged in such internecine warfare. This is another fairly unsatisfactory resolution to a different problem. I would like to suggest here that the long necks of giraffes also result from sexual selection. Female giraffes simply find long-necked male giraffes sexy and male giraffes simply find long-necked female giraffes sexy.

Something I implied but didn't state explicitly in the previous essay is that I believe that humans and other animals have minds independent of their bodies. The minds of humans participate in the collective Human Mind and the minds of peacocks and giraffes participate respectively in the collective Peacock Mind and the collective Giraffe Mind. These minds or souls direct the behaviours exhibited by humans and other animals as well as their cognitions. When we consider that the sexual attractions peahens feel towards well feathered peacocks and the sexual attractions giraffes feel towards long necked giraffes have baffled evolutionary biologists, we are entitled to wonder if something mystical or supernatural might be behind these attractions, some process other than the mechanisms evolutionary biologists ordinarily propose, and we are entitled to wonder if some power might be acting through peahens and giraffes influencing their evolution. This speculative, partial explanation of sexual selection coheres with the theory I proposed in the previous essay.

In the next part of this essay I want to turn to another subject that readers know fascinates me: probability. Recently I read Ten Great Ideas About Chance by Persi Diaconis and Brian Skyrms. I have been thinking about some ideas related to probability, inspired by this book, and my aim here is to discuss some of these ideas. I'm not going to simply regurgitate their book because they were not always perfectly clear and I think it possible to present these ideas in a much more easily digestible style than Diaconis and Skyrms do.

When we think about probability we often think about randomness. True randomness is both difficult to produce and difficult to define. People still use calculators and on most modern calculators there is a button you can press which supposedly generates a 'random' number. In fact this number is not really random at all because computers rely on algorithms and you cannot produce random numbers algorithmically. Instead calculators produce numbers that are pseudorandom. For instance, a calculator might take the specific time of the day when the button was pushed and then perform some complicated procedure on it to produce a number that looks random but actually isn't. Sometimes scientists need random numbers to feed into their experiments and in their book Diaconis and Skyrms describe some very involved and unusual experimental setups scientists have implemented to get numbers that look as random as they can make them. It might be that the best way to generate random numbers would be to involve quantum mechanics because most physicists today believe that measurements of quantum phenomena produce random results; but we cannot easily equip handheld calculators with Geiger counters.
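To give a flavour of what such a procedure might look like, here is a minimal sketch in Python of a simple linear congruential generator seeded from the clock. The constants and the seeding scheme are just illustrative assumptions, not how any particular calculator actually works, but the point stands: everything below is deterministic, so the numbers only look random.

import time

def make_prng(seed=None):
    # Seed from the current time if no seed is given; this is what makes the
    # output look unpredictable even though the procedure is fully deterministic.
    state = int(time.time() * 1000) if seed is None else seed
    def next_number():
        nonlocal state
        # One linear congruential step: entirely algorithmic, hence pseudorandom.
        state = (1103515245 * state + 12345) % (2**31)
        return state / (2**31)   # scale into [0, 1), like a calculator's random-number button
    return next_number

rand = make_prng()
print([round(rand(), 3) for _ in range(5)])   # the same seed always gives the same sequence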

So it is almost impossible to produce random numbers and this may lead us to wonder whether the term 'randomness' is even well defined. One way to approach the concept of randomness is to consider a sequence of numbers. We then ask if there is some pattern in it and if this pattern can enable us to predict the next number in the sequence – if there is no pattern and we cannot predict the next number we can call the sequence random. Consider first the sequence (1,1,1,1,1,1). It seems that this sequence has a pattern – it is just the number one repeated over and over again. If this pattern holds, we would expect that the next number will be the number one again. If this is the case, the sequence is not random. We cannot be sure that this pattern will continue, though – it might still be a random sequence that has accidentally hit on the number one six times. Even so we seem justified in supposing that this sequence displays a pattern that will continue and that it is not random. A sign that a sequence is random might be that, assuming each term is drawn from a finite set of possibilities, each possible value appears, in the long run, with the same frequency as every other. Suppose we toss a coin repeatedly – we would expect the sequence, in the long run, to contain as many heads as tails because of the randomness involved. However consider the sequence (1,0,1,0,1,0,1,0). Even though the frequency of ones is the same as that of zeros, there is still a pattern – the sequence seems to be alternating between one and zero and we seem justified in predicting that the next term will be one. Again though we cannot be sure. Consider now the following sequence: (3,1,4,1,5,9,2). This sequence looks random but in fact it isn't – I have just picked the first seven digits of the number pi. Now that you know this you can predict the next number with certainty. It will be six. Finally consider the sequence (2,4,3,6,5,10,9,18). Again the pattern is harder to spot but it is there. Each even-numbered term is double the previous term and each odd-numbered term, apart from the first, is the previous term minus one. If this pattern holds, we should expect the next term to be 17.
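To make that last pattern concrete, here is a small Python sketch – just an illustration, nothing more – that generates the sequence by the rule I have described and predicts the next term:

def doubling_sequence(n_terms):
    # Start with 2. Each even-numbered term doubles the previous term;
    # each odd-numbered term (after the first) is the previous term minus one.
    terms = [2]
    while len(terms) < n_terms:
        if len(terms) % 2 == 1:   # the next term will sit in an even position
            terms.append(terms[-1] * 2)
        else:                     # the next term will sit in an odd position
            terms.append(terms[-1] - 1)
    return terms

print(doubling_sequence(8))       # [2, 4, 3, 6, 5, 10, 9, 18]
print(doubling_sequence(9)[-1])   # 17, the predicted next term if the pattern holds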

If a sequence is non-random it has a pattern that continues indefinitely – if no pattern at all can be discovered then the sequence is random. The problem is that, for any finite sequence, a pattern can always be discovered. This can be proved quite easily. Suppose we have a finite sequence of n terms – we could always suppose that this sequence of n terms will repeat indefinitely. We might have, for instance, the sequence (1, 11, 23, 7, 28) and then suppose that the next five terms will be (1, 11, 23, 7, 28) again and so on indefinitely. However we still cannot be sure that this pattern will persist. Our issue is a special case of Hume's Problem of Induction. It seems that we can only be sure whether a sequence is random or not if the sequence is infinite. If some pattern holds indefinitely, then we can say that it is non-random; if we cannot find any pattern at all in such an infinite sequence, then it is random. The difficulty for mathematicians interested in probability is that, in order to decide with certainty whether a sequence is random or not, we would somehow need to evaluate an infinite sequence of terms and test this sequence against all the possible patterns that might produce some or all of the sequence, a number of patterns that may also be infinite. The problem of deciding whether a sequence is random or not cannot be solved. But it can be ameliorated by Bayes' Theorem, the topic I want to turn to next.

Thomas Bayes, a Presbyterian minister who lived during the eighteenth century, was motivated by a fundamental issue in the philosophy of probability, the Inverse Problem. Up until Bayes, philosophers always moved from hypotheses concerning probabilities to frequencies; the task Bayes set himself was to show how we can move from frequencies to probabilities. Consider coin flips. Before Bayes, people would start by assuming that the probability of getting heads is 1/2 and then go on to ask: given some set of actual outcomes, say (H, H, T, H), what is the probability of these outcomes occurring? The probability here, because we assume the ordering does not matter, is 1/4. The project that Bayes set himself was to work out how we can start with the outcomes and then proceed to the probability. Bayes's Theorem is that the probability of a hypothesis given some observed evidence (known as the posterior probability) equals the probability of the evidence given the hypothesis is true multiplied by the probability of the hypothesis (known as the prior) divided by the probability of the observed evidence. Algebraically this is 

P(H | E) = P(E | H) P(H) / P(E)

P(H | E) and P(E | H) are both conditional probabilities. One reason Bayes' Theorem has proved so popular among philosophers is that it seems to capture the way they think we ought to reason about the world. It is supposed that we constantly form hypotheses about the world to which we assign probabilities or credences; then on the basis of evidence we either increase or decrease our credence in these hypotheses. It seems a rational model for how we should adjust the strength of our beliefs when faced with confirming or disconfirming evidence. Ten Great Ideas About Chance is not as clear when it comes to Bayes' Theorem as it could be and the treatments of Bayes' Theorem I have found on Youtube and elsewhere do not seem to me to capture the most important insight implied by the Theorem. Educators on Youtube often use medical tests as examples and focus on the notions of specificity, sensitivity, and prevalence. Specificity is P(~E | ~H), sensitivity is P(E | H), and prevalence is P(H); given these three we can work out P(E) and hence P(H | E). What I want to do though is approach the Theorem from a different angle than these online educators and the way I shall do so is by devising a kind of thought experiment, by making up a little story.
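Before turning to that story, here is a minimal Python sketch of how those three quantities combine under Bayes' Theorem. The numbers – a test that is 90 percent sensitive and 95 percent specific for a condition with a prevalence of 1 percent – are invented purely for illustration.

def posterior_from_test(sensitivity, specificity, prevalence):
    # P(E) = P(E | H)P(H) + P(E | ~H)P(~H), where P(E | ~H) = 1 - specificity
    p_e = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # Bayes' Theorem: P(H | E) = P(E | H)P(H) / P(E)
    return sensitivity * prevalence / p_e

print(round(posterior_from_test(0.9, 0.95, 0.01), 3))   # about 0.154

Even a fairly accurate test, applied to a rare condition, leaves the posterior probability surprisingly low, which is the sort of point those Youtube educators usually want to make.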

Suppose Jones is visiting his friend Smith, a farmer in Central Otago. Jones knows for sure that all farms in Otago contain either sheep or cows or some mixture of both. Jones does not know for sure the composition of Smith's farm but he hypothesises that all the livestock on Smith's farm are sheep. He assigns to this hypothesis a probability between 0 and 1 – it actually doesn't matter what this probability, the prior, is. Then when he arrives at Smith's farm, the first animal Jones sees is a cow. Obviously this immediately disproves his hypothesis – if one of Smith's animals is a cow then obviously not all of Smith's livestock are sheep. Let's now interpret Jones's thinking from a Bayesian perspective. Because Jones has seen a cow the probability of the evidence given the hypothesis, P(E | H), is 0. That is, if the hypothesis were correct we would expect it to be impossible for Jones to see a cow. In this situation the value of the prior, P(H), is irrelevant, as is the probability of the evidence P(E) (assuming that P(E) is greater than 0 which it must be if Jones has seen a cow): 0 multiplied by anything is still 0. Having seen a cow, Jones must alter his credence in his hypothesis from whatever it was before he visited the farm to 0.

Now suppose that when Jones visits the farm, the first animal he sees is a sheep. Then he sees another sheep and then another. To make the math simpler, we shall not assume that every sheep he sees is a different animal – it might be, although this is unlikely, that he is seeing the same sheep over and over again. It seems nevertheless that every time he sees a sheep this should increase Jones's confidence that his hypothesis is correct, and to see if this is so we can put numbers into Bayes' Theorem. The probability of the evidence given the hypothesis, P(E | H), is 1. To put it another way, if Jones's hypothesis is correct, then it is certain that when he sees an animal on the farm, it will be a sheep. This means that we can simplify the formula: it becomes P(H | E) = P(H) / P(E). The first thing to note about this formula is that if P(E) is less than one then P(H | E) must always be greater than P(H). That is, every time he sees a sheep, assuming he never sees a cow, he can have greater confidence than before in his hypothesis that every animal on the farm is a sheep. The posterior probability will always be greater than the prior probability. The second thing to note is that, because P(H | E) can never exceed one, P(E) can never be less than P(H).

When I was thinking about this over the last week or two, the fact that P(H) must always be less than (or equal to) P(E) initially confused me. One important idea behind Bayes' Theorem is that P(H) can initially take any value we want: it is arbitrary, subjective. It is only after more and more evidence is gathered that the posterior probabilities will tend to converge on the correct number. Surely, though, if P(H) can take any value we want then we could always pick a value for it greater than P(E). We could suppose, as an alternative hypothesis, that there are the same number of cows as sheep and that therefore P(E) = 1/2; why could we not then choose a value for P(H) that is greater? Then I thought it through. P(E) is determined after P(H) is determined. P(H) can never be greater than P(E) because when Jones sees the sheep it could be that his hypothesis is correct or it could be that some other hypothesis is correct or it could be that whether he sees a cow or a sheep is completely random. (This last option is actually another hypothesis; I will clarify what I mean by this in a moment.) The probability that he sees a sheep must always be at least as great as the probability that his hypothesis is correct because his sheep-sighting could always be explained by alternative hypotheses instead of his main hypothesis. We can formalise this insight in the following way: the probability of the evidence must equal the probability of the evidence given that the hypothesis is correct multiplied by the probability of the hypothesis, plus the probability of the evidence given that the hypothesis is incorrect multiplied by the probability that the hypothesis is incorrect. Schematically,

P(E) = P(E | H)P(H) + P(E | ~H)P(~H)

(The symbol ~, called a tilde, just means "not".) In the particular situation we are imagining P(E | H) equals one, so we can simplify this equation to P(E) = P(H) + P(E | ~H)P(~H). We can simplify this equation still further: P(H) + P(~H) = 1, so P(~H) = 1 – P(H) and so P(E) = P(H) + P(E | ~H)(1 – P(H)).

However we now face another puzzle, a puzzle less easy to clear up. Bayes is attempting to solve the Inverse Problem, that is, to show how we can make the move from frequencies to probabilities. But his solution still involves sometimes going in the traditional direction, from probabilities to frequencies – we arrive at a value for P(E | H) by considering the hypothesis alone, a hypothesis that involves a fixed probabilistic distribution. In the case we are considering, P(E | H) is 1. How though can we evaluate P(E | ~H)? There is supposedly no specific hypothesis involved here, only the absence of a specific hypothesis. However when calculating P(E | ~H) we still need an alternative candidate hypothesis in order to find a value for it. This hypothesis could be that Smith owns 3 sheep and 47 cows, or 35 sheep and 83 cows, or 13 sheep and 7 cows, or any other distribution. In these cases, assuming (paradoxically) that one of these distributions is conceived by Jones as being a viable alternative candidate hypothesis, P(E | ~H) should equal 3/50, 35/118, or 13/20 respectively. Perhaps Jones should assume that, if his main hypothesis is wrong, the distribution is completely random. But then we are forced to say what we mean by 'random'. We could define the notion of randomness in this case by supposing that the Principle of Indifference applies and that, because there are two options, cows or sheep, they are both equally likely. In this case, P(E | ~H) should be taken to be 1/2. Alternatively we could imagine that before he visited the farm, Jones knew that 35 percent of all farm animals in Otago are sheep and the other 65 percent are cows, in which case P(E | ~H) could be taken to be 7/20. Perhaps such background knowledge should be taken into account when working out P(E | ~H). But it seems that Jones is not now comparing his hypothesis with its absence so much as comparing his hypothesis with a specific alternative hypothesis.

Despite this puzzle, in the case we are considering Bayes' Theorem is still actually very successful. This is because it turns out that we can assign any value we want to P(E | ~H) so long as it is between 0 and 1: it doesn't matter what this value is. Whatever value we assign to P(E | ~H), the more sheep Jones sees without ever seeing a cow, the more confident he can be that his hypothesis is correct; his credence will approach certainty. Earlier I said that the probability Jones associates with his hypothesis before he visits the farm, P(H), can take on any value; when you think through the math, you realise that P(E | ~H) can also take on any value. After repeated observations of sheep without any cows, the posterior probability calculated at each step will still always tend towards unity. Imagine that before he visits the farm, the credence Jones puts in his hypothesis that Smith only owns sheep is 1/10 and that the probability of seeing a sheep when his hypothesis is incorrect is 1/2. The first time he sees a sheep, his credence will change to (1/10)/(1/10 + 9/20) or 2/11. This posterior becomes the prior when he next sees a sheep: the calculation then becomes (2/11)/(2/11 + 9/22) or 4/13. The next time he sees a sheep he should estimate his certainty as (4/13)/(4/13 + 9/26) or 8/17. Thus, after seeing three sheep, his confidence in his hypothesis has increased from 1/10 to nearly a half and it will continue to increase the more sheep he sees. More generally, if Jones, before visiting the farm, has estimated the probability of his hypothesis being true as a and the probability that he sees a sheep when his hypothesis is incorrect as b, then the probability he should assign to his hypothesis after seeing n sheep without ever seeing a cow is a/(a + x(1 – a)) where x is b to the power of n. And because b is between zero and one, x will tend towards zero as n tends towards infinity and the probability Jones assigns to his hypothesis should tend towards 1.
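For readers who want to check this arithmetic, here is a minimal Python sketch of the updating procedure. It is only an illustration of the formulas above – the function names are my own invention – but it reproduces the numbers in the previous paragraph and shows the convergence towards 1.

def update(prior, p_e_given_not_h, p_e_given_h=1.0):
    # Bayes' Theorem with P(E) expanded by the law of total probability.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

credence = 1 / 10                      # Jones's prior that Smith owns only sheep
for sheep_seen in range(1, 4):
    credence = update(credence, 1 / 2)
    print(sheep_seen, credence)        # 2/11, then 4/13, then 8/17

# The closed form a / (a + b**n * (1 - a)) gives the same number directly.
a, b, n = 1 / 10, 1 / 2, 3
print(a / (a + b**n * (1 - a)))        # 8/17, roughly 0.47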

So Bayes' Theorem works in this case, in the case where P(E | H) = 1. However the problem I discussed above has only been resolved for this one case. What about other cases? We can rearrange Bayes' Theorem in the following way: 

P(H | E) / P(H) = P(E | H) / (P(E | H)P(H) + P(E | ~H)(1 – P(H)))

If the evidence supports the hypothesis, the left side of the expression, and consequently the right side as well, will be greater than one; if the evidence militates against the hypothesis, both sides will be less than one. It just takes a little more rearranging to work out that if P(E | H) > P(E | ~H), that is, if the probability of the evidence assuming the hypothesis is true is greater than the probability of the evidence assuming the hypothesis is false, then when presented with the evidence one's credence in the hypothesis should increase. If P(E | H) < P(E | ~H) then, when observing the evidence, one's confidence in the hypothesis should decrease. But the problem we discussed earlier reemerges: although we can work out P(E | H) we cannot work out P(E | ~H). I'll give a concrete example. Suppose Jones's initial hypothesis is not "Every animal on Smith's farm is a sheep" but rather "23 percent of the animals on Smith's farm are sheep". This hypothesis enables us to calculate that, if Jones sees a sheep, P(E | H) will be 23/100. But how can we calculate P(E | ~H)? We could assume that if the hypothesis is not true then 10 percent of the animals are sheep, in which case P(E | ~H) will be 1/10. This would mean that if Jones sees a sheep his hypothesis has been strengthened. Or we could assume that if his hypothesis is not true then 75 percent of the animals are sheep, in which case P(E | ~H) equals 3/4; seeing a sheep should thus make the hypothesis weaker. Perhaps Jones should assume that if his hypothesis is wrong, sheep and cows are equally likely, that this is what we mean when we say that the distribution is random: in this case P(E | ~H) equals 1/2 and his seeing a sheep will weaken his credence in his hypothesis. But there is no way of working out what value for P(E | ~H) we should choose. When carrying out statistical research scientists often talk about the null hypothesis. The null hypothesis involves assuming that there is no causal relationship between two sets of data, that any apparent correlation is the result of coincidence, chance. The null hypothesis is close to but not quite the same as P(E | ~H) – the former is supposedly a theoretical construct involving no hypothesis at all while the latter can be seen as the probability of the evidence given any hypothesis other than Jones's, including the 'random' hypothesis, being true. It may be, as I intimated a little earlier, that we should estimate P(E | ~H) based on background assumptions but, and I know I'm repeating myself, this would mean that Bayes' Theorem should really be interpreted as concerned with evaluating one hypothesis against another rather than against an alternative in which there is no hypothesis at all.
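To make the dependence on P(E | ~H) vivid, here is a small Python sketch that computes the update factor P(H | E) / P(H) for the 23-percent hypothesis under the three alternatives just discussed. The prior of 0.2 is an arbitrary choice made purely for illustration; the verdicts do not depend on it.

def update_factor(prior, p_e_given_h, p_e_given_not_h):
    # P(H | E) / P(H) = P(E | H) / ( P(E | H)P(H) + P(E | ~H)(1 - P(H)) )
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h / p_e

prior = 0.2
for p_e_not_h in (0.10, 0.75, 0.50):
    factor = update_factor(prior, 0.23, p_e_not_h)
    verdict = "strengthened" if factor > 1 else "weakened"
    print(p_e_not_h, round(factor, 2), verdict)
# 0.10 gives a factor above 1 (strengthened); 0.75 and 0.50 give factors below 1 (weakened)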

Despite these puzzles and problems, Bayes' Theorem is a useful way of considering the number sequences discussed earlier. Suppose we are presented with the first ten numbers of a longer sequence. We can of course always find at least one pattern – we could take this pattern to be our hypothesis concerning the whole sequence. The prior probability we assign to this hypothesis, P(H), is subjective, arbitrary, but we could perhaps reasonably assign larger probabilities to simpler patterns. We are then presented with the next number in the sequence. If this number breaks the pattern, then P(E | H) equals zero and we have falsified the hypothesis. If this next number conforms to the pattern, P(E | H) equals 1 and we can work out that P(H | E) = P(H) / (P(H) + P(E | ~H)P(~H)). Although, as I've pointed out, it is difficult to determine what value we should assign to P(E | ~H), whatever value we give it, so long as it is less than one, will lead us, given this next observation, to strengthen our confidence that the pattern has continued and will continue. This is exactly analogous to my story about Jones's visit to Smith's farm. Consider the sequence (1,0,1,0,1,0,1,0). We could form the hypothesis that the sequence will alternate between 1 and 0 forever and assign to this hypothesis, arbitrarily, the probability of 0.1. If the next number is 0 again the hypothesis has been falsified. Suppose instead that the next number is 1. Let us suppose that we also know that this sequence consists entirely of ones and zeros in some order – we can then reasonably make the assumption that it might be random and that P(E | ~H) could be taken to be 0.5. Then the posterior, P(H | E), equals 0.1/(0.1 + 0.5(1 – 0.1)) or 2/11. With each successive term that conforms to the pattern, we can increase our confidence that we are right, that this pattern exists and will continue.

This method of finding the probabilities involved in number sequences has interesting implications. If we find one number that breaks the pattern our hypothesis has been falsified, but we can imagine a situation in which we run through hundreds or thousands of numbers before the black swan appears. We can never be absolutely certain that the pattern will continue although (depending on P(E | ~H)) if a sequence persists for hundreds or thousands of terms we might place very high confidence in the expectation that it will continue forever. However it is even more difficult to decide if a sequence is random – in order to decide if a finite sequence is random we would need to test it against every single candidate hypothesis, every single potential pattern, a number that may be extremely large if not infinite. And of course, as I argued above, for any n terms, no matter how large, there is always at least one potential pattern.

This brings me back to the issue of randomness. Science does not always proceed statistically but when it does, for instance in the research carried out today by university psychology departments, it requires the null hypothesis, the hypothesis that some set of data points might be wholly random. For a Philosophy course I took last year I wrote an essay concerning Jacob Stegenga's book Medical Nihilism, in which he attempts to apply Bayesian reasoning to medical science with the aim of convincing people that they generally put more faith in medicine than they ought. In the essay I focussed on the hypothesis, widely believed by ordinary people and mental health practitioners alike, that antipsychotic medication successfully treats the symptoms of this supposed disease 'schizophrenia' that some people are supposedly born with, and that in the absence of antipsychotic medication psychotic symptoms will reappear. As Stegenga does when attacking medical science more generally, I tried to use Bayes' Theorem to show that people have higher confidence in this hypothesis than they should. The arguments I made in that essay were interesting even though I am not sure I fully understood Bayes' Theorem when I wrote it – but I am not going to recycle them here except to say that there is a problem with the null hypothesis. The issue of randomness also bears on the subject of the first part of this essay: evolutionary biology. Evolutionary biology relies on the concept of randomness because we cannot have natural selection without also supposing that random genetic mutations occur for nature to then select. My feeling is that randomness is not a real thing. My feeling is that the science that has built up around the notion of 'schizophrenia' malignly affects people labelled with this condition, and that supposedly random genetic mutations are not random at all. I believe that the supposed randomness associated with quantum mechanics is also not random. These are extraordinary claims which would require extraordinary evidence. But to successfully rebut these claims, philosophers and scientists need to be clearer about what they mean by 'randomness'.

I'll finish this essay by making a broad generalisation about many philosophers today. The whole enterprise of philosophy is based around the notion of 'rationality'. Philosophers in the anglophone tradition value rationality highly and often believe that rationality is the highest virtue. But what does it mean to be 'rational'? I suspect that many philosophers today think that rationality should involve Decision Theory and Bayesian reasoning. When faced with a choice, a person should associate utilities and probabilities with all the options and choose the option which maximises Expected Utility. When working out probabilities, rational agents should apply Bayes' Theorem – that is, they should adjust the strengths of their beliefs after considering all the evidence presented to them. My problem with philosophers who embrace this position is that I simply do not believe that people travel through the world working out probabilities and utilities and adjusting them when encountering either confirming or disconfirming evidence in this way. It would involve assuming that they could somehow numerically determine the utilities and probabilities associated with particular events and then subject them to complicated mental arithmetic. Real people in the real world do not do this. Suppose I am considering seeing the new film Deadpool & Wolverine. Philosophers of this sort will say that what I should do is assign, say, three units of positive utility to the prospect of seeing it and enjoying it, multiply this number by the probability that it will be enjoyable, say 70%, and then subtract from this product the disutility of it being bad (say, four units) multiplied by the probability that it is bad, which we are supposing to be 30%. In this case, the net Expected Utility is positive and so I should go and see it. Afterwards I can revise my hypothesis concerning whether it was good or not, retroactively, in a Bayesian way, having been presented with the evidence, namely my feelings about the film. But how can I possibly arrive at all these numbers? I do not believe people travel through life consciously assigning utilities and probabilities to options and basing their decisions on such calculations. Either people are rational but these calculations are unconscious, or people are not 'rational' at all (supposing we insist that the term 'rational' must mean the consistent application of Decision Theory and Bayesian reasoning). This position may seem nihilistic but I wouldn't have written this essay if I didn't believe in rationality at all. It is simply that I believe that if we are to be truly rational, it may lead us to conclusions that may seem extraordinary. And one such conclusion is that we should reject the idea that Decision Theory and Bayesian reasoning are the only rational theories we should adopt to guide our actions.