Sometimes in this blog I have felt impelled to revisit earlier essays to improve the arguments I presented in them. My most recent essay, "Evolution, Ideas, and Hiveminds", an essay I published only a couple of weeks ago, was not as strong as I would have liked and, having thought about it over the last couple of weeks, I have decided to go back to the arguments I made in it and discuss some of its claims from a different perspective. When one is engaged in debate one should at least try to steel-man the arguments made by one's opponents and so, in an effort to cultivate at least a little intellectual integrity, I feel I should present Neo-Darwinism in a slightly more favourable light. This essay will make most sense to readers who have read the preceding essay. I want to discuss the notion of "Survival of the Luckiest" again and then the claim that genotype does not fully determine phenotype, two notions I introduced in the previous essay; this time though I want to consider them from the point of view of someone who believes in the Modern Synthesis. I also want to talk about an argument I set out a long time ago in my first essay about evolution, "Concerning Evolution". Although there were weaknesses in "Evolution, Ideas, and Hiveminds", I stand by my general thesis and so will go on to talk about an idea that does support it: sexual selection. In the second part of the essay I want to talk about probability again. First I want to discuss randomness and then Bayes' Theorem.
Readers may recall that in the previous essay I told a little story concerning a pigeon. I asked the reader to imagine a pigeon born with a small mutation that makes it very slightly better at flying than other pigeons. I claimed that, according to Neo-Darwinists, Evolution will inevitably lead that pigeon to survive longer and have more offspring than its peers and that, because this mutation should gradually propagate throughout the whole population, the whole pigeon species should evolve. I objected to this proposal by describing all the potential misfortunes that could befall the pigeon and prevent it from mating. I also asked the reader to imagine a slightly inferior pigeon who just happens to have some good luck and thus introduces suboptimal genes into the pigeon gene pool. I argued that when we consider the number of chance events that can occur in the world, randomness that Neo-Darwinists themselves believe in, we should replace the motto "Survival of the Fittest" with the motto "Survival of the Luckiest".
However Neo-Darwinists have a rejoinder to this argument. They can argue that a mutation conferring on a pigeon a better-flying gene actually occurs quite frequently: today and throughout history many pigeons have been born with this mutation. Yes, some pigeons born with this mutation are less lucky than others born with it, but if thousands of pigeons have been born with this mutation throughout history then, on average, overall, they are slightly more likely to reproduce than those born without it. The notion in statistics that is relevant here is "Regression to the Mean". Neo-Darwinists can argue that the random disasters and windfalls that might befall individual pigeons born with this mutation cancel each other out. If many pigeons are or have been born with this mutation, more predictable and consistent selective pressures, long-term pressures, should produce a slight bias towards fitter pigeons and bring about gradual evolution. (This is to assume that pigeons are not already so well adapted to their ecological niche that they cannot improve, that this niche remains fairly stable, and that the Law of Large Numbers applies to evolution rather than Chaos Theory.)
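To make this rejoinder concrete, here is a minimal simulation sketch in Python; it is my own illustration, not anything evolutionary biologists actually run, and all the numbers (the population size, the 2 percent edge) are invented. Each pigeon's fate is dominated by luck, but carriers of the mutation get a tiny edge, and over many generations the carrier frequency tends to drift upwards, which is just the Law of Large Numbers doing its work.

```python
import random

# Toy model (all numbers invented): a flock in which half the pigeons carry a
# mutation conferring a very slight reproductive edge. Each bird's fate is
# mostly luck, but averaged over thousands of birds and hundreds of
# generations the edge shows through.
def next_generation(population, size=10_000, edge=0.02):
    # A parent is chosen mostly at random for each offspring; carriers of the
    # mutation are only very slightly more likely to be chosen.
    weights = [1.0 + edge if carrier else 1.0 for carrier in population]
    return random.choices(population, weights=weights, k=size)

random.seed(0)
pop = [True] * 5_000 + [False] * 5_000   # start with 50% carriers
for gen in range(201):
    if gen % 50 == 0:
        print(f"generation {gen}: carrier frequency {sum(pop) / len(pop):.3f}")
    pop = next_generation(pop)
```

Run with different seeds, the frequency wanders about because of luck, but the long-run trend is upwards; the Neo-Darwinist's point is that selection needs only this statistical bias, not the good fortune of any particular pigeon.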
This is a powerful rejoinder but it has some implications that evolutionary biologists might not like. It seems that this counterargument involves assuming that beneficial mutations occur more frequently than I suggested in the previous essay. How often do random mutations occur? If they do occur frequently then we might expect the genetic variation within a species to become very large. Consider dogs. There is huge phenotypic variation in types of dog, from chihuahuas to St Bernards, although I think that all dogs can couple with each other and we tend to use the same word, "dog", for all types of dog. Neo-Darwinists might argue that for many thousands of years there has been very little selective pressure on the dog species, enabling genetic variation to increase enormously, an increase exploited by dog breeders when breeding types of dog. Dogs seem to be evidence for the claim that mutations can occur frequently. However other species, like leopards and kiwis and elephants for instance, are very uniform. If we assume Neo-Darwinism is correct and if we also assume that mutations are frequent, this uniformity can only be explained by also assuming very strong selective pressures on leopards, kiwis, and elephants, pressures continually weeding out nonconformists. There are puzzles here which I don't think either evolutionary biologists or their opponents have solved.
This argument bears on modern humans. It is tempting to think that selective pressures in the contemporary world might be having an effect on the evolution of humankind today. Many people used to believe that intelligence is largely genetic, that 5-point increments in IQ could somehow be linked to the presence or absence of particular alleles; if this were the case we might worry that, because so many intelligent women are choosing not to have children these days, humans might be becoming gradually stupider. In 2004, Dean Hamer proposed in a book called The God Gene that people who are more inclined to mystical experiences carry a variant of the gene that codes for a protein called VMAT2, a protein involved in packaging certain neurotransmitters inside neurones. (Even though Hamer might be wrong, this still qualifies as good science – unlike most other evolutionary psychologists Hamer has at least gone to the trouble of trying to identify an actual gene, variants of which seem to correlate with the presence or absence of a particular psychological trait.) If we assume that people who are more likely to be mystical are also more likely to be religious and also consider the apparent fact that religious people tend to have more children than atheists, this should make atheists like Richard Dawkins very concerned that evolution might favour theists, that it might be weeding atheists out of the gene pool. Humans would therefore be becoming gradually more religious. Of course this is to assume that intelligence and mysticism are genetic. We should also remember that atheism and women's rights only appeared very recently in our evolutionary history; it would probably take many thousands of years of social conditions very much like those we have today for us all to become stupider and more inclined to believe in God, even assuming that these traits are indeed genetic and are being selected for. Nevertheless anxieties of this sort, anxieties about the future of humanity, can plague a person whenever he or she worries that evolutionary psychologists might be right.
In the previous essay, I argued that genotype does not wholly determine phenotype. First I want to emphasise again the extraordinary complexity of humans: the human eyeball, with its cornea, pupil, iris, retina, two types of fluid, and other features, is enormously complicated, and it is only one organ among the many that make up a person. In the previous essay I claimed, effectively, that given the complexity of a human, the DNA housed in each cell nucleus could not carry enough information for it to be considered the blueprint for the whole organism. I based this claim on the finding by the Human Genome Project that the human genome carries fewer than 20,000 protein-coding genes. However evolutionary biologists could push back against this claim in the following way: they could argue that in order to build and maintain cells, organs, the human's whole body, a whole lot of other genes aside from protein-coding genes are involved, genes that taken together influence when and how often the protein-coding genes are expressed. They could then argue that if we take into account all these other genes we can account for the complexity of a human. But in order for them to make this case convincingly we would need to know how many active genes other than protein-coding genes there are in the human genome in total, and I do not believe geneticists know this yet.
Another way evolutionary biologists could fight back against the arguments I presented in the previous essay is by invoking the idea of polygenic inheritance. The Human Genome Project was a tremendous achievement, but after it was completed and more and more genetic surveys of different groups of people were carried out, geneticists realised that they had a serious problem. This problem is known as the Missing Heritability Problem, something I have talked about before. It had been assumed that many supposedly congenital conditions and diseases are caused by particular gene variants, but although this was found to be true of some conditions, many other conditions could not be correlated with specific genes. For instance, it was thought that there must be a 'gay gene' and a 'schizophrenia gene', genes that could be identified, but neither has been found despite the best efforts of geneticists to find them. If you think either gene has been discovered, you've been misled. Last year, here in New Zealand, The Listener published an interview with some American fascist in which he argued that mental asylums should be brought back – this article contained the false claim that the schizophrenia gene had been discovered. (Genetic studies known as candidate gene studies often throw up false positive results and this was probably the basis for this egregious claim.) This article upset me profoundly and, combined with other things happening in my life at the time, was what led me to drop out of the philosophy degree I was studying for last year.
Faced with the Missing Heritability Problem, evolutionary biologists have retreated to the position that supposedly genetic conditions are caused by very many genes all working in concert, something known as polygenic inheritance. This concept seems to me extremely ad hoc. Because evolutionary biologists still desperately want to claim that many conditions are congenital and genetic, they have decided to believe that conditions like 'schizophrenia' result from an interplay of very many genes. Some of them may even think that some conditions can somehow be attributed to the whole genome, a position that Munecat herself took in her video (in a way that was somewhat inconsistent with other claims she made in the video concerning human psychology). Evolutionary biologists had assumed certain genes could be linked to certain conditions, but empirical science has shown this to be wrong, and so they have adopted a position which, to be proved, would require geneticists to show how lots of genes working together make people gay or nuts, something very difficult to demonstrate. I would like to wager that empirical science will eventually show that polygenic inheritance also doesn't make sense, forcing evolutionary biologists to retreat to an even less falsifiable position if they still want to cling to the idea that the genome is the blueprint of the adult organism. However I have to concede that these two notions, the notion that genes other than protein-coding genes could determine the phenotype of an organism and the notion that characteristics could result from polygenic inheritance, are not unreasonable ways evolutionary biologists could push back against some of the claims I made in the previous essay.
I have been thinking about evolutionary biology for a long time – I published my first post concerning it way back in 2016, I think. I want briefly to rehash the argument I made then. Humans have 46 chromosomes. Horses have 64 and donkeys have 62. If a donkey and a horse get intimate with each other, they beget a mule with 63 chromosomes, offspring that is sterile because an animal needs an even number of chromosomes in order to create viable gametes. One way that you can distinguish one species from another is by counting the number of chromosomes it has. It seems reasonable to suppose that horses might have descended from donkeys or the other way round, or that they both might have shared a common ancestor in relatively recent evolutionary history. But, and this is where we arrive at the objection, how can a species with, say, 62 chromosomes evolve into one with 64? This seems to involve an individual being born with an enormous and abrupt mutation. This mutation must also help rather than hinder the individual's chances of surviving and reproducing. Furthermore, if such a mutation were to occur, if a donkey were to give birth to the world's first horse, with whom would the horse mate and produce viable offspring? It seems we need at least one more horse, perhaps many, to appear suddenly and all at the same time. It seemed to me then that such an enormous leap in equine evolution was too unlikely to occur by chance.
This argument occurred to me in 2013. Up until then I had never questioned the Neo-Darwinian orthodoxy. When it occurred to me it stunned me so much that one night, while I was sitting on my deck looking at the sky, it suddenly seemed to me that the moon was projected onto the clouds rather than being behind them. When I discussed this argument in 2016, some of my readers may have concluded that I was a Young Earth Creationist trying to debunk Evolution on religious grounds but, of course, it wasn't that I had suddenly decided that the Garden of Eden and Noah's Ark and all that had really existed; rather I had realised that Neo-Darwinism was wrong or incomplete. I didn't have an alternative. It wasn't until early 2018 that a solution occurred to me: when a mutation occurs it must occur in a number of individuals all at the same time. This solution seemed to me to involve something both supernatural and purposeful, a view I have not disavowed: this is why in the previous essay I argued that we should explain evolution in terms of top-down causation and action-at-a-distance.
The purpose of this essay, however, is to consider arguments that evolutionary biologists might make to defend their theory from attacks such as this. Recently I was thinking about Down syndrome. People with Down syndrome are born with an extra copy of chromosome 21 – this occurs as a result of a glitch during meiosis in the gonads of one of the parents. People with Down syndrome very rarely have children. An argument could be made that people with Down syndrome are so genetically unlike regular people that they are another species entirely; such an argument would have to contend with the apparent fact that people with Down syndrome don't reproduce. Now consider the following little story. We have a population of donkeys and, for some reason, many of them start being born with a chromosomal abnormality somewhat similar to Down syndrome – they are born with an extra copy of a chromosome. We assume furthermore that donkeys born with this chromosomal abnormality can breed with each other and with regular donkeys. In this population we will therefore presumably end up with some donkeys having 62 chromosomes, some having 63, and some having 64. We now suppose that this population is divided into two groups by some geographic change such as a glacier or continental drift; we then suppose that natural selection does its work and, because of the different selective pressures associated with the two different environments, one of the two sibling populations entirely assumes a chromosome number of 64 and the other 62. We now have two distinct species: horses and donkeys. The process I am describing is very close to allopatric speciation except that I am conjecturing that the genetic variation might exist prior to the geographic schism. Little stories like this are ones evolutionary biologists could postulate when fighting against arguments of the type I presented above.
What I have been trying to do in this essay so far is to set out some of the arguments evolutionary biologists could employ when contesting the claims I made in the previous essay and in earlier posts. They could argue against my claim that we should replace the motto "Survival of the Fittest" with "Survival of the Luckiest" by proposing that beneficial mutations occur often within a species and that the Law of Large Numbers means that fitter individuals are, on average, more likely to survive and reproduce. They could argue against my claim that the genotype of a human is not information-rich enough to explain the complexity of the human phenotype by saying that genes other than protein-coding genes affect the phenotype and that polygenic inheritance also plays a role. They could argue that it is possible for a species with one number of chromosomes to evolve into a species with another number of chromosomes by inventing a just-so story like the one I told above. The important thing, though, is that although evolutionary biologists might indeed deploy such arguments, they are all very speculative, all very unlikely, and all very difficult to falsify, and so we are entitled to suspect that there is a problem with Neo-Darwinism and that processes other than chance and natural selection might be involved in evolution.
Although I have conceded in this essay that the arguments I made in the previous essay were not watertight, I still believe that Neo-Darwinism is wrong. I want now to present another argument against it. It concerns sexual selection. Readers may be aware that evolutionary biologists have an issue with peacocks. The problem they have is that peahens really like males with enormous multicoloured plumage even though peacocks have to invest a great deal of resources into growing it and even though their tails make them much more vulnerable to predation. Presumably hundreds of thousands of years of peahens selecting mates with more and more brilliant trains has resulted in such brilliant, unwieldy trains becoming the norm among peacocks. Even though Darwin himself proposed this hypothesis, he so disliked it that, in the later part of his life, he is reported to have said that just the sight of a peacock's tail made him nauseous. (If you don't believe me, see The Philosopher's Secret Fire by Patrick Harpur for a discussion of this.) Evolutionary biologists find the evolution of peacocks' trains so vexing because they cannot find a satisfactory just-so story to explain why peahens find enormous brilliant tail feathers so sexy. For instance, it has been proposed that males grow these trains to show that they are so fit that they can waste resources by encumbering themselves with enormous tails and still win the Game of Life. Obviously this is a fairly unsatisfactory resolution to the paradox. Years ago Bret Weinstein, before he had self-destructed as a result of his anti-vax stance, during a public discussion about evolution with Richard Dawkins, a discussion which can be found on YouTube, brought up the fact that no one has provided a satisfactory explanation as to why peacocks have the tails they do; Dawkins, with his fundamentalist faith that Neo-Darwinism can explain everything, couldn't seem to perceive that it was even a problem.
I want to describe what I believe is another example of sexual selection at work. When people, experts and laypeople alike, think about evolution, one of their favourite go-to examples is giraffes. It is supposed that giraffes have evolved such long necks so that they can reach leaves growing very high up on trees. In the nineteenth century, some thinkers imagined that giraffes in the past would stretch their necks to reach leaves high in the canopy and had then passed on this acquired characteristic to their offspring. Neo-Darwinists today, though, would say that when giraffes were randomly born with genes that endowed them with longer necks, these giraffes, because they could reach leaves higher up, survived longer and reproduced more than shorter-necked giraffes, bringing about a gradual evolution in the whole giraffe species. However, again there is a problem. On QI many years ago Stephen Fry reported that zoologists had found that giraffes often bend down to eat leaves. An alternative explanation is required. Some evolutionary biologists, spurred by observations made by field researchers, have proposed that giraffes use their necks somehow when fighting with each other and that longer-necked giraffes were more successful when engaged in such internecine warfare. This is another fairly unsatisfactory resolution to a different problem. I would like to suggest here that the long necks of giraffes also result from sexual selection. Female giraffes simply find long-necked male giraffes sexy and male giraffes simply find long-necked female giraffes sexy.
Something I implied but didn't state explicitly in the previous essay is that I believe that humans and other animals have minds independent of their bodies. The minds of humans participate in the collective Human Mind and the minds of peacocks and giraffes participate respectively in the collective Peacock Mind and the collective Giraffe Mind. These minds or souls direct the behaviours exhibited by humans and other animals as well as their cognitions. When we consider that the sexual attractions peahens feel towards well feathered peacocks and the sexual attractions giraffes feel towards long necked giraffes have baffled evolutionary biologists, we are entitled to wonder if something mystical or supernatural might be behind these attractions, some process other than the mechanisms evolutionary biologists ordinarily propose, and we are entitled to wonder if some power might be acting through peahens and giraffes influencing their evolution. This speculative, partial explanation of sexual selection coheres with the theory I proposed in the previous essay.
In the next part of this essay I want to turn to another subject that readers know fascinates me: probability. Recently I read Ten Great Ideas About Chance by Persi Diaconis and Brian Skyrms. I have been thinking about some ideas related to probability, inspired by this book, and my aim here is to discuss some of them. I'm not going to simply regurgitate their book: the authors are not always perfectly clear, and I think it is possible to present these ideas in a more easily digestible style than Diaconis and Skyrms do.
When we think about probability we often think about randomness. True randomness is both difficult to produce and difficult to define. People still use calculators and on most modern calculators there is a button you can press which supposedly generates a 'random' number. In fact this number is not really random at all because computers rely on algorithms and you cannot produce random numbers algorithmically. Instead calculators produce numbers that are pseudorandom. For instance, a calculator might take the specific time of the day when the button was pushed and then perform some complicated procedure on it to produce a number that looks random but actually isn't. Sometimes scientists need random numbers to feed into their experiments and in their book Diaconis and Skyrms describe some very involved and unusual experimental setups scientists have implemented to get numbers that look as random as they can make them. It might be that the best way to generate random numbers would be to involve quantum mechanics because most physicists today believe that measurements of quantum phenomena produce random results; but we cannot easily equip handheld calculators with Geiger counters.
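To illustrate what 'pseudorandom' means, here is a small Python sketch of my own of the kind of procedure involved; the algorithm shown, a linear congruential generator seeded from the clock, is just a stand-in and is not what any particular calculator actually uses.

```python
import time

# A toy pseudorandom generator: a linear congruential generator seeded from
# the current time. The output looks random but is completely determined by
# the seed and the recurrence. (The constants are the classic "Numerical
# Recipes" parameters; a real calculator's routine will differ.)
class TinyPRNG:
    def __init__(self, seed=None):
        if seed is None:
            seed = int(time.time() * 1000)   # e.g. seed taken from the clock
        self.state = seed % 2**32

    def next(self):
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32            # a number in [0, 1)

rng = TinyPRNG(seed=42)
print([round(rng.next(), 3) for _ in range(5)])
# The same seed always reproduces exactly the same "random" numbers.
```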
So it is almost impossible to produce random numbers and this may lead us to wonder whether the term 'randomness' is even well defined. One way to approach the concept of randomness is to consider a sequence of numbers. We then ask if there is some pattern in it and if this pattern can enable us to predict the next number in the sequence – if there is no pattern and we cannot predict the next number we can call the sequence random. Consider first the sequence (1,1,1,1,1,1). It seems that this sequence has a pattern – it is just the number one repeated over and over again. If this pattern holds, we would expect the next number to be the number one again. If this is the case, the sequence is not random. We cannot be sure that this pattern will continue, though – it might still be a random sequence that has accidentally hit on the number one six times. Even so we seem justified in supposing that this sequence displays a pattern that will continue and that it is not random. A sign that a sequence is random might be that, assuming each term is drawn from a finite set of possibilities, each term appears, in the long run, with the same frequency as every other term. Suppose we toss a coin repeatedly – we would expect the sequence, in the long run, to contain as many heads as tails because of the randomness involved. However consider the sequence (1,0,1,0,1,0,1,0). Even though the frequency of ones is the same as that of zeros, there is still a pattern – the sequence seems to be alternating between one and zero and we seem justified in predicting that the next term will be one. Again though we cannot be sure. Consider now the following sequence: (3,1,4,1,5,9,2). This sequence looks random but in fact it isn't – I have just picked the first seven digits of the number pi. Now that you know this you can predict the next number with certainty. It will be six. Finally consider the sequence (2,4,3,6,5,10,9,18). Again the pattern is harder to spot but it is there. Every second term (the second, fourth, sixth, and so on) is double the term before it, and every other term after the first is the term before it minus one. If this pattern holds, we should expect the next term to be 17.
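For what it's worth, the rule behind that last sequence can be written out as a short Python sketch of my own, which predicts 17 as the ninth term.

```python
# The pattern behind (2, 4, 3, 6, 5, 10, 9, 18): terms in even positions are
# double the term before them; terms in odd positions (after the first) are
# the term before them minus one.
def sequence(n_terms, start=2):
    terms = [start]
    for position in range(2, n_terms + 1):
        if position % 2 == 0:
            terms.append(terms[-1] * 2)      # even position: double
        else:
            terms.append(terms[-1] - 1)      # odd position: subtract one
    return terms

print(sequence(9))   # [2, 4, 3, 6, 5, 10, 9, 18, 17] -- the next term is 17
```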
If a sequence is non-random it has a pattern that continues indefinitely – if no pattern at all can be discovered then the sequence is random. The problem is that, for any finite sequence, a pattern can always be discovered. This can be shown quite easily. Suppose we have a finite sequence of n terms – we could always suppose that this sequence of n terms will repeat indefinitely. We might have, for instance, the sequence (1, 11, 23, 7, 28) and then suppose that the next five terms will be (1, 11, 23, 7, 28) again and so on indefinitely. However we still cannot be sure that the pattern will persist. Our issue is a special case of Hume's Problem of Induction. It seems that we can only be sure that a sequence is random or not if the sequence is infinite. If some pattern holds indefinitely, then we can say that the sequence is non-random; if we cannot find any pattern at all in such an infinite sequence, then it is random. The difficulty for mathematicians interested in probability is that in order to decide with certainty whether a sequence is random or not, we would somehow need to evaluate an infinite sequence of terms and test this sequence against all the possible patterns that might produce some or all of it, a number of patterns that may also be infinite. The problem of deciding whether a sequence is random or not cannot be solved. But it can be ameliorated by Bayes' Theorem, the topic I want to turn to next.
Thomas Bayes, a Presbyterian minister who lived during the eighteenth century, was motivated by a fundamental issue in the philosophy of probability, the Inverse Problem. Up until Bayes, philosophers always moved from hypotheses concerning probabilities to frequencies; the task Bayes set himself was to show how we can move from frequencies to probabilities. Consider coin flips. Before Bayes, people would start by assuming that the probability of getting heads is 1/2 and then go on to ask: given some set of actual outcomes, say (H, H, T, H), what is the probability of these outcomes occurring? The probability here, because we assume the ordering does not matter, is 1/4 (there are four ways of arranging three heads and one tail among the sixteen equally likely sequences of four flips, and 4/16 = 1/4). The project that Bayes set himself was to work out how we can start with the outcomes and then proceed to the probability. Bayes' Theorem states that the probability of a hypothesis given some observed evidence (known as the posterior probability) equals the probability of the evidence given that the hypothesis is true, multiplied by the probability of the hypothesis (known as the prior), divided by the probability of the observed evidence. Algebraically this is
P(H | E) = P(E | H) P(H) / P(E)
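To see how the formula works with actual numbers, here is a small worked example of my own (the numbers are invented): suppose the hypothesis H is that a coin is double-headed, the only alternative is that it is fair, and the evidence E is that a single toss comes up heads. If our prior is P(H) = 1/100, then P(E | H) = 1, and the overall probability of seeing heads is P(E) = 1 × 1/100 + 1/2 × 99/100 = 101/200. The theorem then gives P(H | E) = (1 × 1/100) / (101/200) = 2/101, so a single head roughly doubles our credence that the coin is double-headed.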
P(H | E) and P(E | H) are both conditional probabilities. One reason Bayes' Theorem has proved so popular among philosophers is that it seems to capture the way they think we ought to reason about the world. It is supposed that we constantly form hypotheses about the world to which we assign probabilities or credences; then on the basis of evidence we either increase or decrease our credence in these hypotheses. It seems a rational model for how we should adjust the strength of our beliefs when faced with confirming or disconfirming evidence. Ten Great Ideas About Chance is not as clear when it comes to Bayes' Theorem as it could be and the treatments of Bayes' Theorem I have found on YouTube and elsewhere do not seem to me to capture the most important insight implied by the Theorem. Educators on YouTube often use medical tests as examples and focus on the notions of specificity, sensitivity and prevalence. Specificity is P(~E | ~H), sensitivity is P(E | H), and prevalence is P(H); given these three we can work out P(E) and then P(H | E). What I want to do, though, is approach the Theorem from a different angle than these online educators, and the way I shall do so is by devising a kind of thought experiment, by making up a little story.
Suppose Jones is visiting his friend Smith, a farmer in Central Otago. Jones knows for sure that all farms in Otago contain either sheep or cows or some mixture of both. Jones does not know for sure the composition of Smith's farm but he hypothesises that all the livestock on Smith's farm are sheep. He assigns to this hypothesis a probability between 0 and 1 – it actually doesn't matter what this probability, the prior, is. Then when he arrives at Smith's farm, the first animal Jones sees is a cow. Obviously this immediately disproves his hypothesis – if one of Smith's animals is a cow then obviously not all of Smith's livestock are sheep. Let's now interpret Jones's thinking from a Bayesian perspective. Because Jones has seen a cow the probability of the evidence given the hypothesis, P(E | H), is 0. That is, if the hypothesis were correct we would expect it to be impossible for Jones to see a cow. In this situation the value of the prior, P(H), is irrelevant, as is the probability of the evidence P(E) (assuming that P(E) is greater than 0 which it must be if Jones has seen a cow): 0 multiplied by anything is still 0. Having seen a cow, Jones must alter his credence in his hypothesis from whatever it was before he visited the farm to 0.
Now suppose that when Jones visits the farm, the first animal he sees is a sheep. Then he sees another sheep and then another. To make the math simpler, we shall not assume that every sheep he sees is a different animal – it might be, although this is unlikely, that he is seeing the same sheep over and over again. It seems nevertheless that every time he sees a sheep this should increase Jones's confidence that his hypothesis is correct, and to see if this is so we can put numbers into Bayes' Theorem. The probability of the evidence given the hypothesis, P(E | H), is 1. To put it another way, if Jones's hypothesis is correct, then it is certain that when he sees an animal on the farm, it will be a sheep. This means that we can simplify the formula: it becomes P(H | E) = P(H) / P(E). The first thing to note about this formula is that if P(E) is less than one then P(H | E) must always be greater than P(H). That is, every time he sees a sheep, assuming he never sees a cow, he can have greater confidence than before in his hypothesis that every animal on the farm is a sheep. The posterior probability will always be greater than the prior probability. The second thing to note is that, because P(H | E) can never exceed one, P(E) can never be less than P(H).
When I was thinking about this over the last week or two, the fact that P(H) must always be less than (or equal to) P(E) initially confused me. One important idea behind Bayes' Theorem is that P(H) can initially take any value we want: it is arbitrary, subjective. It is only after more and more evidence is gathered that the posterior probabilities will tend to converge on the correct number. Surely, though, if P(H) can take any value we want then we could always pick a value for it greater than P(E). We could suppose, as an alternative hypothesis, that there are the same number of cows as sheep and that therefore P(E) = 1/2; why could we not then choose a value for P(H) that is greater? Then I thought it through. P(E) is determined after P(H) is determined. P(H) can never be greater than P(E) because when Jones sees the sheep it could be that his hypothesis is correct or it could be that some other hypothesis is correct or it could be that whether he sees a cow or a sheep is completely random. (This last option is actually another hypothesis; I will clarify what I mean by this in a moment.) The probability that he sees a sheep must always be at least as great as the probability that his hypothesis is correct because his sheep-sighting could always be explained by alternative hypotheses instead of his main hypothesis. We can formalise this insight in the following way: the probability of the evidence must equal the probability of the evidence given that the hypothesis is correct multiplied by the probability of the hypothesis, plus the probability of the evidence given that the hypothesis is incorrect multiplied by the probability that the hypothesis is incorrect. Schematically,
P(E) = P(E | H)P(H) + P(E | ~H)P(~H).
(The symbol ~, called a tilde, just means "not".) In the particular situation we are imagining P(E | H) equals one, so we can simplify this equation to P(E) = P(H) + P(E | ~H)P(~H). We can simplify this equation still further: P(H) + P(~H) = 1, so P(~H) = 1 – P(H), and so P(E) = P(H) + P(E | ~H)(1 – P(H)).
However we now face another puzzle, a puzzle less easy to clear up. Bayes is attempting to solve the Inverse Problem, that is, to show how we can make the move from frequencies to probabilities. But his solution still involves sometimes going in the traditional direction, from probabilities to frequencies – we arrive at a value for P(E | H) by considering the hypothesis alone, a hypothesis that involves a fixed probabilistic distribution. In the case we are considering, P(E | H) is 1. How though can we evaluate P(E | ~H)? There is supposedly no specific hypothesis involved here, only the absence of a specific hypothesis. However, when calculating P(E | ~H) we still need an alternative candidate hypothesis in order to find a value for it. This hypothesis could be that Smith owns 3 sheep and 47 cows, or 35 sheep and 83 cows, or 13 sheep and 7 cows, or any other distribution. In these cases, assuming (paradoxically) that one of these distributions is conceived by Jones as being a viable alternative candidate hypothesis, P(E | ~H) should equal 3/50, 35/118 or 13/20 respectively. Perhaps Jones should assume that, if his main hypothesis is wrong, the distribution is completely random. But then we are forced to say what we mean by 'random'. We could define the notion of randomness in this case by supposing that the Principle of Indifference applies and that, because there are two options, cows or sheep, they are both equally likely. In this case, P(E | ~H) should be taken to be 1/2. Alternatively we could imagine that before he visited the farm, Jones knew that 35 percent of all farm animals in Otago are sheep and the other 65 percent are cows, in which case P(E | ~H) could be taken to be 7/20. Perhaps such background knowledge should be taken into account when working out P(E | ~H). But it seems that Jones is not now comparing his hypothesis with its absence so much as comparing his hypothesis with a specific alternative hypothesis.
Despite this puzzle, in the case we are considering Bayes' Theorem is still actually very successful. This is because it turns out that we can assign any value we want to P(E | ~H) so long as it is between 0 and 1. Whatever value we assign to P(E | ~H), the more sheep Jones sees without ever seeing a cow, the more confident he can be that his hypothesis is correct; his credence will approach certainty. Earlier I said that the probability Jones associates with his hypothesis before he visits the farm, P(H), can take on any value; when you think through the math, you realise that P(E | ~H) can also take on any value, and after repeated observations of sheep without any cows the posterior probability will still tend towards unity. Imagine that before he visits the farm, the credence Jones puts in his hypothesis that Smith only owns sheep is 1/10 and that the probability of seeing a sheep when his hypothesis is incorrect is 1/2. The first time he sees a sheep, his credence will change to (1/10)/(1/10 + 9/20), or 2/11. This posterior becomes the prior when he next sees a sheep: the calculation then becomes (2/11)/(2/11 + 9/22), or 4/13. The next time he sees a sheep he should estimate his certainty as being (4/13)/(4/13 + 9/26), or 8/17. Thus, after seeing three sheep, his confidence in his hypothesis has increased from 1/10 to nearly a half and it will continue to improve the more sheep he sees. More generally, if Jones, before visiting the farm, has estimated the probability of his hypothesis being true as a and the probability that he sees a sheep when his hypothesis is incorrect as b, then the probability he should assign to his hypothesis after seeing n sheep without ever seeing a cow is a/(a + x(1 - a)) where x is b to the power of n. And because b is between zero and one, x will tend towards zero as n tends towards infinity and the probability of Jones's hypothesis being correct should tend towards 1.
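The same calculation can be set out as a short Python sketch (my own, using the same invented numbers: a prior of 1/10 and P(E | ~H) = 1/2); it reproduces the values 2/11, 4/13 and 8/17 and shows the credence climbing towards 1.

```python
from fractions import Fraction

# Bayesian updating for Jones's hypothesis "every animal on the farm is a sheep".
# Since P(E | H) = 1, each sheep sighting gives:
#   posterior = prior / (prior + P(E | ~H) * (1 - prior))
def update(prior, p_sheep_if_wrong):
    return prior / (prior + p_sheep_if_wrong * (1 - prior))

credence = Fraction(1, 10)            # Jones's arbitrary prior
p_sheep_if_wrong = Fraction(1, 2)     # chance of a sheep if the hypothesis is false
for sighting in range(1, 11):
    credence = update(credence, p_sheep_if_wrong)
    print(f"after {sighting} sheep: credence = {credence} (about {float(credence):.3f})")
# Prints 2/11, 4/13, 8/17, ... and approaches 1 whatever the two starting values are.
```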
So Bayes' Theorem works in this case, in the case where P(E | H) = 1. However the problem I discussed above has only been resolved for this one case. What about other cases? We can rearrange Bayes' Theorem in the following way:
P(H | E) / P(H) = P(E | H) / (P(E | H)P(H) + P(E | ~H)(1 - P(H)))
If the evidence supports the hypothesis, the left side of the expression, and consequently the right side as well, will be greater than one; if the evidence militates against the hypothesis, both sides will be less than one. It just takes a little more rearranging to work out that if P(E | H) > P(E | ~H), that is, if the probability of the evidence assuming the hypothesis is true is greater than the probability of the evidence assuming the hypothesis is false, then if presented with the evidence one's credence in the hypothesis should increase. If P(E | H) < P(E | ~H) then, when observing the evidence, one's confidence in the hypothesis should decrease. But the problem we discussed earlier reemerges: although we can work out P(E | H) we cannot work out P(E | ~H). I'll give a concrete example. Suppose Jones's initial hypothesis is not "Every animal on Smith's farm is a sheep" but rather "23 percent of the animals on Smith's farm are sheep". This hypothesis enables us to calculate that, if Jones sees a sheep, P(E | H) will be 23/100. But how can we calculate P(E | ~H)? We could assume that if the hypothesis is not true then 10 percent of the animals are sheep, in which case P(E | ~H) will be 1/10. This would mean that if Jones sees a sheep his hypothesis has been strengthened. Or we could assume that if his hypothesis is not true then 75 percent of the animals are sheep, in which case P(E | ~H) equals 3/4; seeing a sheep should thus make the hypothesis weaker. Perhaps Jones should assume that if his hypothesis is wrong, sheep and cows are equally likely, that this is what we mean when we say that the distribution is random: in this case P(E | ~H) equals 1/2 and his seeing a sheep will weaken Jones's credence in his hypothesis. But there is no way of working out what value for P(E | ~H) we should choose. When carrying out statistical research scientists often talk about the null hypothesis. The null hypothesis involves assuming that there is no causal relationship between two sets of data, that any apparent correlation is the result of coincidence, chance. The null hypothesis is close to but not quite the same as P(E | ~H) – the former is supposedly a theoretical construct involving no hypothesis at all while the latter can be seen as the probability of the evidence given that some hypothesis other than Jones's, including the 'random' hypothesis, is true. It may be, as I intimated a little earlier, that we should estimate P(E | ~H) based on background assumptions but, and I know I'm repeating myself, this would mean that Bayes' Theorem should really be interpreted as concerned with evaluating one hypothesis against another rather than against an alternative in which there is no hypothesis at all.
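The point that the direction of the update depends only on how P(E | H) compares with P(E | ~H) can be checked with a few lines of Python; this is again a sketch of my own, and I have had to invent a prior of 1/2, which the story above leaves unspecified.

```python
from fractions import Fraction

# One Bayesian update for the hypothesis "23 percent of Smith's animals are sheep",
# under three different choices of P(E | ~H). The credence rises only when
# P(E | H) exceeds P(E | ~H).
def posterior(prior, p_e_given_h, p_e_given_not_h):
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

prior = Fraction(1, 2)                 # an invented prior; the text leaves it open
p_e_given_h = Fraction(23, 100)
for p_e_given_not_h in (Fraction(1, 10), Fraction(3, 4), Fraction(1, 2)):
    post = posterior(prior, p_e_given_h, p_e_given_not_h)
    direction = "rises" if post > prior else "falls"
    print(f"P(E | ~H) = {p_e_given_not_h}: credence {direction} to {float(post):.3f}")
```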
Despite these puzzles and problems, Bayes' Theorem is a useful way of considering the number sequences discussed earlier. Suppose we are presented with the first ten numbers of a longer sequence. We can of course always find at least one pattern – we could take this pattern to be our hypothesis concerning the whole sequence. The probability we assign to this hypothesis, the prior P(H), is subjective, arbitrary, but we could perhaps reasonably assign larger probabilities to simpler patterns. We are then presented with the next number in the sequence. If this number breaks the pattern, then P(E | H) equals zero and we have falsified the hypothesis. If this next number conforms to the pattern, P(E | H) equals 1 and we can work out that P(H | E) = P(H) / (P(H) + P(E | ~H)P(~H)). Although, as I've pointed out, it is difficult to determine what value we should assign to P(E | ~H), whatever value we give it, so long as it is less than one, will lead us, given this next observation, to strengthen our confidence that the pattern has continued and will continue. This is exactly analogous to my story about Jones's visit to Smith's farm. Consider the sequence (1,0,1,0,1,0,1,0). We could form the hypothesis that the sequence will alternate between 1 and 0 forever and assign to this hypothesis, arbitrarily, the probability of 0.1. If the next number is 0 again the hypothesis has been falsified. Suppose instead that the next number is 1. Let us suppose that we also know that this sequence consists entirely of ones and zeros in some order – we can then reasonably make the assumption that it might be random and that P(E | ~H) could be taken to be 0.5. Then the posterior, P(H | E), equals 0.1/(0.1 + 0.5(1 – 0.1)), or 2/11. With each successive term that conforms to the pattern, we can increase our confidence that we are right, that this pattern exists and will continue.
This method of finding the probabilities involved in number sequences has interesting implications. If we find one number that breaks the pattern our hypothesis has been falsified, but we can imagine a situation in which we run through hundreds or thousands of numbers before the black swan appears. We can never be absolutely certain that the pattern will continue, although (depending on P(E | ~H)) if a sequence persists for hundreds or thousands of terms we might place very high confidence in the expectation that it will continue forever. However it is even more difficult to decide if a sequence is random – in order to decide if a finite sequence is random we would need to test it against every single candidate hypothesis, every single potential pattern, a number that may be extremely large if not infinite. And of course, as I argued above, for any n terms, no matter how large n is, there is always at least one potential pattern.
This brings me back to the issue of randomness. Science does not always proceed statistically but when it does, for instance in the research carried out today by university psychology departments, it requires the null hypothesis, the hypothesis that some set of data points might be wholly random. For a Philosophy course I took last year I wrote an essay concerning Jacob Stegenga's book Medical Nihilism, in which he attempts to apply Bayesian reasoning to medical science with the aim of convincing people that they generally put more faith in medicine than they ought. In the essay I focussed on the hypothesis, widely believed by ordinary people and mental health practitioners generally, that antipsychotic medication successfully treats the symptoms of this supposed disease 'schizophrenia' that some people are supposedly born with, and that in the absence of antipsychotic medication psychotic symptoms will reappear. As Stegenga does when attacking medical science more generally, I tried to use Bayes' Theorem to show that people have higher confidence in this hypothesis than they should. The arguments I made in that essay were interesting even though I am not sure I fully understood Bayes' Theorem when I wrote it – but I am not going to recycle them here except to say that there is a problem with the null hypothesis. The issue of randomness also bears on the subject of the first part of this essay: evolutionary biology. Evolutionary biology relies on the concept of randomness because we cannot have natural selection without also supposing that random genetic mutations occur for nature to then select. My feeling is that randomness is not a real thing. My feeling is that the science that has built up around the notion of 'schizophrenia' malignly affects people labelled with this condition, and that supposedly random genetic mutations are not random at all. I believe that the supposed randomness associated with quantum mechanics is also not random. These are extraordinary claims which would require extraordinary evidence. But to successfully rebut these claims, philosophers and scientists need to be clearer about what they mean by 'randomness'.
I'll finish this essay by making a broad generalisation about many philosophers today. The whole enterprise of philosophy is based around the notion of 'rationality'. Philosophers in the anglophone tradition value rationality highly and often believe that rationality is the highest virtue. But what does it mean to be 'rational'? I suspect that many philosophers today think that rationality should involve Decision Theory and Bayesian reasoning. When faced with a choice, a person should associate utilities and probabilities with all the options and choose the option which maximises Expected Utility. When working out probabilities, rational agents should apply Bayes' Theorem – that is, they should adjust the strengths of their beliefs after considering all the evidence presented to them. My problem with philosophers who embrace this position is that I simply do not believe that people travel through the world working out probabilities and utilities and adjusting them when encountering either confirming or disconfirming evidence in this way. It would involve assuming that they could somehow numerically determine the Utilities and Probabilities associated with particular events and then subject them to complicated mental arithmetic. Real people in the real world do not do this. Suppose I am considering seeing the new film Deadpool & Wolverine. Philosophers of this sort will say that what I should do is assign, say, three units of positive utility to the prospect of seeing it and enjoying it, multiply this number by the probability that it will be enjoyable, say 70%, and then subtract from this product the disutility of it being bad (say, four units) multiplied by the probability that it is bad, which we are supposing to be 30%. In this case, the net Expected Utility (0.7 × 3 − 0.3 × 4 = 0.9) is positive and so I should go see it. Afterwards I can revise my hypothesis concerning whether it was good or not, retroactively, in a Bayesian way, having been presented with the evidence, my feelings about the film. But how can I possibly arrive at all these numbers? I do not believe people travel through life consciously assigning utilities and probabilities to options and basing their decisions on such calculations. Either people are rational but these calculations are unconscious, or people are not 'rational' at all (supposing we are to insist that the term 'rational' must mean the consistent application of Decision Theory and Bayesian reasoning). This position may seem nihilistic but I wouldn't have written this essay if I didn't believe in rationality at all. It is simply that I believe that being truly rational may lead us to conclusions that seem extraordinary. And one conclusion is that we should reject the idea that Decision Theory and Bayesian reasoning are the only rational theories we should adopt to guide our actions.