The Mathematical Murder of Innocence Page 12
   There was extensive media coverage of the first landmark trial. The media initially condemned the defendant, and this can only have influenced the jury against her. A juror was seen entering the courtroom on the morning after the Professor’s testimony holding a leading national newspaper whose front-page headline was the ‘one in 73 million’ statistic. The defendant was found guilty and received a life sentence. Her first appeal also failed; she was released only after three years in prison, on her second appeal, following another media storm, this time belatedly in her favour. Both the original trial and the first appeal were afterwards considered travesties of justice. There never was a bright young juror to save the day.
   It was only after the original trial that the ‘one in 73 million’ statistic received criticism from professional statisticians. The Royal Statistical Society issued a press release stating that the figure had indeed ‘no statistical basis’ and was ‘one example of a medical expert making a serious statistical error’. The Society’s president later wrote an open letter of complaint to the Lord Chancellor about these concerns.
   The statistical criticisms were threefold. First, the ‘prosecutor’s fallacy’: the low probability of two successive natural deaths was used to ‘prove guilt’, without testing it against the alternative hypothesis, murder, which is itself even more unlikely. Secondly, the ‘ecological fallacy’: Professor A assumed that the probability of cot death within any single family was the same as the aggregate probability across all cot deaths, without taking into account conditions specific to individual families (such as a possible S.I.D.S. gene). Thirdly, this latter error was compounded by the assumption of ‘statistical independence’ between the two deaths.
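To see why the independence assumption matters, here is a minimal Python sketch. It assumes, purely for illustration, an average cot-death probability of one in 8,500 (the figure used in Appendix 2) and a hypothetical family factor that makes a second cot death ten times as likely once a first has occurred; neither number is taken from the real trials. Squaring the single-death figure, as independence requires, gives one in 72,250,000, the same order of magnitude as the ‘one in 73 million’ statistic, while allowing for dependence makes a double cot death ten times more likely.

```python
from fractions import Fraction

# Illustrative assumptions only: an average cot-death probability of
# 1 in 8,500 and a hypothetical family factor of 10 for a second death.
P_SINGLE = Fraction(1, 8_500)
FAMILY_FACTOR = 10


def two_cot_deaths(p_first, p_second_given_first):
    """P(two cot deaths) = P(first) x P(second | first)."""
    return p_first * p_second_given_first


# Independence: the first death says nothing about the second,
# so P(second | first) = P(single) and the figure is simply squared.
independent = two_cot_deaths(P_SINGLE, P_SINGLE)

# Dependence: a shared (hypothetical) family factor raises the
# chance of a second cot death once a first has occurred.
dependent = two_cot_deaths(P_SINGLE, FAMILY_FACTOR * P_SINGLE)

print(f"Assuming independence: 1 in {int(1 / independent):,}")  # 1 in 72,250,000
print(f"With a family factor:  1 in {int(1 / dependent):,}")    # 1 in 7,225,000
```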
   The perils of allowing non-statisticians to present unsound statistical arguments were expressed in an editorial in the British Medical Journal, pointing out that ‘defendants deserve the same protection as patients’.
   Furthermore, the defendant’s husband later researched the hospital records and found that the pathologist had withheld blood-test results showing a serious bacterial infection.
   For the other, parallel landmark trial of double cot deaths, several real-life prosecution arguments have also been woven into this story: the suspicious telephone call to the husband, Professor A’s diagnosis of the mother as suffering from F.D.I.A., and his rejection of a genetic explanation on the grounds that there was no family history of cot deaths. Only after the trial did more detailed research into the family’s ancestry reveal several cases of unexplained infant deaths…
   All this led to the appeals at which both defendants were finally acquitted.
   In the aftermath, several other convictions for unlawful infant death in which Professor A had testified were also overturned on appeal.
   However, during the three years she served in prison, the mother from the first trial was separated from her third baby (who fortunately had no problems). She suffered abuse from other prisoners, both because she was assumed to be a child killer and because she was the daughter of a policeman. These combined traumas led to her accidental death from alcohol poisoning a few years later. She never recovered from the ordeal.
   Professor A eventually apologised for all his misleading evidence at these various trials. The Solicitor General barred him from any further court appearances.
   Professor A was found guilty of ‘serious professional misconduct’ by the General Medical Council and was struck off the medical register. However, and somewhat unbelievably, the Society of Expert Witnesses commented that the severity of this punishment would cause many professionals to reconsider whether to stand as expert witnesses. Professor A later won a High Court appeal against the G.M.C. decision and was reinstated.
   Prosecutors were later advised never to allow a conviction to rest solely on the evidence of an ‘expert witness’.
   A ruling from the Supreme Court of Queensland, which effectively banned the use of F.D.I.A. as an identifiable disease, was later adopted into English law. F.D.I.A. could only be used as a description of a range of behaviours: ‘A label used to describe a behaviour is not helpful in determining guilt and is prejudicial.’
   Professor A’s ex-wife accused him of ‘seeing mothers with F.D.I.A. symptoms wherever he looked’.
   Acknowledgements
   I have used public documents to understand the statistical and other errors made by the prosecution in the original landmark trials; but any similarities stop there, since this is a work of fiction. The families of those originally concerned have suffered enough, and so have not been involved in this work.
   Meanwhile, Nassim Taleb’s books are very real, and provide perhaps the best available understanding of randomness and uncertain events as applied to life, finance and even philosophy.
   Like my fictional narrator, I once studied Ocean Engineering, with all its mathematical and statistical analysis of random waves, so I particularly enjoyed Taleb’s books. I would even thoroughly recommend them to the likes of Judge Braithwaite (who probably did – just – pass his maths O-level).
   I have unashamedly borrowed many of Taleb’s useful concepts to illustrate the use and misuse of statistics, in particular:
   •A lottery winner being accused of cheating due to the improbability of any individual winning.
   •Whether you would cross a river that is four feet deep on average.
   •Wrongly assuming that random results mean random causes.
   •The ‘silent evidence’ bias: people who have died (in infancy in this case) are no longer around to tell you about it.
   •Wrongly confusing ‘no evidence of… (disease, or cot death gene)’ with ‘evidence of no… (disease, or cot death gene)’.
   •Why there is no need to panic if a woman tests positive on a mammogram.
   Michael Carter
   February 2020
   The following technical appendices are for those who wish to understand the mathematics behind some of the probability concepts discussed in this book, in particular the ‘prosecutor’s fallacy’.
   Appendix 1
   Bayes’ Theorem
   Bayes’ theorem is:
   P(A | B) = P(B | A) P(A) / P(B)
   where A and B are events, and P(B) ≠ 0.
   P(A) and P(B) are the probabilities of A and B occurring, each considered on its own without any condition on the other (the prior, or marginal, probabilities).
   P( A | B ) is a conditional probability: the likelihood of A occurring given that B is true.
   P( B | A ) is also a conditional probability: the likelihood of B occurring given that A is true.
   How to derive Bayes’ theorem from first principles:
   P(A | B) = P(A ∩ B) / P(B), if P(B) ≠ 0
   P(B | A) = P(B ∩ A) / P(A), if P(A) ≠ 0
   where P(A ∩ B) is the joint probability of A and B being true.
   Since P(B ∩ A) = P(A ∩ B)
   → P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
   → P(A | B) = P(B | A) P(A) / P(B), if P(B) ≠ 0
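As a quick numerical check of this derivation, the Python sketch below builds a small joint distribution for two events A and B (the four probabilities are made up for illustration), computes P(A | B) and P(B | A) directly from the definition of conditional probability, and confirms that Bayes’ theorem links them. Exact fractions avoid rounding noise. Note that P(A | B) = 1/4 differs from P(A) = 3/10, so A and B here are not independent.

```python
from fractions import Fraction

# Made-up joint probabilities for each combination of A / not-A and B / not-B.
joint = {
    (True, True): Fraction(10, 100),    # P(A ∩ B)
    (True, False): Fraction(20, 100),   # P(A ∩ not B)
    (False, True): Fraction(30, 100),   # P(not A ∩ B)
    (False, False): Fraction(40, 100),  # P(not A ∩ not B)
}

p_a = joint[(True, True)] + joint[(True, False)]   # P(A) = 3/10
p_b = joint[(True, True)] + joint[(False, True)]   # P(B) = 2/5
p_a_and_b = joint[(True, True)]

# Conditional probabilities straight from the definition.
p_a_given_b = p_a_and_b / p_b   # P(A | B)
p_b_given_a = p_a_and_b / p_a   # P(B | A)

# Bayes' theorem recovers P(A | B) from P(B | A), P(A) and P(B).
bayes = p_b_given_a * p_a / p_b
print(p_a_given_b, bayes)  # prints: 1/4 1/4
```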
   Appendix 2
   Application of Bayes’ theorem to compare two cot deaths with two murders, and comparison with the ‘prosecutor’s fallacy’.
   Reminder:
   P(A | B) = P(B | A) P(A) / P(B)
   Let us call event A ‘two cot deaths’, represented by 2CD.
   And call event B ‘two deaths’, represented by 2D.
   And let event 2M be ‘two murders’.
   Let us now simplify by assuming that the two deaths were either two cot deaths or two murders, so that:
   P(2D) = P(2CD) + P(2M)
   Then, since two cot deaths are necessarily two deaths, P(2D | 2CD) = 1, and Bayes’ theorem gives:
   P(2CD | 2D) = P(2CD) / [P(2CD) + P(2M)]
               = 1 / [1 + P(2M)/P(2CD)]
   The prosecutor’s fallacy is to assume that the probability that two cot deaths caused the two deaths that have already happened, P(2CD | 2D), is equal to the probability of two cot deaths occurring in the first place, P(2CD). The equation above shows that this is not true.
   The real probability is a function of the ratio of the probability of two murders to the probability of two cot deaths:
   1 / [1 + P(2M)/P(2CD)]
   So it is not the rarity of two cot deaths, P(2CD), that matters: on its own this value is irrelevant. Once the rare event of two deaths has actually happened, what counts is the ratio of the probabilities of the two competing hypotheses, both of which are rare events.
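A minimal Python sketch of this point, using the simplified two-hypothesis formula; the function name and the input probabilities are illustrative assumptions. Two scenarios with wildly different rarity, but the same 9:1 ratio of two cot deaths to two murders, give exactly the same probability of innocence.

```python
from fractions import Fraction


def prob_two_cot_deaths_given_two_deaths(p_2cd, p_2m):
    """P(2CD | 2D) = 1 / [1 + P(2M)/P(2CD)], assuming the two deaths were
    either two cot deaths or two murders."""
    return 1 / (1 + p_2m / p_2cd)


# Same 9:1 ratio of P(2CD) to P(2M), very different rarity (illustrative values).
rare = prob_two_cot_deaths_given_two_deaths(Fraction(9, 10_000), Fraction(1, 10_000))
very_rare = prob_two_cot_deaths_given_two_deaths(Fraction(9, 10**9), Fraction(1, 10**9))

print(rare, very_rare)  # prints: 9/10 9/10
```

The prosecutor’s fallacy would instead report P(2CD) itself, 9 in 10,000 or 9 in a billion, as if it were the probability of innocence.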
   Different examples
   If the probability of two murders were to approach one, that is, certainty (because other overwhelming evidence showed it was murder), only then could the prosecutor take this shortcut to rebut the defendant’s claim that the deaths were accidental:
   Only if P(2M) ≈ 1, then:
   P(2CD | 2D) ≈ 1 / [1 + 1/P(2CD)]
               ≈ P(2CD) / [P(2CD) + 1]
               ≈ P(2CD), since P(2CD) << 1
   And since P(2CD) can be quite a low value, the probability of innocence is in this case also low.
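The limiting case can be checked numerically with a small, hypothetical helper implementing 1 / [1 + P(2M)/P(2CD)]. The P(2CD) value of one in 7,225 is borrowed from the Richardson scenario later in this appendix purely for illustration. As P(2M) approaches 1, the probability of innocence converges on P(2CD) itself.

```python
def prob_two_cot_deaths_given_two_deaths(p_2cd, p_2m):
    """P(2CD | 2D) = 1 / [1 + P(2M)/P(2CD)] under the two-hypothesis simplification."""
    return 1 / (1 + p_2m / p_2cd)


P_2CD = 1 / 7_225  # illustrative double-cot-death probability

for p_2m in (0.5, 0.9, 0.99, 1.0):
    posterior = prob_two_cot_deaths_given_two_deaths(P_2CD, p_2m)
    # As P(2M) -> 1, the posterior converges on P(2CD) itself.
    print(f"P(2M) = {p_2m:<4}  P(2CD | 2D) = {posterior:.7f}  (P(2CD) = {P_2CD:.7f})")
```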
   Let’s explore some other simple cases:
   If, for example, the probability of two cot deaths was very much larger than the probability of two murders:
   P(2CD) >> P(2M)
   So, the probability of the two deaths being due to two cot deaths is:
   P(2CD | 2D) ≈ 1 / [1 + 0] = 1
   or almost 100%, i.e. the deaths were almost certainly accidental rather than murder, which makes sense.
   If, for example, there was an equal probability of two cot deaths or two murders, then:
   P(2CD) = P(2M)
   So, the probability of the two deaths being due to two cot deaths is:
   P(2CD | 2D) = 1 / [1 + 1] = ½
   or 50%, which also makes sense. But such a high probability of innocence should never lead to a conviction in a court of law.
   If, for example, two accidental cot deaths were three times as likely as two murders:
   P(2CD) = 3 P(2M)
   So, the probability of the two deaths being due to two accidental cot deaths is:
   P(2CD | 2D) = 1 / [1 + 1/3] = 0.75, or a 75% probability of innocence.
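The three simple cases above can be reproduced with a short Python sketch that expresses the result as a function of the ratio P(2M)/P(2CD) alone. The helper name is hypothetical, and the one-in-a-million ratio standing in for ‘P(2CD) >> P(2M)’ is an arbitrary assumption.

```python
from fractions import Fraction


def prob_innocence(ratio_2m_to_2cd):
    """P(2CD | 2D) = 1 / [1 + P(2M)/P(2CD)], written in terms of the ratio alone."""
    return 1 / (1 + ratio_2m_to_2cd)


cases = {
    "P(2CD) >> P(2M)": Fraction(1, 1_000_000),  # arbitrary tiny ratio
    "P(2CD) = P(2M)": Fraction(1, 1),
    "P(2CD) = 3 P(2M)": Fraction(1, 3),
}

for label, ratio in cases.items():
    print(f"{label:<18} P(2CD | 2D) = {float(prob_innocence(ratio)):.4f}")
```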
   Possible scenario for the Richardson family
   Let us go back to the Richardson trial in this book. Goodwin himself suggested that one murder was less likely than one cot death. But he also insisted that if the mother suffered from F.D.I.A., then both murders were more likely; similarly, Fielding insisted that if there was a cot death gene, both cot deaths were more likely (statistical dependence in both cases).
   So, for the sake of argument:
   •Let us assume that the average probability of a cot death is one in 8,500, while the average probability of infant murder is half that (it is doubtless lower in reality), one in 17,000.
   •Let us assume Mrs Richardson has a cot death gene that multiplies the likelihood of cot death by 100, so each cot death has a one in 85 chance. So, a double cot death has a one in 7,225 chance.
   •To please Professor Goodwin, let us assume Mrs Richardson also suffers from F.D.I.A. that multiplies her risk of murdering her child by the same factor of 100, so each murder now has a one in 170 chance. So, a double murder has a one in 28,900 chance.
   In this case, a double cot death is 28,900 / 7,225 = 4 times as likely as a double murder (which makes sense: we assumed a single cot death was twice as likely as a single murder, and with two events this becomes 2² = 4 times as likely).
   Substituting P(2M)/P(2CD) = 1/4 into the equation above gives the probability of the two deaths being due to two accidental cot deaths:
   P(2CD | 2D) = 1 / [1 + 1/4]
               = 0.8, or an 80% chance of innocence.
   A result like this would totally exonerate Mrs Richardson in any court of law.
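The Richardson arithmetic can be traced step by step in Python with exact fractions. All the inputs are the for-the-sake-of-argument assumptions listed above (the averages of one in 8,500 and one in 17,000, and the two factors of 100), and, as in the text, each double-event probability is taken as the square of the raised single-event figure.

```python
from fractions import Fraction

AVERAGE_COT_DEATH = Fraction(1, 8_500)  # assumed average, as in the text
AVERAGE_MURDER = Fraction(1, 17_000)    # assumed: half the cot-death figure
GENE_FACTOR = 100                       # hypothetical cot death gene
FDIA_FACTOR = 100                       # hypothetical F.D.I.A. effect

p_cd = AVERAGE_COT_DEATH * GENE_FACTOR  # 1/85 per cot death
p_m = AVERAGE_MURDER * FDIA_FACTOR      # 1/170 per murder

# As in the text, square the raised per-event probability for two events.
p_2cd = p_cd ** 2   # 1/7,225
p_2m = p_m ** 2     # 1/28,900

ratio = p_2cd / p_2m                    # double cot death vs double murder
p_innocence = 1 / (1 + p_2m / p_2cd)    # P(2CD | 2D)
print(p_2cd, p_2m, ratio, p_innocence)  # prints: 1/7225 1/28900 4 4/5
```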
   Impact of other evidence to suggest murder
   Now let us analyse a scenario in which there is other evidence to suggest that Mrs Richardson had indeed committed murder. If we really thought there was, say, a 50% chance that Mrs Richardson had killed her son or sons, and this evidence was specific to her (perhaps a clearly identified motive, or the fact that she boasted about it afterwards), then some of Professor Goodwin’s last-minute comments would be valid.
   Take even the ‘best case’ gene situation for Mrs Richardson, in which the probability of each cot death is 1,000 times the average (one in 8.5), so that P(2CD) = 1/72.25 = 0.01384.
   But the stand-alone probability of murder P(2M) has shot right up to 0.5, which is much more than P(2CD), so this will have an important impact on the calculation. The plausible ‘alternative murder hypothesis’ comes into play:
   So, the probability of innocence:
   P(2CD | 2D) = 1 / [1 + P(2M)/P(2CD)]
               = 1 / [1 + 0.5/0.01384]
               = 0.02694, or about one in 37.
   This would not look good for Mrs Richardson and would quite possibly get her convicted.
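The same calculation for the ‘other evidence’ scenario, again as a sketch with exact fractions. The 50% value for P(2M) is the hypothetical case-specific evidence described above, and the ‘best case’ gene factor of 1,000 is likewise an assumption.

```python
from fractions import Fraction

GENE_FACTOR = 1_000                       # assumed 'best case' gene multiplier
p_cd = GENE_FACTOR * Fraction(1, 8_500)   # 1 in 8.5 per cot death
p_2cd = p_cd ** 2                         # 1/72.25, about 0.01384
p_2m = Fraction(1, 2)                     # hypothetical case-specific evidence of murder

p_innocence = 1 / (1 + p_2m / p_2cd)      # P(2CD | 2D)
print(f"P(2CD) = {float(p_2cd):.5f}")
print(f"P(2CD | 2D) = {float(p_innocence):.5f}, about 1 in {float(1 / p_innocence):.1f}")
```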
   But remember, here we are approaching the evidence from a completely different angle. The other evidence strongly suggests foul play specific to Mrs Richardson, and the question becomes whether the existence of a natural-death hypothesis gives the statistics any power to exonerate her. In this case, it does not really do so.
   However, in the scenario of this book, as in the true story, it seems legitimate to say that the marks around the mouth could have occurred in any family finding itself in a similar situation. This evidence is therefore not specific to Mrs Richardson (as Fielding points out to Goodwin near the end of the trial), so the 50% probability, or any other percentage, cannot be used as P(2M) in the equations above.
   Appendix 3
   Applying Bayes’ theorem to mammography testing for breast cancer
   P(A | B) = P(B | A) P(A) / P(B)
   Let us call event A ‘has cancer’, represented by C.
   And let us call event B ‘positive mammography result’, represented by M+.
   If we use USA statistics:
   •Every year 38 million women are tested for breast cancer.
   •Of these, 140,000 have cancer.
   •Let’s assume the probability of a false positive is 10% (i.e. out of every hundred women without cancer who are tested, the mammography will give an erroneous positive result for ten of them).
   So, the probability of getting a positive mammography result:
   P(M+) = probability of detecting a cancer + probability of a false positive
         = [140,000 + 10% × (38 million − 140,000)] / 38 million
         = 0.1033
   P(C) = probability of cancer in the general population
        = 140,000 / 38 million
        = 0.003684, or 0.3684%
   P(M+ | C) = the probability of the mammography giving a positive result if there is a cancer. We assume this equals 1 (i.e. no false negatives).
   So, the probability of cancer once a mammography gives a positive result:
   P(C | M+) = P(M+ | C) × P(C) / P(M+)
             = 1 × 0.003684 / 0.1033
             = 0.03566, or 3.566%
   So, because of the false positives, a positive mammography test still indicates only a 3.6% chance of cancer. The mammography has nonetheless eliminated 90% of the healthy women, so a positive result increases the probability of cancer almost tenfold, from 0.37% in the general population to 3.6%.
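The mammography figures can be reproduced with a few lines of Python. The population figures are those quoted above; the 10% false-positive rate and the absence of false negatives are the same simplifying assumptions, not clinical values.

```python
# Figures quoted in the text; the false-positive rate and perfect
# sensitivity are simplifying assumptions, not clinical values.
TESTED = 38_000_000
WITH_CANCER = 140_000
FALSE_POSITIVE_RATE = 0.10   # P(M+ | no cancer)
SENSITIVITY = 1.0            # P(M+ | C): no false negatives

p_c = WITH_CANCER / TESTED
healthy = TESTED - WITH_CANCER
p_m_pos = (SENSITIVITY * WITH_CANCER + FALSE_POSITIVE_RATE * healthy) / TESTED

# Bayes' theorem: P(C | M+) = P(M+ | C) P(C) / P(M+)
p_c_given_m_pos = SENSITIVITY * p_c / p_m_pos

print(f"P(C)      = {p_c:.4%}")
print(f"P(M+)     = {p_m_pos:.4f}")
print(f"P(C | M+) = {p_c_given_m_pos:.2%}")
```

Counting women instead of probabilities gives the same answer: of roughly 3.93 million positive results each year, only 140,000 come from women who actually have cancer.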
   Appendix 4
   Applying Bayes’ theorem to the ‘defence attorney’s fallacy’ in the O.J. Simpson trial