We don’t normally think of induction and statistics as a part of critical thinking courses, but I think we should. Logic doesn’t end with deduction, after all, and there are few other instances in a college curriculum where students are asked to think carefully about how they ought to evaluate evidence, rather than being asked to apply a particular evidentiary standard given by a discipline. What’s more, most of the benefits of numeracy are actually tied to the ability to evaluate formulaic and statistical reasoning, rather than do sums in your head. So in my course, we do a section on evidence and explanation. Here is my effort to explain Bayes’ Theorem to my students.
Strange things happen, and we shouldn’t always update our beliefs on the basis of a weird fluke. Bayes’ theorem is a way to decide how new evidence should lead us to change our beliefs. Put another way, it tells us how likely it is that some piece of data is evidence for a conclusion, taking into account false positives, misleading evidence, and statistically improbable events.
It looks scary, but bear with me:
Pr(h|e) = [Pr(h) × Pr(e|h)] / ([Pr(h) × Pr(e|h)] + [Pr(~h) × Pr(e|~h)])
Here’s what it means: what is the probability of some hypothesis, h, given some piece of evidence, e? Since the evidence might be misleading, we have to consider both the possibility that the evidence is there because the hypothesis is true, and the possibility that the evidence is there even though the hypothesis is false.
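To see the formula in action, here is a minimal sketch in Python. The function name and the sample numbers are mine, chosen purely for illustration:

```python
def bayes(prior_h, pr_e_given_h, pr_e_given_not_h):
    """Posterior Pr(h|e) from the prior Pr(h) and the two likelihoods."""
    numerator = prior_h * pr_e_given_h
    denominator = numerator + (1 - prior_h) * pr_e_given_not_h
    return numerator / denominator

# A 50/50 prior with strong but imperfect evidence: the evidence shows up
# 90% of the time when h is true and 10% of the time when h is false.
print(bayes(0.5, 0.9, 0.1))  # 0.9
```

Notice that even evidence this strong doesn’t make the hypothesis certain; it only shifts the probability in proportion to how much more often the evidence appears when the hypothesis is true.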
Consider a simple example. What is the probability that someone is guilty given that their fingerprints were found at the scene of the crime?
Well, we start with the basics. Either she’s guilty or she’s innocent, right? We might write that this way:
Guilt / (Guilt + Innocent)
Why are we putting it like that? By analogy, the likelihood of flipping a coin and getting heads is:
Heads / (Heads + Tails)
But so far, we don’t have any information about how likely those things are. What we’re looking for is a way to use evidence to tip the scales from the 50/50 of the coin flip. We’re looking for the probability of someone’s guilt, if their fingerprints were discovered. In terms of probabilities, the left side of the equation would look like this:
Pr(Guilt if Fingerprints Discovered)
That’s Pr(h|e). The right side of the equation will look like this:
[Pr(Guilt) × Pr(Fingerprints Discovered if Guilty)] / ([Pr(Guilt) × Pr(Fingerprints Discovered if Guilty)] + [Pr(Innocent) × Pr(Fingerprints Discovered if Innocent)])
So what does that mean? In ordinary English, we’re saying that the probability that the person is guilty, given the fingerprints, is equal to the probability that the fingerprints are there because she is guilty, divided by the total probability of finding the fingerprints at all: the sum of all the ways they could turn up, whether she is guilty or innocent.
The only additional issue is that we have to consider the possibilities of false positives and false negatives. How often would the fingerprints turn up even if she were innocent? How often would they be absent even if she were guilty? Those accuracy rates are captured by the conditional probabilities, and they weight the terms in the formula.
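As a sketch with made-up numbers (the prior and match rates below are illustrative, not real crime statistics): suppose 1% of the candidate pool is guilty, fingerprints turn up 95% of the time when someone is guilty, and 2% of the time when they are innocent.

```python
prior_guilt = 0.01            # hypothetical: 1% of candidates are guilty
pr_prints_if_guilty = 0.95    # hypothetical true-positive rate
pr_prints_if_innocent = 0.02  # hypothetical false-positive rate

numerator = prior_guilt * pr_prints_if_guilty
denominator = numerator + (1 - prior_guilt) * pr_prints_if_innocent
posterior = numerator / denominator
print(round(posterior, 3))  # 0.324
```

Even with fairly damning-sounding evidence, the posterior probability of guilt is only about a third, because innocent people vastly outnumber guilty ones in the pool, so even a small false-positive rate produces many innocent matches.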
But what if there are more than two possibilities? Another example is the likelihood that a high fever is evidence of the flu:
Fever Evidence of Flu / (Fever Evidence of Flu + Fever Evidence of Food Poisoning + Fever Evidence of Broken Thermometer + Etc.)
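When there are several competing hypotheses, the denominator simply sums a prior-times-likelihood term for each one. Here is a sketch with invented numbers (none of these priors or likelihoods are real medical statistics):

```python
# Hypothetical (prior, Pr(fever | hypothesis)) pairs, for illustration only.
hypotheses = {
    "flu":                (0.10, 0.80),
    "food poisoning":     (0.05, 0.40),
    "broken thermometer": (0.01, 1.00),
    "something else":     (0.84, 0.02),
}

# The denominator: total probability of observing a fever at all.
total = sum(prior * likelihood for prior, likelihood in hypotheses.values())

# Each posterior is that hypothesis's share of the total.
posteriors = {h: (prior * lik) / total for h, (prior, lik) in hypotheses.items()}
# The posteriors over all the hypotheses sum to 1.
```

The structure is the same as the two-hypothesis case: each hypothesis gets credit in proportion to how well it predicts the evidence, weighted by how plausible it was to begin with.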
An Alternative to the Algebraic Formulation of Bayes’ Theorem
The same information can be developed as a table:
| | Hypothesis is True | Hypothesis is False |
|---|---|---|
| Evidence Present | | |
| Evidence Absent | | |
So out of the total population, how often is the hypothesis true, and how often is it false? Using incidence data, we can make a general statistical claim about the population and then drill down to discover how often the evidence will be present.
For the accused criminal whose fingerprints were found at the scene of the crime, we’d draw the table like this:
| | Guilty | Innocent |
|---|---|---|
| Fingerprints Found | Guilt and Fingerprints | Innocent but Fingerprints |
| No Fingerprints Found | Guilt but No Fingerprints | Innocent, No Fingerprints |
Then we’d fill in the total “guilt” and “innocent” figures from crime statistics. Since we’re using historical data, there is always the possibility that something has changed or that the data will be biased in some important way. Still, this can give us a good first approximation. Working from the table allows us to take advantage of the much more powerful frequentist intuitions most of us have for understanding probabilities.
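The same calculation can run directly on counts, which is where the frequentist intuition comes from. With a hypothetical population of 10,000 cases (these counts are invented for illustration):

```python
# Hypothetical cell counts for a population of 10,000 cases.
guilty_with_prints = 95       # guilty, fingerprints found
guilty_no_prints = 5          # guilty, no fingerprints
innocent_with_prints = 198    # innocent, fingerprints found
innocent_no_prints = 9702     # innocent, no fingerprints

# Of everyone whose fingerprints were found, what fraction is guilty?
posterior = guilty_with_prints / (guilty_with_prints + innocent_with_prints)
print(round(posterior, 3))  # 0.324
```

No algebra required: just look down the “Fingerprints Found” row and ask what share of it falls in the “Guilty” column. That division is Bayes’ theorem in frequency form.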
By the way, this post uses an application of Bayes’ Theorem from Sinnott-Armstrong and Fogelin’s Understanding Arguments, where they use it to evaluate the case of Sally Clark. Using this technique to demonstrate the injustice of her case blew my students’ minds. It’s that good. I haven’t yet told them about this case, though!