Why tests don’t tell the whole story: a layman’s guide to tests and what they tell us

Over the past two years or so, I, like so many others, have been bombarded with statistics and mathematics: growth rates, logarithms, death rates, geographical breakdowns…

Let me be clear: I know a little about statistics from my days studying A-Level Maths with Statistics, and I have an unhealthy interest in several stereotypically ‘nerdy’ subjects, but nothing in this blog post is in any way, shape or form the work of a proper statistician. Now that we have that sensitive issue behind us, let’s talk tests!

Chapter 1: Let’s Define the Story

Every story needs a strong protagonist. Unfortunately, I couldn’t afford one and had neither the time nor the energy to create one, so I introduce Alex.

Alex wants to know whether they have contracted a completely random disease, which we shall call Disease X.

Disease X has a prevalence of around 5% (1 in 20) in the general population.

Chapter 2: That doesn’t seem too accurate.

The first thing Alex probably thinks of when trying to figure out whether they have this disease is to take a test.

Luckily for them, Scam Industries have just released a test with 95% accuracy!

Alex is astounded and takes the test and is relieved when it comes back negative.

Should Alex trust this result?

My initial answer to this question was “of course, it’s 95% accurate”: the test is more likely to be right than wrong, and intuitively that seems to make sense.

Figure 1 shows which individuals in the population were sick (in red).

But the test in question always outputs negative regardless!

Therein lies the problem. Accuracy only counts how often the test’s output agrees with the truth; it says nothing about which kind of mistake the test makes. This test only achieved 95% accuracy because 95% of the population are healthy, and so just so happened to agree with a completely useless test.
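To convince myself of this, here is a minimal sketch of the always-negative test. The population of 1,000 people and the names used are my own assumptions for illustration; only the 5% prevalence comes from the story.

```python
# Hypothetical population of 1,000 people with 5% prevalence of Disease X.
population_size = 1000
prevalence = 0.05

sick_count = round(population_size * prevalence)  # 50 people are ill
# True means "has Disease X", False means "healthy".
has_disease = [True] * sick_count + [False] * (population_size - sick_count)


def always_negative_test(person_is_ill):
    """A useless test: it reports negative (False) whatever the truth is."""
    return False


# Accuracy is the share of people for whom the test result matches the truth.
correct = sum(always_negative_test(ill) == ill for ill in has_disease)
accuracy = correct / population_size

print(f"Accuracy: {accuracy:.0%}")
```

The test never looks at the person at all, yet it is "right" for every healthy person, and healthy people make up 95% of the population.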

Chapter 3: Specifically Sensitive

In the previous section we saw that ‘accuracy’ alone is not very helpful. To make sense of a test, we need to start thinking about its potential outcomes.

Disease Status	Test Result	Outcome
Does not have Disease X	Negative	True Negative
Does not have Disease X	Positive	False Positive
Has Disease X	Negative	False Negative
Has Disease X	Positive	True Positive

We can see from the above table that there are four possible outcomes that we can use to assess how well a test is working, and this is where the naïve accuracy approach has failed.

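To combat these problems, two more measures of accuracy were created:

Sensitivity
- How well can the test tell if someone has the disease? It is the proportion of ill people who test positive.
Specificity
- How well can the test tell if someone is healthy? It is the proportion of healthy people who test negative.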

Referring back to our first example, the test always correctly identifies healthy subjects (that is, it never tells a healthy person they are ill), so its specificity is 100%. But it is terrible at identifying ill subjects (in fact it is incapable of identifying illness), so its sensitivity is 0%.
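As a rough sketch, both measures can be computed from the four outcome counts in the table. The function names and the population of 1,000 people (50 ill, 950 healthy) are assumptions of mine, chosen to match the 5% prevalence.

```python
def sensitivity(true_positives, false_negatives):
    """Share of ill people that the test correctly flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of healthy people that the test correctly reports as negative."""
    return true_negatives / (true_negatives + false_positives)


# Always-negative test on a hypothetical 1,000 people, 50 of them ill:
# every ill person is a false negative, every healthy person a true negative.
tp, fn = 0, 50
tn, fp = 950, 0

always_negative_sensitivity = sensitivity(tp, fn)
always_negative_specificity = specificity(tn, fp)

print(f"Sensitivity: {always_negative_sensitivity:.0%}")
print(f"Specificity: {always_negative_specificity:.0%}")
```

Splitting accuracy in two like this exposes the useless test straight away: perfect on healthy people, hopeless on ill ones.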

Chapter 4: Some worked examples and bringing it all together

We have seen a simple example of how thinking about things a little differently has given us two measures which can better communicate the characteristics of a test. The big question is how much information a test result conveys.

This is something I’ve often thought about in these weird few years: if I receive a negative test, how likely am I to really be negative?

Examples comparing two tests

Disease X prevalence in population: 5%

Test A: Sensitivity 95%, Specificity 75%

Test B: Sensitivity 75%, Specificity 95%

What we really need to know is: given a test result, how likely is it that this person is ill?

This is a perfect place where Bayes’ theorem would be useful, but instead let’s crunch the numbers to show what’s going on here. It’ll be helpful to bring back our table from earlier.

Disease Status	Test Result	Outcome	Proportion Test A	Proportion Test B
Does not have Disease X	Negative	True Negative	71.25%	90.25%
Does not have Disease X	Positive	False Positive	23.75%	4.75%
Has Disease X	Negative	False Negative	0.25%	1.25%
Has Disease X	Positive	True Positive	4.75%	3.75%
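The proportions in this table can be reproduced with a few lines of Python. This is only a sketch; the function name outcome_proportions is my own invention, while the prevalence, sensitivity and specificity figures are the ones given for Test A and Test B.

```python
def outcome_proportions(prevalence, sensitivity, specificity):
    """Split the whole population into the four outcomes of the table."""
    healthy = 1 - prevalence
    return {
        "True Negative": healthy * specificity,
        "False Positive": healthy * (1 - specificity),
        "False Negative": prevalence * (1 - sensitivity),
        "True Positive": prevalence * sensitivity,
    }


test_a = outcome_proportions(prevalence=0.05, sensitivity=0.95, specificity=0.75)
test_b = outcome_proportions(prevalence=0.05, sensitivity=0.75, specificity=0.95)

# One row per outcome, Test A in the first column and Test B in the second.
for outcome in test_a:
    print(f"{outcome:<15}{test_a[outcome]:>8.2%}{test_b[outcome]:>8.2%}")
```

The four proportions for each test add up to 100%, which is a handy sanity check that nobody has been counted twice or left out.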

What we have done is multiply the proportion of the population that is ill (5%) or healthy (95%) by the known characteristics of each test, to find out what proportion of each outcome we should expect. For example, Test A’s true positives are 5% × 95% = 4.75%. All that’s left is to compute how likely it is that the test result is true, which is as simple as:

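For positive results
- True Positive / All Positive Results
For negative results
- True Negative / All Negative Results (the probability of being ill despite a negative result is what is left over: False Negative / All Negative Results)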

	Probability of Being Ill given the test result
Outcome	Test A	Test B
Positive	16.67%	44.12%
Negative	0.35%	1.37%
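For the curious, the same figures fall out of Bayes’ theorem written as a small function. This is a sketch under my own naming (probability_ill and its parameters are not from any library); the inputs are the prevalence and test characteristics from the example.

```python
def probability_ill(prevalence, sensitivity, specificity, positive):
    """P(ill | test result), computed with Bayes' theorem."""
    if positive:
        ill_and_result = prevalence * sensitivity                  # true positives
        healthy_and_result = (1 - prevalence) * (1 - specificity)  # false positives
    else:
        ill_and_result = prevalence * (1 - sensitivity)            # false negatives
        healthy_and_result = (1 - prevalence) * specificity        # true negatives
    return ill_and_result / (ill_and_result + healthy_and_result)


# (sensitivity, specificity) for each test in the example.
tests = {"Test A": (0.95, 0.75), "Test B": (0.75, 0.95)}

results = {}
for name, (sens, spec) in tests.items():
    for positive in (True, False):
        results[(name, positive)] = probability_ill(0.05, sens, spec, positive)
        label = "Positive" if positive else "Negative"
        print(f"{name} {label}: {results[(name, positive)]:.2%}")
```

The denominator is simply everybody who received that result, ill or not, which is exactly the "all positive results" or "all negative results" from the table approach.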

Interpreting this, we see that the probability that a person has Disease X given a positive result is 44.12% using Test B and 16.67% using Test A, compared to 5% prior to the test.

These are both surprisingly low probabilities, and this is the reason why doctors don’t, or shouldn’t, order tests in bulk. Tests are designed to work on a specific population, for example people who are exhibiting a specific set of symptoms. Failing to use the appropriate screening criteria can lead to needless stress and medical interventions for a condition that the patient doesn’t have.
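To see how much the tested population matters, here is a small sketch that reuses Test B and varies only the prevalence. The 1%, 20% and 50% prevalences are hypothetical values I picked for illustration; only the 5% figure comes from the example.

```python
def probability_ill_given_positive(prevalence, sensitivity, specificity):
    """Chance that a positive result is a true positive."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Test B from the worked example.
SENSITIVITY_B = 0.75
SPECIFICITY_B = 0.95

# Hypothetical prevalences: only the 5% figure comes from the example.
prevalences = [0.01, 0.05, 0.20, 0.50]
positive_result_value = {
    p: probability_ill_given_positive(p, SENSITIVITY_B, SPECIFICITY_B)
    for p in prevalences
}

for p, value in positive_result_value.items():
    print(f"Prevalence {p:>4.0%} -> ill given positive: {value:.1%}")
```

The test itself never changes here. The more common the disease is among the people being tested, the more a positive result can be believed.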

Chapter 5: A real example with COVID-19

Finally, let’s do a real example. According to the ONS, the overall infection rate in the UK was around 4.87% at the time of writing this blog post (23rd March). A study of a specific type of rapid antigen test found that it had a sensitivity of 65.3% and a specificity of 99.9%.

	Probability of Being Ill given the test result
Outcome	Rapid antigen test
Positive	97.10%
Negative	1.75%
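Another way to check these two numbers is to imagine an actual crowd of people and count heads. The sketch below assumes a hypothetical 100,000 people; the prevalence, sensitivity and specificity are the figures quoted above, and the variable names are mine.

```python
people = 100_000          # hypothetical crowd size, chosen for easy reading
prevalence = 0.0487       # ONS infection rate quoted above
sensitivity = 0.653       # rapid antigen test, from the study quoted above
specificity = 0.999

ill = people * prevalence
healthy = people - ill

# Expected head counts for each of the four outcomes.
true_positives = ill * sensitivity
false_negatives = ill - true_positives
true_negatives = healthy * specificity
false_positives = healthy - true_negatives

ill_given_positive = true_positives / (true_positives + false_positives)
ill_given_negative = false_negatives / (false_negatives + true_negatives)

print(f"Expected true positives:  {true_positives:,.0f}")
print(f"Expected false positives: {false_positives:,.0f}")
print(f"Ill given a positive result: {ill_given_positive:.2%}")
print(f"Ill given a negative result: {ill_given_negative:.2%}")
```

With such a high specificity there are very few false positives to dilute the true ones, which is why a positive result from this test is so convincing even though it misses roughly a third of infections.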

I hope this small blog post has given a little insight into what the outcome of a test really means. I hope I’ve shown that ‘accuracy’ alone is not that helpful, and that we need to consider how common a condition is in the selected population to estimate how much useful information a test outcome can give.