As antibody tests for SARS-CoV-2 are being carried out, and people start to explain (e.g., on Twitter) what it means to test positive, I want to elaborate on a concept that most elementary probability textbooks mention, since people may be overly anxious (or too careless) about a positive result.

The estimate of how many people are positive in your region (*prior*) affects the estimated chance of you having it, given you tested positive (*posterior*), **a lot**.

## What is known and unknown?

Let’s clarify some terms: which are known, which are unknown, and which require an assumption.

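- $P(+ \mid \text{test}+)$: what people actually want to know. The probability that I have it, given that I test positive. However, this is very commonly confused with:
- $P(\text{test}+ \mid +)$: the *sensitivity* of the test. The FDA document reports this figure for the Cellex antibody test^{1}.
- $P(\text{test}- \mid -)$: the *specificity* of the test. The same document reports it for Cellex.
- $P(+)$: how much of the population might be affected. You need to assume a value in order to compute $P(+ \mid \text{test}+)$ from those known values using the Bayes Rule. This is the *prior*.
- $P(\text{test}+)$: how many of the tests might be positive. Computing this depends on $P(+)$. More specifically:

$$P(\text{test}+) = P(\text{test}+ \mid +)\,P(+) + P(\text{test}+ \mid -)\,\bigl(1 - P(+)\bigr)$$

where $P(\text{test}+ \mid -) = 1 - P(\text{test}- \mid -)$ is one minus the specificity.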
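How P(test+) depends on P(+) can be sketched in a few lines of Python. The function name and the numeric values below are illustrative assumptions (round numbers picked for the example, not the figures from the FDA document).

```python
def p_test_positive(sensitivity: float, specificity: float, prior: float) -> float:
    """Overall probability that a test comes back positive, P(test+)."""
    true_positive = sensitivity * prior                     # infected and flagged
    false_positive = (1.0 - specificity) * (1.0 - prior)    # not infected but flagged
    return true_positive + false_positive


# Assumed, illustrative values (not the Cellex figures).
SENSITIVITY = 0.94
SPECIFICITY = 0.96

for prior in (0.001, 0.01, 0.1):
    share = p_test_positive(SENSITIVITY, SPECIFICITY, prior)
    print(f"P(+) = {prior:.3f} -> P(test+) = {share:.4f}")
```

Note that when the prior is small, almost all of P(test+) comes from the false-positive term.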

## The Bayes Rule

Understanding Bayes Rule for the first time requires getting over some intuitions, but once that is done, things start getting intuitive again.

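Bayes Rule starts from the fact that the joint probability can be factorized in two ways:

$$P(+, \text{test}+) = P(+ \mid \text{test}+)\,P(\text{test}+)$$

$$P(+, \text{test}+) = P(\text{test}+ \mid +)\,P(+)$$

The left-hand sides of the above two equations are the same, so we have:

$$P(+ \mid \text{test}+)\,P(\text{test}+) = P(\text{test}+ \mid +)\,P(+)$$

In other words:

$$P(+ \mid \text{test}+) = \frac{P(\text{test}+ \mid +)\,P(+)}{P(\text{test}+)}$$

Now we can plug in the values to see our likelihood of actually having it given a positive test.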
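For readers who prefer code to algebra, here is a minimal sketch of that computation. The function name `posterior_given_positive` is hypothetical, and the numbers in the usage line are assumptions chosen only as a sanity check.

```python
def posterior_given_positive(sensitivity: float, specificity: float, prior: float) -> float:
    """Bayes Rule: P(+ | test+) from sensitivity, specificity and the prior P(+)."""
    # P(test+) by the law of total probability.
    evidence = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(test+) is zero, so the posterior is undefined")
    return sensitivity * prior / evidence


# Sanity check with assumed values: a 90%/90% test and a 50/50 prior.
print(posterior_given_positive(0.9, 0.9, 0.5))
```

With a 50/50 prior and equal sensitivity and specificity, the posterior equals the sensitivity, which is the one case where the common confusion between the two happens to be harmless.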

## What are the results?

The results depend on the *prior* assumption. **A lot**. Here is how.

**First scenario**

If you assume that there are plenty of SARS-CoV-2 tests, and that there are not too many asymptomatic carriers (in other words, that $P(+)$ is close to the fraction of the population reported as confirmed cases in the US right now^{2}), then plugging into Bayes Rule gives a small posterior.

Which means you only have a slightly more than one in fifty chance of actually having it when your test result is positive.
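One way to see why the chance is so low is to count people instead of probabilities. The sketch below assumes a population of 100,000, a prior of 0.1%, a sensitivity of 94% and a specificity of 96%; all four numbers are illustrative assumptions, not values taken from this post's sources.

```python
def expected_counts(population: int, sensitivity: float, specificity: float, prior: float):
    """Expected number of true and false positives if everyone is tested."""
    infected = population * prior
    not_infected = population - infected
    true_positives = infected * sensitivity
    false_positives = not_infected * (1.0 - specificity)
    return true_positives, false_positives


# Assumed, illustrative inputs.
tp, fp = expected_counts(100_000, sensitivity=0.94, specificity=0.96, prior=0.001)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"share of positives that are real: {tp / (tp + fp):.3f}")
```

Under these assumptions the false positives outnumber the true positives by a wide margin, simply because there are so many more uninfected people to misclassify.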

**Second scenario**

If you assume that COVID-19 tests are insufficient, and that many people with slight symptoms just stayed at home and recovered, so that only those with serious symptoms went to get a test, then $P(+)$ is underestimated. Let’s say it is underestimated by a factor of ten, so that ten times as many people in the country have had COVID-19 or some light-symptom variant.

Which means you still have less than a one in five chance of actually having it.

**Third scenario**

If you think there are *way more* people who have it than the COVID-19 positive test results show, for example, by pointing out that the mortality rate appears overwhelmingly high in some countries (e.g., higher in the UK than in Germany), let’s go to the extreme and assume a large share of people in this country have it, with most of them not even thinking they need to be tested. Set the prior accordingly and plug it in.

The posterior is surprisingly high: almost three out of four! You can see how ridiculous conspiracy theories can change your results.
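To see how strongly the prior drives the result, the sketch below sweeps three priors spaced a factor of ten apart. The priors (0.1%, 1%, 10%) and the test characteristics (94% sensitivity, 96% specificity) are assumptions picked for illustration; they are not claimed to be the exact values behind the three scenarios.

```python
def posterior_given_positive(sensitivity: float, specificity: float, prior: float) -> float:
    """P(+ | test+) via Bayes Rule."""
    evidence = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / evidence


# Assumed, illustrative values.
SENSITIVITY = 0.94
SPECIFICITY = 0.96
PRIORS = {"first scenario": 0.001, "second scenario": 0.01, "third scenario": 0.1}

posteriors = {
    name: posterior_given_positive(SENSITIVITY, SPECIFICITY, prior)
    for name, prior in PRIORS.items()
}

for name, value in posteriors.items():
    print(f"{name:>16}: prior = {PRIORS[name]:.3f}, posterior = {value:.3f}")
```

The same test and the same positive result give very different posteriors, which is the whole point of being careful about the prior.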

**What do the varying numbers mean?**

Don’t scare yourself to death by placing conspiracy theories on the *priors*, please. The reported numbers are not that far off. Even if all countries underreport the numbers, my intuition is that they can’t possibly be underestimates by, let’s say, a factor of ten.

In short, I think^{3} what happened is closer to the first scenario in Canada, Japan, Korea, China except Hubei (a month ago), and most states in the US, and closer to the second scenario in Hubei, New York / New Jersey, the UK, Spain, and Italy.

## How about the diagnosis?

Following a similar line of thought, one might ask whether there is such a high chance of “not having it when testing positive” for the COVID-19 virus test (not the antibody test).

I tend to believe **no**. Here’s why.

Since the majority of COVID-19 virus tests are done on those who have symptoms, we need to condition everything on the variable “symptom=True”. Therefore the prior, $P(+ \mid \text{symptom})$, would be on a much larger scale than in the antibody scenarios, as estimated from reported data^{4}. Assuming the sensitivity of COVID RNA tests is high, the posterior likelihood, the “chance of having it when tested positive”, would not differ much from the test sensitivity.
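The effect of conditioning on symptoms can be sketched by running the same Bayes computation with two different priors. Every number below is an assumption made up for illustration (a 20% prior among symptomatic people, a 0.1% prior in the general population, and a test with 95% sensitivity and 99% specificity); none of them comes from the data sources cited here.

```python
def posterior_given_positive(sensitivity: float, specificity: float, prior: float) -> float:
    """P(+ | test+) via Bayes Rule."""
    evidence = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / evidence


# Assumed, illustrative test characteristics.
SENSITIVITY = 0.95
SPECIFICITY = 0.99

# Assumed priors: P(+ | symptom) versus an unconditioned P(+).
symptomatic = posterior_given_positive(SENSITIVITY, SPECIFICITY, prior=0.20)
general = posterior_given_positive(SENSITIVITY, SPECIFICITY, prior=0.001)

print(f"tested because of symptoms: P(+ | test+) = {symptomatic:.3f}")
print(f"tested at random:           P(+ | test+) = {general:.3f}")
```

Under these assumptions a positive result among symptomatic people is very likely a real infection, while the same result from random testing would still be dominated by false positives.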

## Conclusion

A good estimate of the prior probability in your region is essential for an accurate value of $P(+ \mid \text{test}+)$. Depending on the regional situation, we should be neither overly anxious nor too careless about the testing results.

## Footnotes

- 1.These are positive percent agreement actually -- I'm saying sensitivity for the purpose of this blog. ↩
- 2.Data source: worldometers ↩
- 3.I am not a healthcare professional. I study computer science (AI), and have taken probability & stats courses. All of my analysis is based on public sources. If you want medical advice, ask a healthcare professional, please. ↩
- 4.Data source: 1point3acres ↩