Lately a lot of people have been trying their hand at understanding the data behind COVID-19. Understandably so. However, what most people are forgetting is that to truly understand Coronavirus data you need to look at all of the factors influencing that data.
I thought I should break down how I try to read and understand data. I am not a data analyst, but my job requires data analysis. My full-time job is in SEO, where we optimise websites for search engines. Every decision we make needs to be informed by data and the variables behind that data, otherwise our work is meaningless. So with that being said, let’s take a deep dive into some data and stats.
Last week this graph posted by a professional Rugby player went semi-viral:
If you’re like me, the first thing you do when you view that graph is freak out. Then you check the source. I did, and the source is legitimate. I am going to give Bryan the benefit of the doubt. I think he was sharing this with the right intent. He wanted people to stay home, which is totally what we should have been doing at the time and should be doing now.
That being said, that graph does not take two key factors into account and thus does not tell much of an actual story. First, how quickly was testing done after the first COVID-19 case was found? Second, at what point did the countries implement social distancing or a lockdown?
COVID-19 hit South Africa after we had seen the damage it did to Italy. Naturally as a nation we were more cautious and started testing earlier, which would result in more confirmed cases but less spreading. Here is the full graph as of the 29th of March 2020.
As you can see South Africa’s curve begins to flatten long before the other countries, in terms of the number of cases.
On top of that, you’ll notice that at around the 10-20 cases mark the other countries begin to spike. That is more than likely because people were beginning to fear the virus more and were making sure they got tested.
To further support my argument that we as a country began full testing earlier than the other countries, I have changed the graph to show calendar dates instead of relative dates.
As you can see, the virus hit South Africa a month after the other countries, meaning that we had seen the havoc it had caused in Europe and knew to get tested earlier.
Those graphs are looking at the number of cases from the time of the 1st case being picked up in a country. Let’s try after 150 cases have been picked up. I am looking at relative dates here rather than calendar dates.
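For anyone who wants to redo this kind of chart themselves, here is a rough Python sketch of how a curve can be shifted from calendar dates to relative dates, so that day 0 is the day a country first reaches a chosen number of cases. The function name and the two tiny data sets are made up purely for illustration. They are not real case counts.

```python
from datetime import date


def days_since_threshold(daily_totals, threshold):
    """Re-index a cumulative case series so that day 0 is the first day
    the total reaches `threshold`. Returns a list of (day, total) pairs."""
    ordered = sorted(daily_totals.items())
    # First calendar date on which the running total hits the threshold
    start = next((d for d, total in ordered if total >= threshold), None)
    if start is None:
        return []  # this country never reached the threshold
    return [((d - start).days, total) for d, total in ordered if d >= start]


# Hypothetical numbers for two made-up countries, NOT real data
country_a = {
    date(2020, 3, 1): 1,
    date(2020, 3, 2): 3,
    date(2020, 3, 3): 7,
    date(2020, 3, 4): 12,
}
country_b = {
    date(2020, 2, 1): 2,
    date(2020, 2, 2): 5,
    date(2020, 2, 3): 20,
}

aligned_a = days_since_threshold(country_a, threshold=5)
aligned_b = days_since_threshold(country_b, threshold=5)
print(aligned_a)
print(aligned_b)
```

Once both countries are on the same "days since N cases" axis, their curves can be compared even though the virus arrived a month apart.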
Are you starting to see the trend? The stats were more worrying to start with because of factor 1. We more than likely had more tests done after the first case was picked up.
Our curve is starting to flatten because we implemented social distancing and our national lockdown before the other countries did. How do I know this? Well, Italy implemented a full lockdown on the 9th of March, by which point they had just under 10 000 confirmed cases of COVID-19.
South Africa’s lockdown was announced once we had 552 confirmed cases of COVID-19.
Okay cool, South Africa is flattening the curve. Excellently, according to the current trend.
That being said, something else that I have been misinterpreting is the death rate. When we calculate the death rate, most of us probably compare the number of deaths to the number of cases. In truth, the death rate should be measured against closed cases only, that is, recoveries plus confirmed deaths. Which means that the death rate is far worse than you’d expect.
Here is what that means. There are 654 841 confirmed cases globally and 44 169 deaths (according to COVID Visualizer at time of writing). I would have thought the death rate was simply deaths as a percentage of cases, which works out to roughly 6.75%. That is quite low. However, the problem is that we are yet to confirm whether the active cases will recover or not.
A more accurate way of working out the death rate is to divide confirmed deaths by the total of recoveries plus confirmed deaths. Right now there are 185,180 recovered patients and 44,169 confirmed deaths, for a total of 229,349. The death rate is then 44,169 as a percentage of that total, which is 19.26%. That is a pretty intense death rate.
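To make the two calculations concrete, here is a minimal Python sketch using the global figures quoted in this post. The function names are made up for illustration, and "closed cases" simply means recoveries plus deaths.

```python
def naive_death_rate(deaths, confirmed_cases):
    """Deaths as a percentage of all confirmed cases (active cases included)."""
    return 100 * deaths / confirmed_cases


def closed_case_death_rate(deaths, recoveries):
    """Deaths as a percentage of closed cases (recoveries plus deaths)."""
    return 100 * deaths / (deaths + recoveries)


# Global figures quoted in this post (COVID Visualizer, at time of writing)
confirmed_cases = 654_841
deaths = 44_169
recoveries = 185_180

naive = naive_death_rate(deaths, confirmed_cases)
closed = closed_case_death_rate(deaths, recoveries)

print(f"Deaths vs all confirmed cases: {naive:.2f}%")
print(f"Deaths vs closed cases:        {closed:.2f}%")
```

The only difference between the two functions is the denominator, and that one choice moves the result from under 7% to over 19%.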
There are other factors behind that death rate, such as the age of the patient, previous health conditions, the quality of medical care, and whether they died of COVID-19 or died of other causes and happened to have COVID-19 at the same time. There are various factors that we as the public aren’t seeing. So to be honest we cannot actually make truth claims about the virus, because we don’t have all of the information. We’re working with stats which we don’t fully understand and are interpreting.
Another way this is being done is by everyone referring to Italy as an example of how bad the virus is going to get, while not understanding that Italy may be an outlier in terms of death rate. Let us check this by taking Italy’s numbers out of the totals:
Total: 229,349 – (12,428 (deaths) + 15,729 (recoveries)) = 201,192
Deaths: 44,169 – 12,428 = 31,741
Recoveries: 185,180 – 15,729 = 169,451
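Italy on its own has a death rate of 44.1%. Take Italy away and the death rate is 15.78%, which is 31,741 deaths out of 201,192 closed cases (dividing deaths by recoveries alone would give 18.73%). Okay, so Italy pulls the global figure up, but the rate is still high without it. However, there are other outliers like America, whose death rate at the time of writing is 35.9%.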
If you take America and Italy out of the equation, the global death rate becomes 17.1%.
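Here is a rough Python sketch of the Italy subtraction, using only the numbers quoted in this post. America's deaths and recoveries are not listed here, so the sketch only removes Italy. It prints deaths as a share of closed cases and also deaths relative to recoveries alone, because the two ratios are easy to mix up. The variable and function names are made up for illustration.

```python
def closed_case_death_rate(deaths, recoveries):
    """Deaths as a percentage of closed cases (recoveries plus deaths)."""
    return 100 * deaths / (deaths + recoveries)


# Figures quoted in this post (at time of writing)
global_deaths, global_recoveries = 44_169, 185_180
italy_deaths, italy_recoveries = 12_428, 15_729

# Remove Italy from the global totals
rest_deaths = global_deaths - italy_deaths
rest_recoveries = global_recoveries - italy_recoveries

italy_rate = closed_case_death_rate(italy_deaths, italy_recoveries)
rest_rate = closed_case_death_rate(rest_deaths, rest_recoveries)

# A different ratio: deaths per 100 recoveries (denominator excludes deaths)
rest_deaths_per_100_recoveries = 100 * rest_deaths / rest_recoveries

print(f"Italy, deaths vs closed cases:         {italy_rate:.2f}%")
print(f"World minus Italy, vs closed cases:    {rest_rate:.2f}%")
print(f"World minus Italy, deaths/recoveries:  {rest_deaths_per_100_recoveries:.2f}%")
```

The same subtraction would work for any other country, as long as you have both its deaths and its recoveries.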
The death rate is bad, that cannot be denied. The lockdowns across the world should be helpful and hopefully we will see fewer deaths from COVID-19. Hopefully this whole crisis will end soon.
I did not write this post to stoke fear or to encourage stupid bravery. The stats suggest we should be staying home for now, but that we shouldn’t panic. I wrote this post because I get frustrated when people post stats without giving them a critical eye, which has been happening a whole lot over the last few weeks in South Africa. What I want to achieve is a more critical analysis of the stats, where we assess all of the factors. No line graph, bar graph or statistic stands in isolation. There are always other factors that need to be explored.
With that in mind, all I ask is that you give the stats a proper read and think before reacting or sharing them.