Review of COVID-19 Data

Data will talk to you if you are willing to listen.
Jim Bergeson

Data is not mere numbers; it can convey powerful information.
Here is the iconic Johns Hopkins COVID-19 dashboard.

I appreciate the information Johns Hopkins provides, and it is great to have this level of transparency. However, I have a few opinions on the visualizations and on the information presented.

It shows daily confirmed cases, deaths, and recoveries, along with a yellow line chart of cumulative confirmed cases, which by definition can only go up. The size of each red bubble depends on the cumulative case count for that location. There is also a tab that shows active cases.

Another tab in the bottom-right corner shows the daily increase.

On the left are confirmed cases by country. These are confirmed cases, not necessarily deaths, so I would have chosen orange or yellow, but for some reason they are shown in red.

Then come the deaths, and this is the most confusing part to me. Why use a white font to depict the number of deaths? I understand recoveries being green. Don’t get me wrong: if I were designing the chart, I would use red for deaths, orange or yellow for confirmed cases, and green for recoveries, consistent with the familiar traffic-light scheme, unless the intent is to scare the public about the pandemic (which may not be the case). Another confusing piece in this section is that the data is split by country, city, and region. As a result, I have to use other means to work out the deaths in China (by adding up the regions, etc.).

This is all well and good, but does it provide all the information I need? Let us review.
– Total confirmed cases by country and world – Yes
– Total deaths by region, city – Yes
– Total deaths by country – Yes
– Total recovered – Yes
– Daily increase – Yes
– Zeroing in on the country, state and zip – Yes
– Total daily increase – Yes
– Total daily increase by country – Yes
– Compare a couple of countries – No
– Daily new and confirmed cases with country comparison – Sort of
– Rolling 3-day average of deaths – No
– Whether deaths are increasing at different rates – No
– How long it takes to double, or other rates – No
– Daily new confirmed cases compared with other countries – No

Therefore, I had to find another resource to answer some of my questions. I came across a site, Our World in Data, that gave me these answers.

The following chart tells me how long it took for deaths to double in each country and what the current number is. In this case, it took 42 days to double in China, while in the US deaths are doubling every three days. This could be because we are at an early stage of the epidemic.
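For readers who want to see what a doubling time means in practice, here is a minimal sketch. It assumes one simple definition: the number of days since the cumulative count was at most half of its latest value. This is my own illustration and not necessarily how Our World in Data computes its figures; the function name `days_to_double` and the sample numbers are made up.

```python
from typing import Optional, Sequence


def days_to_double(cumulative: Sequence[float]) -> Optional[int]:
    """Days since the cumulative count was at most half of its latest value.

    Returns None when the series is empty, the latest value is not positive,
    or the history never goes back far enough to reach the halfway mark.
    """
    if not cumulative:
        return None
    latest = cumulative[-1]
    if latest <= 0:
        return None
    half = latest / 2
    # Walk backwards from yesterday until a day at or below half is found.
    for days_back in range(1, len(cumulative)):
        if cumulative[-1 - days_back] <= half:
            return days_back
    return None


# Made-up cumulative death counts, one value per day (illustration only).
fast = [10, 14, 20, 28, 40, 57, 80]
slow = [3000, 3010, 3020, 3030, 3040, 3050, 3060]

print(days_to_double(fast))  # 2
print(days_to_double(slow))  # None (a week of history is not enough)
```

A short doubling time means the count is growing quickly; a long one, or a series that has not doubled within the available history, means growth has slowed.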

The following chart shows confirmed deaths; I picked the US, Italy, China, and India for comparison.

In the graphic below, we see daily new confirmed deaths, again for the same four countries (US, Italy, China, and India). An interesting pattern appears in Italy, where there is a dip around 3/24 followed by an increase by 3/28. I couldn’t work out the reason behind it; it may be due to people not following “social distancing” (my opinion; I couldn’t find any sources to confirm it).

Here is the rolling 3-day average.
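A rolling 3-day average simply replaces each day's value with the mean of that day and the two days before it, which smooths out day-to-day reporting noise. The sketch below shows the arithmetic; the function name `rolling_average` and the daily figures are assumptions for illustration, not real data.

```python
from typing import List, Sequence


def rolling_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Mean of each run of `window` consecutive values.

    The result is shorter than the input by window - 1, because the first
    average only exists once a full window of days is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# Made-up daily new counts (illustration only).
daily_new = [100, 400, 100, 700, 400]

print(rolling_average(daily_new))  # [200.0, 400.0, 400.0]
```

Notice how the jumpy daily values (100, 400, 100, 700, 400) turn into a much steadier series once averaged.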

Death rates of various countries. This is important for understanding how the pandemic is spreading in different countries, and why.

How long it took for confirmed cases to double in each country.

Total daily confirmed cases for each country. Here we see that the US has just surpassed Italy.

New confirmed cases. Many reasons could explain this dramatic growth in the US. One is that test kits, which were scarce a couple of weeks ago, may now be available to confirm cases.

The rolling 3-day average is the most concerning aspect. As you can see, the US has drastically exceeded even China on this measure: while China had only 7,000, the US is at 17,000.

This fatality rate gives hope, but the US is at an early stage.

The iconic “Flattening the curve” graphic demonstrates the power of social distancing and its effect on the spread. It is the best visualization of the data, and I salute the folks who came up with it. Let’s do our part and stay home.

There is much more in Our World in Data. Numbers alone cannot tell a story; the data analyst or visualizer has to understand what information needs to be relayed to the audience, and relay it unambiguously. Otherwise, you create panic and confusion. My notes above are meant to provide some insight into visualization and are an attempt to sift through the data for some answers.

Is there a data problem?

“Numbers don’t lie. Women lie, men lie, but numbers don’t lie.” – Max Holloway

Data, even in its simplest form, is not just numbers; it can communicate meaningful information about our lives. Take salary, for example: we all expect it to go up, and when it comes down, everybody notices. Typically no one complains when it goes up, but people never fail to report it when the numbers are down. Let us consider a hypothetical, or perceived, problem with the salaries paid to employees.

An employee, Gabriel, reviews his salary in April, realizes it is $200 lower than his January salary, and promptly calls HR. HR reaches out to IT for clarification. Usually the HR software stores the details and we can easily extract them, but in this case let’s assume that the calculations are executed in backend code and only the results are stored in the table.

IT looks at the numbers and, sure enough, there is a drop. Let’s see what we can find. The general tendency is to assume that there is a problem in the system. With that assumption, they run queries and walk through the code applying the business rules, but fail to find any smoking gun.

After spending many hours, they realize that the system works as designed. There could be reasons outside the system that need to be validated. Meanwhile, they come across a note saying that the bonus is paid at the end of the year but credited at the beginning of the next year. IT reviews this with the business owners and, sure enough, they remember that the bonus given in December gets credited in January. The employee is notified; he goes back and checks his December salary, and it matches his February salary. However, if the same situation arises in the future, the same song and dance will have to be repeated to identify the issue, because the same employees may or may not be there to support the application. So it is critical that we design our systems with enough logging and that we record the business calculations as part of the system design.

In this case, we should have included the bonus information in the table, along with a breakdown showing how we arrived at the employee’s final salary.
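To make the idea concrete, here is a rough sketch of what storing the breakdown could look like. Everything in it is hypothetical: the `PayRecord` structure, the component names, and the base salary of 4000 are made up for illustration, and only the $200 difference between January and April comes from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PayRecord:
    """One month's pay, stored as named components rather than a bare total."""
    employee: str
    month: str
    components: Dict[str, int] = field(default_factory=dict)  # whole dollars

    @property
    def total(self) -> int:
        # The final salary is always derived from the stored components.
        return sum(self.components.values())


def explain_difference(earlier: PayRecord, later: PayRecord) -> Dict[str, int]:
    """Change per component (later minus earlier), leaving out unchanged ones."""
    names = sorted(set(earlier.components) | set(later.components))
    changes = {}
    for name in names:
        delta = later.components.get(name, 0) - earlier.components.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


# Hypothetical figures: base pay is invented; the bonus matches the $200 gap.
january = PayRecord("Gabriel", "January", {"base": 4000, "bonus": 200})
april = PayRecord("Gabriel", "April", {"base": 4000})

print(april.total - january.total)         # -200
print(explain_difference(january, april))  # {'bonus': -200}
```

With the components stored next to the total, the question "why is April $200 lower than January?" can be answered with one comparison instead of hours of walking through backend code.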

The above use case is for demonstration purposes only, to explain the concept in simple terms; real-life structures are much more complex.

Bottom line: when you analyze data, make sure you trace the processes step by step, checking whether the numbers jibe with the previous step and documenting as you go until you reach the end point. We need to approach these real or perceived data issues with an open mind, free of bias.