Above: Tacking public COVID-19 testing data in Excel with Power Query
In this article, I want to share a quick example of how to track testing for COVID-19 using Excel and publicly available data. This is a bare bones tutorial, focused only on the basics of connecting Excel with publicly available data.
The end result is a simple Excel table that shows the most recent testing data by state. The data is fetched and “shaped” with Power Query, then dropped back into Excel, where it can be refreshed with a single click. The approach is general and can be used with all kinds of public data. The complete Excel file is attached below for reference.
For this example, we’re going to use data from the COVID tracking project website. The COVID Tracking Project collects data at the U.S. State level on testing for coronavirus, SARS-CoV-2. Note: the data is not perfect, and varies state-by-state. The API for requesting data is explained here. In this example, we are fetching the “States Current Values” data which, as of this writing contains 19 columns, most of which we discard.
This project depends on Power Query, so you’ll need at Excel 2013 or later on Windows. On the Mac, you can refresh queries with Office 365 Excel (subscription) but you can’t yet edit or create queries (as of March 25, 2020).
Getting the data into Excel
The best tool for the job is Power Query. Power Query is part of Microsoft’s BI suite. In a nutshell, Power Query is a tool for fetching, cleaning, and shaping data.
If you are new the Power Query, be aware that it has a vast feature set and an intimidating interface. Even if you spend a lot of time in Excel, you are going to feel like you’ve landed in an alien world. Familiar, yet distinctly different.
Never fear, we are going to keep things as simple as possible. There are many ways this example can be improved or embellished once you get things working.
To orient you, here are the high level steps we are going to perform:
- Create new excel workbook
- Create a new query to fetch data
- Edit query to shape data
- Load data back to Excel Table
- Add formulas as desired
The first two steps happen in Excel. The last two steps are done in power query. Once you have the query set up, you can right-click inside the table and select refresh. Fresh data will be collected, and the data will be shaped according to the steps defined in the query.
Steps to create the query
These are the steps I used to create the query that fetches data from the tracking website.
- Click Data > Get Data > From web
- Enter the url: https://covidtracking.com/api/states.csv and click OK
- Click the Transform Data button to launch Power Query:
- Power Query will automatically add three steps: Source, Promote Headers and Change type. If you select a step, you an see what it does.
- Control-click to select five columns: state, positive, death, dateModified, totalTestResults. Then, right-click on a select column and choose “Remove other columns”.
- Rename columns to: state, positive, death, modified, total (optional). Double-click header to rename columns.
- Reorder columns to state, positive, death, dateModified, totalTestResults. Drag to reorder.
- Sort data by the “positive” column in descending order.
- Rename Query to “states”:
- Verify you have five columns of data like this:
- Click Close and Load button on Data tab of ribbon.
- The data will end up in an Excel Table called “states”
To fetch the latest data, right-click in the table and select “Refresh”. Power Query will pull down a fresh set of source data, run through the steps defined above, and deliver the result back to Excel.
Back in Excel
Once the data is in an Excel table, I added a column called “pos %” that calculates what percentage of total tests are positive with this formula:
=[@positive]/[@total] // calculate percent positive
This is just something I was curious about. It could be added in Power Query instead (before loading to Excel) but to keep things simple, the formula was added manually in Excel. Since it lives in an Excel Table, it stays up to date when date changes.
Then I added formulas to summarize the data:=SUM(states[total]) // J4 =SUM(states[positive]) // J5 =J5/J4 // J4 =SUM(states[deaths]) // J7 =MAX(states[modified]) // J9 =MIN(states[modified]) // J10
How to edit the query
- Click Queries and Connections on the Data tab of the ribbon
- Double click the “states” query to edit
- Many of the articles I’ve read about COVID-19 warn against focusing on testing, because it distracts from the more serious problem, exponential spread of a contagious disease. To be clear, this article’s only purpose is to show one example of how to get public data into Excel.