Open collaboration on COVID-19
In the midst of the uncertainty and seriousness of COVID-19, we’ve been inspired to see a global community of scientists, government officials, journalists, programmers, and concerned citizens come together to collaborate on a variety of projects with the shared goal of understanding COVID-19 and coordinating on the best response. Even though many of these projects aren’t traditional software projects, the same collaborative development model is being applied to curated data sets, DIY instruction sets, and more.
Below are some of the most impactful open source projects we’ve seen for tracking, understanding, and responding to COVID-19 so far.
Tracking the pandemic collaboratively
One of the most cited open COVID-19 datasets is provided by Johns Hopkins University (JHU). Epidemiologists, journalists, and statisticians from around the world are treating this as one of the canonical sources of data on the outbreak. The data is also used to power this interactive dashboard, which tracks reported cases of COVID-19 in real time. As they explain in their article in The Lancet, the Johns Hopkins Center for Systems Science and Engineering developed the dashboard “to provide researchers, public health authorities, and the general public with a user-friendly tool to track the outbreak as it unfolds.”
The data is pulled in from various sources (primarily DXY) and verified by cross-referencing other sources such as the WHO. The dashboard has generally been faster than the WHO in reporting countries’ first cases. The JHU team believes the dashboard is especially useful in providing essential information for appropriate responses in the earliest stages of a viral outbreak.
Another high-quality dataset made available to the public is the nCoV2019 dataset by the Institute for Health Metrics and Evaluation at the University of Washington. The data is presented in this dashboard. The dataset contains detailed patient-level data such as date of symptom onset, date of laboratory confirmation, and more. It’s intended to aid in calculating key statistics of COVID-19 such as reproduction number, incubation period, and other important factors.
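To make the value of patient-level dates concrete, here is a minimal sketch of one simple statistic they allow: the delay between symptom onset and laboratory confirmation. This is not the dataset’s own tooling, and it is a simpler quantity than the reproduction number or incubation period. The field names `onset` and `confirmed` and all of the dates are hypothetical, chosen only for illustration.

```python
from datetime import date
from statistics import median

# Hypothetical patient-level records (made-up dates, not real data).
records = [
    {"onset": date(2020, 1, 20), "confirmed": date(2020, 1, 24)},
    {"onset": date(2020, 1, 22), "confirmed": date(2020, 1, 25)},
    {"onset": date(2020, 1, 23), "confirmed": date(2020, 1, 30)},
    {"onset": None, "confirmed": date(2020, 1, 28)},  # onset not reported
]


def onset_to_confirmation_delays(rows):
    """Return the delay in days for every record that has both dates."""
    return [
        (r["confirmed"] - r["onset"]).days
        for r in rows
        if r["onset"] is not None and r["confirmed"] is not None
    ]


delays = onset_to_confirmation_delays(records)
median_delay = median(delays)
print(delays, median_delay)
```

Records with a missing date are skipped rather than guessed at, which is usually the safer default when working with line-list data assembled from many sources.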
Tracking cases in the US
The most comprehensive data source on US testing and infection rates is the COVID-19 Tracker project.[3,4] The project’s numbers are available on a web page, in a Google sheet, and via a public API. The project was started in early March by a partnership of The Atlantic and the founder of Related Sciences, out of concern about the lack of testing information being provided by the CDC. The partners put out a call for volunteers, who quickly developed a collection of software packages to crawl state websites, aggregate the data, and make the dataset available to the public via APIs. The team shares both its source code and its datasets. Our World in Data’s page on COVID-19 testing used to list COVID-19 Tracker numbers alongside CDC numbers, but now reports only the COVID-19 Tracker numbers.
Volunteer computing for large scale research
Scientific work is also being carried out on COVID-19, both for epidemiological research and in the hopes of finding a vaccine or a cure. Folding@home is a distributed-computing project that uses the personal computers of volunteers to model molecular dynamics for, among other things, computational drug design. They have started an effort focused on COVID-19 to find potentially druggable protein targets. Data for this effort is stored in this repository. Folding@home is an open-source project, and all of its datasets and software are available.
Helping the public
The WHO app collective is building a mobile application to help people around the world cope with COVID-19. The team, led by Dr. Daniel Kraft, is rapidly putting together a first version of the app. Their goal is for the app to provide local information to users and to feed users’ data back to public health officials, improving accuracy for other users.
Faster application of the scientific method
Nextstrain is an open-source project for tracking and analyzing pathogen genomes. They run a dashboard of the genomic epidemiology of COVID-19. The dashboard shows the evolutionary relationships of the mutations of the HCoV-19 viruses, which can help to trace the origins of the virus. Nextstrain’s goal is to aid epidemiological understanding of viruses to improve outbreak response. They state explicitly on their website that “current scientific publishing practices hinder the rapid dissemination of epidemiologically relevant results,” and they are dedicated to providing high-quality data quickly to minimize the damage done by pandemic outbreaks. Nextstrain’s COVID-19 dashboard sources its data from GISAID, which has strict sharing guidelines, but its software is all open source.
Smaller scientific datasets abound, such as this repository of chest X-ray images, aimed at developing AI to improve diagnostic accuracy and predict infection.
There are numerous smaller-scale scientific visualization projects on COVID-19. The Novel Coronavirus Infection Map provides visualizations of infection histories globally or broken down by country. It’s the work of the Humanistic GIS Lab at the University of Washington and pulls in data from numerous government and public health organizations.
COVID-19 Scenarios is a COVID-19 outbreak simulator designed to determine strain on the health care systems in various regions as the outbreak unfolds.
COVID-19 Dashboards is a set of interactive visualizations of the Johns Hopkins COVID-19 data built in Jupyter Notebooks and converted to blog posts with fastpages. GitHub Actions are used to keep the COVID-19 Dashboards dataset up to date, so the visualizations are always current. This entire site is open source and has been built by a group of volunteer programmers and data scientists. The site includes predictions as well as visualizations, and so is well suited to an open source approach where the source code of the predictive model can be directly examined (fastpages presents the source code directly embedded in the generated web page).
In a similar vein, Predict COVID-19 (repository) allows users to compare the number of COVID-19 cases between different countries, which gives an idea of how the epidemic might progress in the coming days.
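One common way to make such country-to-country comparisons is to line up each country’s curve at the day it first crossed a fixed case count, so that outbreaks that started on different dates can be read side by side. The sketch below illustrates that general idea only; it is not taken from the Predict COVID-19 code, and the function name, the threshold of 100, and the case counts are all assumptions made up for the example.

```python
def days_since_threshold(series, threshold=100):
    """Drop the leading days before cumulative cases first reach `threshold`."""
    for i, count in enumerate(series):
        if count >= threshold:
            return series[i:]
    return []  # the threshold was never reached


# Hypothetical cumulative case counts per day (illustrative numbers only).
cumulative = {
    "Country A": [20, 60, 110, 180, 300, 480],
    "Country B": [5, 15, 40, 90, 130, 210],
}

aligned = {name: days_since_threshold(s) for name, s in cumulative.items()}

# Compare the two curves day by day over the period where both have data.
for day, (a, b) in enumerate(zip(aligned["Country A"], aligned["Country B"])):
    print(f"day {day} since 100 cases: A={a}, B={b}")
```

Once aligned, the country that crossed the threshold earlier offers a rough, and by no means reliable, hint at where the later one might be a few days on.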
A number of projects like this one have been developed to simplify programmatic access to COVID-19 data. This API, serving out the Johns Hopkins data, drives numerous COVID-19 visualization sites, including almost 20 responsive live visualizations.
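As a rough illustration of the reshaping such projects do before serving the data, the sketch below sums province-level rows into per-country time series. It assumes a wide CSV layout with `Province/State`, `Country/Region`, `Lat`, and `Long` columns followed by one column per date, which is our understanding of the Johns Hopkins time-series files; check the repository for the current format. The sample rows, the numbers in them, and the function name `totals_by_country` are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Tiny inline sample in the assumed wide layout; values are made up.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Italy,43.0,12.0,0,0,2
Hubei,China,30.9,112.2,444,444,549
Beijing,China,40.1,116.4,14,22,36
"""


def totals_by_country(csv_text):
    """Sum province-level rows into one {date: count} series per country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    meta = {"Province/State", "Country/Region", "Lat", "Long"}
    date_cols = [c for c in reader.fieldnames if c not in meta]
    # Each new country starts with a fresh dict of zeros, one per date.
    totals = defaultdict(lambda: dict.fromkeys(date_cols, 0))
    for row in reader:
        for d in date_cols:
            totals[row["Country/Region"]][d] += int(row[d])
    return dict(totals)


totals = totals_by_country(SAMPLE)
print(totals["China"])
```

A real service would fetch the file over HTTP and cache the result; here the data is inlined so the example runs without network access.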
Nations, states, municipalities, communities
The country of Italy is sharing all its latest COVID-19 data. This data is used to power a dashboard, which tracks infections throughout the country in real time. In this same vein, various metropolitan areas like Tokyo and Zurich are storing and sharing real-time infection information via GitHub repositories.
The Wuhan2020 community project is a self-organized, open source community project aimed at “establishing a data service for real-time synchronization of hospitals, factories, procurement and other information, and convening all those who want to contribute to this fight against viruses”.
Finally, the Low-Cost Open-Source Ventilator project gives extensive instructions on how to build a low-cost ventilator, which may save lives if hospitals’ supplies of standard ventilators become exhausted.
Need GitHub’s help for a COVID-19 project?
We’re inspired to see this community of contributors come together with such a robust response to the COVID-19 outbreak. We are already donating 60,000 computing hours/day to Folding@home, and we’ve reached out to other projects to offer support. If you’re on a team that needs access to any GitHub products or services for a project related to COVID-19, send us a note with information about your project and how GitHub can help.