The recent outbreak of the coronavirus 2019-nCoV has been unsettling. As of 25-01-20 there have been 56 deaths attributed to the virus out of 1350 lab-confirmed cases1, a death rate of roughly 4%. Coronaviruses are common among animals, and humans typically get infected once over their lifetimes, resulting in a mild cold or cough2. Like all viruses, however, they can mutate into something more serious, leading to acute illness and sometimes death, particularly in those already battling other illnesses.
We’ve seen coronaviruses before: SARS in 2002 and MERS in 2012 were both coronaviruses. What’s different about 2019-nCoV (n for novel, and CoV for coronavirus) is the radical steps taken by the Chinese government to halt the spread of the virus. It has restricted transport in and out of 12 cities to date and cancelled all Lunar New Year celebrations, and other governments have been following suit. Schools and attractions such as Disneyland in Hong Kong have been closed until further notice, Taiwan has suspended travel permit applications from Chinese citizens and the U.S. has begun evacuating all American citizens from Wuhan City (the city believed to be ground zero of this virus).
I recently completed a module in social network analysis (SNA) and, coincidentally (and somewhat concerningly), for a group project assignment Eoghan Keegan3, Isabel Sicardi Rosell4 and I created a simulation study of how a virus closely modeled on the 2002 SARS virus would spread across the USA airport network. Our aim was to understand the extent of a potential outbreak and to assess what the most effective vaccination strategy would be.
Using Python to create the simulation model and Gephi to visualize results, we were able to determine that, in the absence of a complete transportation shutdown, an evolving strategy of assigning limited resources to the airports with the most connections to other airports on a day-by-day basis was the best way to combat the spread of the virus.
SNA is a useful tool to understand complex and dynamic relationships between entities, objects or people. The name SNA can be misleading: while its origins lie in the study of sociology, it can be applied to a range of topics. In biochemistry SNA is used to understand the relationships of proteins in the human body, while in engineering it’s used to fault test power grids. In this instance, we used it to understand the relationships between airports in the U.S. One of the many benefits of SNA is that each network generates mathematical properties that tell us things like how closely it is connected, so we can compare how connected one network is with another. These properties characterize networks.
After sourcing internal U.S. flight data from the year 2008, we created a network of flights between airports, using the airports as the nodes (the objects we want to understand) and the flights between them as the edges (the relationships between the objects). The graph to the left is the resulting network graph, with each circle representing an airport (a node) and each line representing a flight connection (an edge). Airports are laid out by geographical location.
The next step in our process was to understand the strength of the relationships between airports because this is critical in understanding how a disease will spread throughout a network. In SNA, you can measure the strength of relationships in a number of ways and this is called the weight of the relationship. The bigger the weight, the stronger the relationship. In this instance we used the number of flights between airports as the weights.
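A minimal sketch of how flight counts can be turned into edge weights, using only the Python standard library. The flight records and the `build_weighted_edges` helper are made up for illustration, and treating a route as undirected (ATL to ORD is the same edge as ORD to ATL) is an assumption here, not something taken from the project code.

```python
from collections import Counter

# Hypothetical flight records: one (origin, destination) pair per flight.
flights = [
    ("ATL", "ORD"), ("ATL", "ORD"), ("ORD", "ATL"),
    ("ATL", "MEM"), ("MEM", "ORD"),
]

def build_weighted_edges(records):
    """Count flights per airport pair, treating each route as undirected."""
    weights = Counter()
    for origin, dest in records:
        # Sort the pair so ATL->ORD and ORD->ATL land on the same edge.
        edge = tuple(sorted((origin, dest)))
        weights[edge] += 1
    return weights

edge_weights = build_weighted_edges(flights)
for (a, b), w in sorted(edge_weights.items()):
    print(f"{a} -- {b}: {w} flights")
```

The heavier an edge, the more traffic flows between the two airports, which is what makes it a plausible proxy for how easily a disease moves along that route.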
Measures of Centrality
There are a variety of methods that can be used to determine which airports were the most important in our network and these are known as measures of centrality of a node:
- degree centrality is the number of edges directly connected to the node. An example of a European airport with high degree centrality would be Amsterdam because it acts as a hub to other airports.
- closeness centrality is based on the average length of the shortest path between the node and all other nodes – essentially, how many steps away the airport is from every other airport. An example is flying from Dublin to Singapore – if we need to stop over in London to continue our journey, the trip has 1 stopover, so the shortest path is 2 flights long. We calculate this for every other node and average the result, which gives our closeness centrality measurement.
- betweenness centrality measures how many times the node is on the shortest path between other nodes. Going back to our Amsterdam airport example, it would have a high betweenness measure because it acts as a connection point for many of the world’s airports.
- eigenvector centrality ranks each airport’s importance by the number of important airports it’s connected to, on a scale of 0 to 1 with 1 being the most important – essentially, how important an airport is depends on the importance of the airports it’s connected to.
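The measures above can be sketched in plain Python on a toy hub-and-spoke network. This is an illustrative sketch, not the code used in the project: the airports and links are invented, edges are unweighted, and closeness is reported as the reciprocal of the average shortest-path length (a common convention, so that higher means more central). Betweenness is left out to keep the example short.

```python
from collections import deque

# Hypothetical toy network: a hub (AMS) with four spokes and one spoke-to-spoke link.
graph = {
    "AMS": {"DUB", "LHR", "CDG", "FRA"},
    "DUB": {"AMS", "LHR"},
    "LHR": {"AMS", "DUB"},
    "CDG": {"AMS"},
    "FRA": {"AMS"},
}

def degree_centrality(g):
    """Number of edges touching each node."""
    return {n: len(nbrs) for n, nbrs in g.items()}

def shortest_path_lengths(g, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness_centrality(g):
    """Reciprocal of the mean shortest-path length to all other reachable nodes."""
    result = {}
    for n in g:
        dist = shortest_path_lengths(g, n)
        total = sum(d for other, d in dist.items() if other != n)
        result[n] = (len(dist) - 1) / total if total else 0.0
    return result

def eigenvector_centrality(g, iterations=100):
    """Power iteration, rescaled each round so the top node scores 1."""
    score = {n: 1.0 for n in g}
    for _ in range(iterations):
        new = {n: sum(score[nbr] for nbr in g[n]) for n in g}
        top = max(new.values())
        score = {n: v / top for n, v in new.items()}
    return score

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
eigenvector = eigenvector_centrality(graph)
for airport in sorted(graph, key=eigenvector.get, reverse=True):
    print(airport, degree[airport], round(closeness[airport], 3),
          round(eigenvector[airport], 3))
```

In this toy network the hub ranks first on all three measures, and the two spokes that are also linked to each other outrank the spokes that only reach the hub.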
We decided to measure the importance of airports using eigenvector centrality. In the graph to the right, each airport’s node size is relative to its eigenvector centrality rating. The bigger the circle, the greater the importance. Our results indicate that Hartsfield Jackson Atlanta International Airport was the most important in 2008 with Chicago O’Hare International and Memphis International following closely.
A useful tool in Gephi is the ability to cluster nodes together using modularity. Modularity allows us to assign each airport to a cluster depending on the shared connections between them. If you take another look at the above graph you’ll notice there are 4 distinct colors – these represent the cluster each airport belongs to. Taking the North Eastern seaboard as an example (those nodes highlighted in green), while airports in this cluster have connections to airports in other clusters, our results indicate that they are highly connected to each other. The same can be said about the South Eastern seaboard, while we have more of a mix between the Central and Western clusters.
A final note on the network graph created from the U.S. airport network is that it forms a scale-free network. Without going too much into detail, this means that the nodes with the higher eigenvector centrality sit at the centre of the network, reflecting how important they are. An important characteristic of this type of network is that it is susceptible to targeted attacks. By removing a few of these important nodes we’re able to break the network. Taking the leftmost node in the graph as an example – if we were trying to get from there across to the rightmost node in the graph, we would need to go through the central nodes. If the central nodes don’t exist, it may be impossible to get to the rightmost node – thus our network is susceptible to targeted attacks!
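To give a feel for what a modularity score measures, here is a small sketch that computes it for a given assignment of nodes to clusters. It assumes an unweighted, undirected graph and uses the standard formula (fraction of edges inside each cluster minus the fraction expected by chance). The toy graph and the `modularity` function are illustrative only, and this is not Gephi’s clustering routine, which searches for the assignment that maximises this score.

```python
def modularity(edges, communities):
    """Sum over clusters of (internal edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for a, b in edges:
        # Each edge adds one to the degree of both of its endpoints.
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
               for i in range(len(communities)))

# Hypothetical toy graph: two triangles joined by a single bridge (C-D).
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]

two_clusters = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
one_cluster = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
print(round(two_clusters, 3), round(one_cluster, 3))
```

Splitting the toy graph along the bridge scores higher than lumping everything into one cluster, which is the sense in which airports in the same cluster are more connected to each other than to the rest of the network.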
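The effect of a targeted attack can be illustrated with a toy network: remove a node, then measure the size of the largest group of airports that can still reach each other. The network below (two hubs, three spokes each) and the helper names are invented for illustration, and the real airport network is of course far larger.

```python
def build_graph(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    g = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g

def largest_component_size(g, removed=()):
    """Size of the largest connected group once the removed nodes are gone."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in g:
        if start in removed or start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in g[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

# Hypothetical network: two connected hubs, each serving three small airports.
network = build_graph([
    ("H1", "H2"),
    ("H1", "a"), ("H1", "b"), ("H1", "c"),
    ("H2", "d"), ("H2", "e"), ("H2", "f"),
])

intact = largest_component_size(network)
after_leaf = largest_component_size(network, removed=["a"])
after_hub = largest_component_size(network, removed=["H1"])
after_both_hubs = largest_component_size(network, removed=["H1", "H2"])
print(intact, after_leaf, after_hub, after_both_hubs)
```

Removing a small airport barely changes the network, while removing the hubs splits it into isolated pieces.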
As with all simulation studies, parameters have to be set. To design the virus we researched various outbreaks and decided our hypothetical virus would be based on the SARS virus from 2002. Our virus had a rapid onset time resulting in immediate transferal upon contraction, an infection rate of 19% and a death rate of 9.6%.
Our model is based on a susceptible-infected-removed model with each day representing 1 time period. This model calculates each node’s population on a daily basis, split into S (susceptible), I (infected) and R (removed). Susceptible represents the population susceptible to infection from its neighbors, infected represents the population who carry the virus and removed represents the population that has been removed from the model, either due to recovery after contracting the virus or death as a result of the virus. For more detailed information on our model, such as our random implementation of the outbreak of disease, please see the download button at the bottom of this post for our full paper.
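For readers unfamiliar with this kind of model, here is a minimal sketch of a daily susceptible-infected-removed update for a single, closed population. It is a textbook-style simplification, not the model from our paper: it ignores the spread between airports, the starting population is invented, `beta` reuses the 19% infection rate mentioned above, and the daily removal rate `gamma` is a hypothetical value chosen purely for illustration.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance one day: some susceptibles become infected, some infected are removed."""
    n = s + i + r
    new_infections = beta * s * i / n
    new_removals = gamma * i
    return s - new_infections, i + new_infections - new_removals, r + new_removals

def simulate(s0, i0, r0, beta, gamma, days):
    """Return the (S, I, R) state for day 0 through day `days`."""
    history = [(s0, i0, r0)]
    for _ in range(days):
        history.append(sir_step(*history[-1], beta, gamma))
    return history

# beta is the 19% infection rate from the post; gamma is a hypothetical removal rate.
history = simulate(s0=9990.0, i0=10.0, r0=0.0, beta=0.19, gamma=0.05, days=30)
for day in (0, 10, 20, 30):
    s, i, r = history[day]
    print(f"day {day}: S={s:.0f} I={i:.0f} R={r:.0f}")
```

Because the population is closed, S + I + R stays constant from day to day; people only move from one group to the next.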
We proposed that we could engineer a vaccine within 30 days of the virus outbreak (very unrealistic in real life!) so we ran the model for 30 days without interruption. We implemented several vaccine strategies after this point that provided an immediate 25% reduction of infection with diminishing returns once introduced. Most of our strategies were resource allocation calculations, meaning we decided how to allocate resources based on the importance of nodes as calculated by measures of centrality like degree and closeness. We also decided to attempt to split our resources evenly across the network as a comparison.
In total there were over 7 million flights transporting 534 million passengers between 516 airports. When left unchecked, the hypothetical virus infected 2.2 million passengers. While 2.2 million passengers doesn’t sound like a lot, our population is closed, meaning we don’t consider people who aren’t travelling or how the disease spreads among them.
By distributing resources evenly we showed that the number of infected passengers drops to 1.1 million, but when we used SNA measures to allocate resources we found that a greedy algorithm was most effective. We call it a greedy algorithm because it targets the node with the most infected population at each iteration and allocates all resources to that node for that day. Using the greedy algorithm, only 370,000 passengers were infected – about a third of the figure for even distribution.
While some parameters of our hypothetical model are unrealistic, the results are concerning. They show that the rapid spread of contagious viruses is a genuine threat. Taking another look at the infection rate graph, we can see exponential growth of the virus in the first 30 days. This information is readily available to all epidemic specialists, and it is with this lens that we can perhaps conclude that China’s radical response to 2019-nCoV is the correct one.
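The difference between the two allocation rules can be sketched in a few lines. The function names, the infected counts and the notion of a daily `budget` of vaccine resources are all hypothetical, and the sketch only shows where resources go on a given day, not how they reduce infection in the model.

```python
def allocate_evenly(infected, budget):
    """Split the day's resources equally across every airport."""
    share = budget / len(infected)
    return {airport: share for airport in infected}

def allocate_greedy(infected, budget):
    """Give the whole day's resources to the airport with the most infected people."""
    target = max(infected, key=infected.get)
    return {airport: (budget if airport == target else 0.0) for airport in infected}

# Hypothetical infected counts per airport on one day of the simulation.
infected_today = {"ATL": 500, "ORD": 300, "MEM": 50}

even_plan = allocate_evenly(infected_today, budget=90.0)
greedy_plan = allocate_greedy(infected_today, budget=90.0)
print(even_plan)
print(greedy_plan)
```

In a day-by-day simulation the greedy rule would be re-evaluated at every time step, so the targeted airport can change as the outbreak moves through the network.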