Ireland and her exports

On July 15th 2020, Ireland won its appeal against a European Commission ruling that it gave Apple illegal state aid in the form of tax advantages that gave it an unfair advantage over other EU countries. According to the Commission, Apple owes Ireland a hefty sum of €14.3 billion in uncollected taxes. Margrethe Vestager, the EU Commissioner for Competition Policy and ultimately the figurehead for the EU on this front, has been fighting what she considers to be unfair tax breaks applied by states like Ireland. Labelled a “consumer champion” by some commentators, Vestager is determined to bring order to an ever-growing digital economy.

Vestager is likely to bring the case to the EU’s highest court, the Court of Justice of the European Union, with potentially significant implications for the future of Irish tax policies and the companies that have chosen to establish their EU headquarters on the island. Naturally there are conflicting opinions on the ruling. Some argue that Ireland is blatantly allowing international powerhouses to funnel money through their Irish headquarters, akin to money laundering. Others argue that without foreign direct investment, Ireland wouldn’t be in the strong position it finds itself in, Covid-19 crisis and all. Quoting the EU directly:

“As a small open economy Ireland’s financial fortunes are largely dependent on international trade and influenced by global markets.”

https://ec.europa.eu/ireland/news/key-eu-policy-areas/economy_en

So, with this backdrop, I continued my exploration of Irish exports over the last 5 years and took a look into the commodities we’re sending overseas.

data preprocessing

In my last post I showed how easy it is to map data in Python by aggregating an Irish exports dataset published by the CSO. For this post I used the same dataset and followed a similar method of aggregating the data – except this time I condensed the commodities into groups. This was a critical part of the process because there are over 60 commodity classifications in the dataset, which is too many to graph at once.

Typically you’d have an SME on hand to confirm your groupings and naming conventions but in this case I’ll trust my gut – although on reflection, after looking at my results, the commodity group ‘Transport’ should have been something like ‘Automobiles’, but we’ll plough on anyway. By grouping the categories, I was able to reduce them down to 13 groups and 1 catch-all group, ‘None’. The catch-all group is a backup for any categories we miss and for those that have no category at all, so hopefully the total of this group is negligible. You’ll find an extract of how I did this in the code snippet below and, as usual, the full code at the bottom of this post.

# setting up a function that searches for the string in the category
# each category has a number with the description, so by searching for the number, I can assign it a new group
# for example, Live animals except fish etc. (00) is a category - so I search (00) & assign it agriculture  

def commodity(x):
    if '(00)' in x:
        return 'Agriculture'
    elif '(01)' in x:
        return 'Food & Beverage'
    elif '(02)' in x:
        return 'Food & Beverage'
    elif '(03)' in x:
        return 'Food & Beverage'
    elif '(04)' in x:
        return 'Agriculture'
    # ... (remaining categories elided in this extract)
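As an aside, the same idea can be written as a lookup table rather than a long if/elif chain. This is only a minimal sketch: it maps just the five codes shown in the extract above, it assumes every category carries a two-digit code in brackets, and the names `CODE_TO_GROUP` and `commodity_group` are made up for illustration rather than taken from the full code.

```python
import re

# Only the five codes shown in the extract; the rest of the mapping is not reproduced here.
CODE_TO_GROUP = {
    "00": "Agriculture",
    "01": "Food & Beverage",
    "02": "Food & Beverage",
    "03": "Food & Beverage",
    "04": "Agriculture",
}

# Assumption: each category description contains its two-digit code in brackets, e.g. "(00)".
CODE_PATTERN = re.compile(r"\((\d{2})\)")


def commodity_group(description):
    """Return the group for a category description, or the catch-all 'None'."""
    match = CODE_PATTERN.search(description)
    if match is None:
        return "None"
    return CODE_TO_GROUP.get(match.group(1), "None")


print(commodity_group("Live animals except fish etc. (00)"))
```

Anything without a recognised code falls into the ‘None’ catch-all, which makes it easy to check afterwards how much value ended up ungrouped.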

home of pharma & chemical exports

The next step in the process is to visualise the groupings. Since starting in Bank of Ireland I’ve been using Tableau as my main data visualisation tool & I have to say – I’m sold! It’s such a handy tool as long as the data is in the correct format (which it is, thanks to Python). So I continued the trend and used it to create the below visuals.

Line charts are useful tools for showing trends over time. However, when you have too many categories to visualise at once, it’s sometimes difficult to see what the story is.

Here we can see that there are 2 categories (pharma & chemical manufacturing) that are several times higher in value than all other categories and look to be strong areas of growth in Ireland. After that it’s quite difficult to see trends in other categories because the lines are overlapping. What we can say is that apart from machinery, things look fairly steady, suggesting minimal growth. For reference, the colour legend in the data prep section represents each commodity.

A way to visualise the composition of categories is to create bar charts. By allowing for a direct comparison from year to year we can see a clearer view of the difference between categories and years. By arranging the bars from highest to lowest, with the highest value at the bottom and the lowest on top, we can see how pharma and chemical manufacturing make a significant contribution to total exports. What’s difficult about this view is that we can’t really see the composition of the other categories – similarly to the line chart above, it looks steady but it’s hard to tell for sure. This is where stacked bar charts prove their worth!

stacks, stacks, stacks

The stacked bar chart used here is the same as a normal bar chart, except that the values are converted into percentages of each year’s total. Taking 2019 for instance, instead of showing the actual value of exports for each commodity group, we can see each group’s percentage share relative to all other commodities. This allows us to identify the areas of growth in the economy relative to all other areas.

For instance, while we see growth in the food & beverage sector in real value terms between 2015 & 2019 (€7.9B in 2015 to €10.B in 2019), in percentage worth to the economy, it remains steady because of the increases in other sectors. Taking a look into the agriculture sector we see a modest increase in real value terms (€3.5B in 2015 to €3.6B in 2019) but a decrease in percentage terms.
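To make the value-versus-share distinction concrete, here is a small sketch of the calculation behind a percentage view. The figures are invented purely for illustration (they are not the CSO numbers) and `shares_by_year` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical export values in € billions, keyed by (year, group). Not CSO data.
exports = {
    (2015, "Pharma"): 30.0,
    (2015, "Food & Beverage"): 8.0,
    (2015, "Agriculture"): 2.0,
    (2019, "Pharma"): 45.5,
    (2019, "Food & Beverage"): 12.0,
    (2019, "Agriculture"): 2.5,
}


def shares_by_year(values):
    """Convert absolute values into each group's percentage of its year's total."""
    totals = defaultdict(float)
    for (year, _group), value in values.items():
        totals[year] += value
    return {key: 100 * value / totals[key[0]] for key, value in values.items()}


shares = shares_by_year(exports)
for (year, group), pct in sorted(shares.items()):
    print(year, group, f"{pct:.1f}%")
```

In this toy data, food & beverage grows in value but holds the same share, while agriculture grows slightly in value and still loses share, which is the same pattern described above.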

overall value

Heatmaps are a really useful tool if we’re concerned with a point in time proportional view of our data. In this instance, I totaled the value of each commodity group between the years 2015 & 2019 to see how much each contributes to the Irish economy. Again, we can see just how important pharma & chemical exports are to Ireland with general manufacturing and machinery being the next most important commodity groups.

the movers and shakers

The last visual I created shows the percentage change in each commodity group over the 5 years. As 2015 is the base year there is no value assigned. The value in 2016 represents the percentage change between 2015 & 2016, with 2017 being the change between 2016 & 2017 and so on. By representing the percentage change in colour format, it allows us to visualise areas of growth and decline. A darker brown/orange colour shows a higher negative change while a darker blue shows a higher positive change.
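The year-on-year calculation described here can be sketched in a few lines. The values below are hypothetical, not CSO figures, and `year_on_year_change` is a name I’ve made up for the example.

```python
def year_on_year_change(values_by_year):
    """Percentage change versus the previous year; the base year gets None."""
    years = sorted(values_by_year)
    changes = {years[0]: None}
    for previous, current in zip(years, years[1:]):
        old, new = values_by_year[previous], values_by_year[current]
        changes[current] = 100 * (new - old) / old
    return changes


# Hypothetical values for one commodity group (€ billions), not CSO figures.
machinery = {2015: 10.0, 2016: 11.0, 2017: 9.9, 2018: 10.0, 2019: 12.0}

changes = year_on_year_change(machinery)
for year, change in changes.items():
    print(year, "base year" if change is None else f"{change:+.1f}%")
```

Because each year is compared with the one before it, a big swing in one year also shifts the baseline for the next.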

This view shows us that tobacco exports have rapidly declined while there have been 2 years of decline in transport-related exports. The commodity seeing the biggest swings is machinery, with a massive 37% increase in the value of exports in 2019. With companies like Kingspan & Combilift expanding in recent years, I wonder how much these companies are contributing to that growth.

final thoughts

I was really surprised by just how much Ireland relies on high-tech exports. As recently as the 1970s, Ireland’s main exports were agriculture-based products sold to the U.K. So, in this context, is it surprising that the Irish government has strongly opposed the anti-competition charges it has faced? And how long can it defend itself against the heavy hitters of the EU who claim that it’s cheating the system? Only time will tell.

Thanks for stopping by. If you’re interested in trying it for yourself, the code and data have been included below.

mapping Irish exports in Python

Recently, there’s been a lot of talk about a reduction in consumer spending in economies around the world so it got me thinking about who are Ireland’s biggest trading partners. Fortunately, the wonderful people over at the Central Statistics Office publish this data and more but for now, I’m going to focus on creating a map in Python of total Irish exports since 2015 to identify who we’ve been exporting to the most.

Most programming tools have mapping functionality, with some easier to use than others. I prefer using folium in Python because, unlike packages like ggmap in R, we don’t need to sign up for APIs and it’s relatively straightforward to use.

data preprocessing

Before we’re able to map the data, we need to preprocess it so that it’s in the correct format for mapping – and to be honest, data preprocessing is generally the most time-consuming part of data science! The main elements of preprocessing for this dataset were to:

  • create a full record per row
  • replace blank values with 0’s
  • remove any total or blank columns
  • convert all remaining variables to numeric form
  • aggregate exports per country
  • create a new column calculating each country’s percentage value of total exports
  • assign countries their longitudes & latitudes for mapping
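A few of the steps above (dropping total rows, treating blanks as 0, converting to numbers, aggregating per country and adding a percentage column) can be sketched in plain Python. The rows and the helper names `total_exports_by_country` and `add_percentages` are invented for illustration; the real preprocessing is in the full code at the bottom of this post, and the longitude and latitude step is left out here.

```python
from collections import defaultdict

# Hypothetical raw rows standing in for the CSO extract; blank strings mark missing values.
raw_rows = [
    {"country": "United States", "year": "2015", "value": "100"},
    {"country": "United States", "year": "2016", "value": ""},
    {"country": "Belgium", "year": "2015", "value": "40"},
    {"country": "Belgium", "year": "2016", "value": "20"},
    {"country": "Total", "year": "2015", "value": "140"},
]


def total_exports_by_country(rows):
    """Drop total rows, treat blanks as 0, convert to numbers and sum per country."""
    totals = defaultdict(float)
    for row in rows:
        if row["country"].strip().lower() == "total":
            continue
        totals[row["country"]] += float(row["value"] or 0)
    return dict(totals)


def add_percentages(totals):
    """Attach each country's share of the grand total as a percentage."""
    grand_total = sum(totals.values())
    return {
        country: {"total": value, "perc": 100 * value / grand_total}
        for country, value in totals.items()
    }


countries_total = add_percentages(total_exports_by_country(raw_rows))
for country, figures in countries_total.items():
    print(country, figures)
```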

Once preprocessing is complete we can move onto the fun stuff! Below is an excerpt of the code used to create our map of exports:

import folium

# Initialize the base map - these are starting points for the map:
m3 = folium.Map(location=[20, 0], zoom_start=2)

# defining the settings for the choropleth
# (newer folium releases provide folium.Choropleth(...).add_to(m3) for the same purpose):
m3.choropleth(
    geo_data=country_shapes,  # using the country shapes we got from the json file
    name='Total Irish Exports between 2015 & 2020',  # name of our map
    data=countries_total,  # what data we want to map
    columns=['country_codes', 'perc'],  # what columns from our dataframe that we want to map
    key_on='feature.id',  # what we're matching with - in our instance, we're joining the country codes to the IDs in the json file
    fill_color='YlGnBu',  # the colours we want to use in the map
    fill_opacity=0.5,  # similar to transparency - colour setting
    line_opacity=0.5,
    legend_name='Percentage Value of Exports',  # the name under our legend
    smooth_factor=0,  # lines around the countries as we zoom in
    highlight=True,  # does the map highlight the country when we hover over it with a mouse
)
folium.LayerControl().add_to(m3)

The choropleth function allows us to colour the map depending on the value attributed to each country. In this instance, the percentage of total exports column was used. The higher the percentage, the darker the colour. Surprisingly, the US is the biggest importer of Irish goods (in total € value), with more than double the total value of the next largest importer, Belgium, which is closely followed by Great Britain. The countries in black are unrecorded in our dataset (the joys of working with data – some of it simply isn’t available or falls into a category like ‘other countries’).

The downside to using something like choropleth is that when values are very similar it’s difficult to differentiate between variables. In the above map, countries between 0 and 5% of total exports are indistinguishable from each other so in this instance a simple bar chart may be more suitable depending on your needs.

an alternative map in Tableau

At my day job, I’ve been working with Tableau and I’m always pleasantly surprised at its functionality so long as your data has been preprocessed using a programming language like Python or SQL. So, here’s what it looks like in Tableau.

Which is nicer? I’ll let you decide!

Thanks for stopping by. If you’re interested in trying it for yourself, the code and data have been included below.

where’s the rain?

We love talking about the weather in Ireland. Absolutely love it. And there’s an unlimited number of ways to describe it – fierce weather; good drying weather; scorcher; belter; cracker – can all be used to say it’s rather sunny outside. And my favourite – there’s a grand stretch in the evening!

I’ve often wondered why we love to talk about the weather and the conclusion I always come to is that it’s just an easy conversation opener and instantly puts (Irish!) people at ease. Right now, we’re beginning to hit leaving cert weather. Leaving cert weather is how we refer to the first couple of weeks in June – you’re almost always guaranteed to have sunshine in Ireland. However we’ve been a bit spoilt for sunshine in the last 2 months and it’s got me thinking, where’s the rain? So, using Python, I decided to take a look into the data to see what it’s telling us.

Sourcing data from Met Éireann (who fortunately have an open data policy and publish data collected from their 25 weather stations around Ireland on a daily basis) I created a dataset comprising monthly rainfall and air temperature stats from 2017-2020 for each weather station. The dataset also includes the mean figures for each station over the last 30 years.

rain, rain, go away

First, I decided to take a look into total rainfall across Ireland. Living in Dublin, I’ve always felt that it rarely rains here – and the data backs me up. The boxplot to the left shows the 3 stations with the least amount of rain in the last 3 years are located in Dublin (Casement, Dublin Airport & Phoenix Park) with the wettest stations located in Newport in Mayo & Valentia in Kerry.

A boxplot is a useful tool that shows the range of values for each variable, displaying a condensed view of data that we would otherwise find difficult to compare. The line in the colored box indicates the median value for each variable (the median is the middle value when the figures are sorted from lowest to highest). The larger the box, the wider the spread of the middle half of the values.

Turning our attention to Athenry in the above graph, it has a median annual rainfall value of 1200 millimetres but we’ve also had years when the rainfall was 1410 mm and 1100 mm. This proves useful in 2 ways. Firstly, it allows us to compare the median of all stations at once. Secondly, it allows us to compare each station’s range of values to that of all other stations – for example, when we compare Athenry to a station like Mace Head, we can see that there is a larger amount of variation in rainfall in Athenry than Mace Head – very useful to know if you’re planning on travelling to either of those places.
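For anyone curious about the numbers a boxplot is built from, here is a rough sketch using Python’s built-in statistics module. The list of rainfall totals is hypothetical: only the 1100, 1200 and 1410 figures echo the Athenry example, and the other two values are filler so the quartiles have something to work with.

```python
import statistics

# Hypothetical annual rainfall totals (mm) for a single station.
annual_rainfall = [1100, 1150, 1200, 1300, 1410]

median = statistics.median(annual_rainfall)                # the line inside the box
q1, q2, q3 = statistics.quantiles(annual_rainfall, n=4)    # quartile cut points
box_height = q3 - q1                                       # spread of the middle half
full_range = max(annual_rainfall) - min(annual_rainfall)   # lowest to highest value

print("median:", median)
print("box from", q1, "to", q3)
print("full range:", full_range)
```

Plotting libraries can use slightly different rules for where the quartiles fall, so the box edges in a chart may not match these values exactly.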

too wet, too dry?

When we want to compare something over time, simple line charts are very effective. The graph to the left shows the average rainfall per month over the last 3 years in Newport (the wettest station in Ireland) and Phoenix Park (the driest station in Ireland). By comparing the last 3 years to the mean over the last 30 years, we can see that Newport has generally been wetter during the winter/autumn months and drier during the summer months excluding August which has seen nearly 33% more rain in recent times! While in the Phoenix Park, it’s generally been drier over the last 3 years than it’s been over the last 30 years – excluding the spikes in March and November.

This got me thinking – can we prove with the limited amount of data we’re working with that the weather in Ireland is getting more extreme?

To try and answer this, the histogram on the right represents each station’s annual rainfall grouped by year. Let’s take the mean as an example. Each one of the yellow bars represents 1 or more stations and their relative annual rainfall. Taking the 800 mark into consideration, it shows that 3 stations have had 800mm of rain on average per year while at the 1200 mark it shows that 6 stations have had around 1200mm of rain on average per year. This becomes useful to know when we plot the data for 2017, 2018 and 2019.

What this shows us is that over the last 3 years we’ve seen stations with more rain than before – we have several stations past the 1600 mark while there have only been 2 stations reaching the 1600 mark over the last 30 years. Of course, there is a possibility of these being outliers (once-off events). However, when we look at where the spikes are in 2017 and 2019, there are more stations getting less rain than in the 30-year mean. Could this point to wetter winters and drier summers?
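The counting behind a histogram like this one is simple enough to sketch. The station figures below are made up (they are not Met Éireann data), and the 200 mm bin width is my own assumption rather than the setting used for the chart.

```python
from collections import Counter


def histogram_counts(values, bin_width=200):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((int(value) // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical mean annual rainfall (mm) for eight stations.
station_means = [760, 790, 810, 1010, 1190, 1230, 1250, 1650]

bins = histogram_counts(station_means)
for lower_edge, count in bins.items():
    print(f"{lower_edge}-{lower_edge + 200} mm: {count} station(s)")
```

Running the same count for each year and for the 30-year mean gives the bars that are being compared in the chart.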

February was a wet month, right?

Come to think of it, it rained almost every day in February. Coming back to our friend the boxplot and by isolating data for February, we can see that there was nearly 3 times more rain across the nation in February 2020 when compared to February of previous years.

but May has been really dry?

By following the same method as above, we can see that May 2020 was drier than previous years, with the median almost half of what’s expected for this time of year. We can also see that there’s been a consistent drop in rainfall in May over the last 4 years and perhaps it’s figures like this that raise concerns over the possibility of a drought in Ireland in the months to come (apologies for the doomsday rhetoric!).

average rainfall per month – an alternative view

Heatmaps are another useful tool for showing the variety in data over time through the use of size representation or colour coding (as is the case in the above heatmap). Each block represents the average rainfall for a given station and month, colour coded relative to all other stations and months. The darker blue represents a high volume of rain while the light yellow represents a low volume of rain, as shown on the colour scale on the right of the heatmap. Taking the bottom left corner block as an example, it shows that Valentia in January has a high volume of rain when compared to Athenry in January (the top left corner block). Taking a look at our wettest and driest stations again, we can see it’s pretty wet all year round in Newport apart from June while it’s fairly dry all year round in Phoenix Park.

what’s the relationship between rain and air temperature?

Exploring the relationship between air temperature and rain on an aggregated level is tricky because the air temperature figures used in this dataset are averages per month while the rain is the total rainfall measurement. This is an example of how analysis can be skewed and it’s something readers should be aware of when reading any type of analysis, but we’ll plough on here anyway.

The graph to the left is a scatterplot. Each point on the graph represents a station’s average total rainfall and average air temperature grouped by month over the last 30 years. There are 25 stations and 12 months, giving us 300 individual points in total.

‘Grouped by month’ in this case means that all points that are from the same month have the same colour. For instance, all points for January are coloured pink while those for September are blue. By grouping them like this we can easily see that the relationship between total rainfall and average temperature varies depending on what month it is. Unsurprisingly, winter and spring months bring lower temperatures and a mix of high and low volumes of rainfall while summer and early autumn months bring higher temperatures with lower rainfall.

different season, different relationship

By dividing the months into winter/spring and summer/autumn, the difference between the conditions is clearer. The top graph to the right shows the winter/spring months while the bottom graph shows the summer/autumn months. By dividing the months out like this we also see a change in the correlation figures. The correlation between the total rainfall and average air temperature for the full dataset was -0.21, but when we calculate the correlation for the divided months it jumps to 0.35 and 0.32 respectively. This indicates a weak positive correlation within each half of the year, but I imagine that if we were to exclude nighttime temperatures from the averages we’d have a different picture.
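It can seem odd that the correlation is negative overall but positive within each half of the year, so here is a toy sketch of how that happens. The temperature and rainfall pairs are invented to show the effect and are not the Met Éireann figures; `pearson` is just a hand-rolled Pearson correlation.

```python
from math import sqrt


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical (average temperature in °C, total rainfall in mm) pairs.
winter_spring = [(4, 90), (5, 100), (6, 105), (7, 120)]
summer_autumn = [(13, 50), (14, 55), (15, 65), (16, 70)]

print("winter/spring:", round(pearson(winter_spring), 2))
print("summer/autumn:", round(pearson(summer_autumn), 2))
print("all months:", round(pearson(winter_spring + summer_autumn), 2))
```

Within each group, warmer months are a little wetter, but because the cold group is much wetter than the warm group overall, pooling them flips the sign. The toy numbers exaggerate the effect compared with the real figures above.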

one more for luck

The final graph I’ll leave you with is a heatmap representing actual heat! Similar to the heatmap used for the total rainfall per station per month, the graph to the left shows the range in average air temperatures for each station per month over the last 30 years. Unsurprisingly, it’s warmer in the summer and colder in the winter so if you’re hoping for some sun in Ireland, July and August are your best bet.

that’s all for now

Thanks for reading and I hope you enjoyed the post. The code used to create this analysis can be downloaded below along with the associated dataset. The data has been sourced from www.meteireann.ie.

Covid-19: making the most out of a not-so-good situation

There’s something strange about being told you must do something – it inherently makes you not want to do it. With many of us being advised to work from home where possible and the increasing likelihood of Ireland following countries like Italy and Spain in enforcing lockdowns, I’m sure there are people out there already climbing the walls. With the gym closed and public gatherings grinding to a halt, somehow the day feels a lot longer than normal. I was thinking about why and it’s all to do with our mindset.

Think back to the last Sunday evening before work when you wish the weekend would last that bit longer. Days like that go by so fast you wonder did someone speed up the clock while you weren’t looking. But when someone tells you that you’ll be at home for the next 2 weeks, time seems to drag and drag.

An example of this comes from a childhood story of mine. Being the oldest in my family meant my parents were always pushy when it came to things like homework and study. Obviously, wanting the best for their child, they made sure I had my homework done. If I got my homework done faster than normal the question ‘have you no studying to be doing instead?’ was asked. As a kid, you can imagine how well I took this. I just wanted to be outside and playing football with my mates.

Then it comes to my sister, the middle child. Again, the same questions. ‘Why aren’t you studying?’ but perhaps a little less intense. And again, similar results. What’s interesting is that for the youngest in my family, my parents were more relaxed about these things (and everything else in general). The results were different. He would spend hours studying, so much so that myself and my sister would always hear ‘he’s always studying, I never have to tell him to do it. In fact, I sometimes have to tell him the opposite!’. Why? Mindset.

So with that in mind here are some suggestions for the weeks ahead to avoid climbing the walls:

    1. If you’re a beginner runner, try a 6 week couch to 5k program like this one
    2. If you’re an intermediate runner, why not try a 10k program like this one
    3. Bored of eating the same food? How about trying this smart recipe suggestion tool that allows you to specify what ingredients you have and suggests a recipe for them
    4. Try some meditation
    5. Take a free Python course
    6. Build a personal WordPress website for your CV
    7. Read some comics – I’ve recently gotten into Rick & Morty
    8. See if you like yoga
    9. Read a book
    10. Chill out and enjoy the time with family because we’re more than likely not going to see the likes of this again

Personally, I’m going to try and customize this site to look like a Gameboy while avoiding my college work.

Stay safe!

Are you Surrounded by Idiots?

We’re all surrounded by idiots.

This is the tongue in cheek declaration by Thomas Erikson, author of the international bestseller Surrounded by Idiots… so don’t take it personally Yellows! In a saturated market of self-help books, Surrounded by Idiots is an easy-going read for those who want to understand their fellow humans.

Similar to concepts that inspired personality tests such as Myers-Briggs and DISC, Swedish behavioral expert Thomas Erikson categorizes people into 4 distinct colors (Red, Blue, Yellow and Green) with associated unique character traits. Using a mix of real life experiences and theoretical behavioral studies, Erikson attempts to give you the tools to not only understand those around you but to influence them too.

This book is definitely one for the reading list if you’ve ever asked yourself ‘why don’t they get me?’. So, without giving away too much of the book (and hopefully without getting into any copyright trouble!), below is a quick outline of each personality type.

Why is he so pushy?

Direct, goal-orientated, decisive, impatient and ambitious are all common character traits of a Red. Who turns a playful office 5-a-side game into an all out serious competitive event? A Red. Who is happy when a meeting that was scheduled for 30 minutes only lasts 15 minutes? A Red. Who is confused when a colleague is upset after giving them negative feedback when they were asked for their honest feedback? A Red.

Remind you of anyone? 🙂

Why is she reading the IKEA instructions AGAIN?

Perfectionist, logical, distant, critical and thoughtful are all traits of a Blue. There’s nothing more satisfying to a Blue than being methodical. Details, details, details. Blues can’t get enough! We could all use a bit more Blue in our lives.

Why does he never stop talking?

If you know a friend who captures an audience’s attention or who could sell water to a fish, then you know a Yellow. Enthusiastic, creative, talkative and outgoing are characteristics of Yellows. Great to be around if you’re low on energy, Yellows can bring you up to their natural high with their outgoing and friendly nature.

She never forgets my birthday

Patient, loyal, thoughtful, kind and considerate. We all know a Green. Greens are considered the best team players as a result of their desire to keep everyone around them happy. Never one to want to let team members down, when a Green says they’ll do something, you can rest assured that it’ll be done.

Chameleons – it’s in our nature

Generally people are a mix of colors and it’s certainly not a one size fits all concept. By this I mean that people can have varying personalities. Personally, I’m a mix of Green and Red in the office but at home the Red side of me can’t help but show its face (apologies to my better half and family for that one!). Conversely, at social gatherings I can be a Yellow and when fulfilling my role as a Data Scientist, I’m a Blue.

While some may be against playing the ‘social game’, we all do it. Whether it’s consciously or subconsciously, we adjust our personalities in different situations depending on the scene and setting – so we may as well understand each other a little more and stop thinking we’re surrounded by idiots.

If you’re interested in finding out what color you are, here’s a link to the test.

And here’s a link to buy the book. Happy reading!

A Disease Epidemic Study

The recent outbreak of the coronavirus 2019-nCoV has been unsettling. As of 25-01-20 there have been 56 deaths attributed to the virus out of 1350 lab-confirmed cases [1]; a 4% death rate. Coronaviruses are common among animals, with humans typically getting infected once over their lifetimes, resulting in a mild cold or cough [2]. Like all viruses, however, they can mutate into something more serious, leading to acute illness and sometimes death, particularly in those already battling other illnesses.

Source: https://www.bbc.com/news/world-asia-china-51249208

We’ve seen coronaviruses before – SARS in 2002 and MERS in 2012 were both coronaviruses. What’s different about 2019-nCoV (n for novel, and CoV for coronavirus) is the radical steps taken by the Chinese government to halt the spread of the virus, deciding to restrict transport in and out of 12 cities to date and cancelling all Lunar New Year celebrations – and other governments have been following suit. Schools and attractions such as Disneyland in Hong Kong have been closed until further notice, Taiwan has suspended travel permit applications from Chinese citizens and the U.S. began evacuating all American citizens from Wuhan City (the city believed to be ground zero of this virus).

I recently completed a module in social network analysis (SNA) and coincidentally (and somewhat concerningly), for a group project assignment, Eoghan Keegan [3], Isabel Sicardi Rosell [4] and I created a simulation study of how a virus closely modeled on the 2002 SARS virus would spread across the USA airport network, to understand the extent of a potential outbreak and to assess what the most effective vaccination strategy would be.

Using Python to create the simulation model and Gephi to visualize results we were able to determine that in the absence of a complete transportation shutdown, an evolving strategy of assigning limited resources to the airports with the most connections to other airports on a day by day basis was the best way to combat the spread of the virus.

SNA is a useful tool to understand complex and dynamic relationships between entities, objects or people. The name SNA can be misleading: while its origins lie in the study of sociology, it can be applied to a range of topics. In biochemistry SNA is used to understand the relationships of proteins in the human body while in engineering it’s used to fault test power grids. In this instance, we used it to understand the relationships between airports in the U.S. One of the many benefits of SNA is that each network generates mathematical properties that describe things like how closely its nodes are connected, so we can compare how connected one network is to another. These properties characterize networks.

After sourcing internal U.S. flight data from the year 2008 we created a network of flights between airports using each airport as a node (the object we want to understand) and the flights between each pair of nodes as the edges (the relationship between each object). The graph to the left is the resulting network graph, with each circle representing an airport (the nodes) and each line representing a flight connection (the edges), laid out by geographical location.

The next step in our process was to understand the strength of the relationships between airports because this is critical in understanding how a disease will spread throughout a network. In SNA, you can measure the strength of relationships in a number of ways and this is called the weight of the relationship. The bigger the weight, the stronger the relationship. In this instance we used the number of flights between airports as the weights.
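Counting flights per airport pair to get the edge weights can be sketched as below. The flight records are invented (this is not the 2008 dataset), `build_weighted_edges` is a hypothetical helper, and treating A to B and B to A as the same edge is my own simplifying assumption.

```python
from collections import Counter

# Hypothetical flight records as (origin, destination) pairs.
flights = [
    ("ATL", "ORD"),
    ("ORD", "ATL"),
    ("ATL", "MEM"),
    ("ATL", "ORD"),
    ("MEM", "ORD"),
]


def build_weighted_edges(records):
    """Count flights per airport pair, treating A->B and B->A as the same edge."""
    return Counter(tuple(sorted(pair)) for pair in records)


edge_weights = build_weighted_edges(flights)
for (airport_a, airport_b), weight in edge_weights.items():
    print(airport_a, airport_b, weight)
```

The heavier the edge, the more traffic between the two airports and so the more opportunity for a virus to travel along it.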

Measures of Centrality

There are a variety of methods that can be used to determine which airports are the most important in our network and these are known as measures of centrality of a node:

  • degree centrality is the number of edges directly connected to the node. An example of a European airport with high degree centrality would be Amsterdam because it acts as a hub to other airports.
  • closeness centrality is based on the average length of the shortest path between the node and all other nodes – essentially, how many steps away the airport is from every other airport. An example is flying from Dublin to Singapore – we would need to stop over in London to continue our journey, so in this instance the length of the path is 2 (two flights with 1 stopover). We calculate this for every other node and average the result, which gives us our closeness centrality measurement.
  • betweenness centrality measures how many times the node is on the shortest path between other nodes. Going back to our Amsterdam airport example, this would have a high betweenness measure because it acts as a connection point for many of the world’s airports.
  • eigenvector centrality ranks each airport’s importance by the number of important airports it’s connected to on a scale of 0 to 1 with 1 being the most important – essentially, how important an airport is depends on the importance of the airports it’s connected to.
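Three of the measures above can be sketched in plain Python on a tiny made-up network (betweenness is left out to keep it short). The routes are hypothetical and unweighted, the graph is assumed to be connected, closeness is reported the usual way as the reciprocal of the average shortest path length so that higher means closer, and eigenvector centrality is approximated with simple power iteration scaled so the top airport scores 1. This is an illustration, not the code from our project.

```python
from collections import deque

# Hypothetical routes between five airports (undirected, unweighted).
edges = [("DUB", "AMS"), ("DUB", "LHR"), ("AMS", "LHR"), ("AMS", "CDG"), ("LHR", "SIN")]


def build_adjacency(edge_list):
    """Map each airport to the set of airports it connects to directly."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Share of the other airports each airport is directly connected to."""
    n = len(adjacency)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of flights from source to every reachable airport."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def closeness_centrality(adjacency):
    """Reciprocal of the average shortest path length to all other airports."""
    result = {}
    for node in adjacency:
        distances = shortest_path_lengths(adjacency, node)
        total = sum(distances.values())
        result[node] = (len(distances) - 1) / total if total else 0.0
    return result


def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration: an airport scores highly if its neighbours score highly."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {node: sum(scores[n] for n in neighbours)
                   for node, neighbours in adjacency.items()}
        largest = max(updated.values())
        scores = {node: value / largest for node, value in updated.items()}
    return scores


adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
eigen = eigenvector_centrality(adjacency)
for airport in sorted(adjacency):
    print(airport, round(degree[airport], 2), round(closeness[airport], 2), round(eigen[airport], 2))
```

In this toy network Dublin to Singapore is 2 flights via London, and the two hubs (Amsterdam and London) come out on top on all three measures.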

We decided to measure the importance of airports using eigenvector centrality. In the graph to the right, each airport’s node size is relative to its eigenvector centrality rating. The bigger the circle, the greater the importance. Our results indicate that Hartsfield Jackson Atlanta International Airport was the most important in 2008 with Chicago O’Hare International and Memphis International following closely.

A useful tool in Gephi is the ability to cluster nodes together using modularity. Modularity allows us to assign each airport to a cluster depending on the shared connections between them. If you take another look at the above graph you’ll notice there are 4 distinct colors – these represent the cluster each airport belongs to. Taking the North Eastern seaboard as an example (those nodes highlighted in green), while airports in this cluster have connections to airports in other clusters, our results indicate that they are highly connected to each other. The same can be said about the South Eastern seaboard while we have more of a mix between Central and Western clusters.
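To give a feel for what modularity rewards, here is a sketch that scores a given grouping of airports. Note the swap: Gephi searches for a good grouping itself, whereas this only calculates the standard modularity score for a grouping you hand it. The routes and clusters are hypothetical and the graph is unweighted.

```python
def modularity(edges, communities):
    """Modularity score of a partition of an unweighted, undirected graph."""
    m = len(edges)
    community_of = {node: index
                    for index, members in enumerate(communities)
                    for node in members}
    internal_edges = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal_edges[community_of[a]] += 1
    return sum(inside / m - (degrees / (2 * m)) ** 2
               for inside, degrees in zip(internal_edges, degree_sum))


# Hypothetical routes: two tightly connected groups joined by a single route.
edges = [
    ("BOS", "JFK"), ("JFK", "PHL"), ("BOS", "PHL"),
    ("LAX", "SFO"), ("SFO", "SEA"), ("LAX", "SEA"),
    ("JFK", "LAX"),
]

east_west = [{"BOS", "JFK", "PHL"}, {"LAX", "SFO", "SEA"}]
one_big_cluster = [{"BOS", "JFK", "PHL", "LAX", "SFO", "SEA"}]

print("east/west split:", round(modularity(edges, east_west), 3))
print("single cluster:", round(modularity(edges, one_big_cluster), 3))
```

The split that keeps most routes inside a cluster scores higher than lumping everything together, which is the property the clustering tries to maximise.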

A final note on the network graph created by the U.S. airport network is that it forms a scale-free network. Without going too much into details, it shows us that the nodes with the higher eigenvector centrality are at the centre of the network, representing how important they are. An important characteristic of this type of network is that it is susceptible to targeted attacks. By removing a few of these important nodes we're able to break the network. Taking the left most node in the graph as an example – if we were trying to get from there across to the right most node in the graph, we need to go through the central nodes. If the central nodes don't exist, it may be impossible to get to the right node – thus our network is susceptible to targeted attacks!
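The targeted-attack idea can be sketched on a tiny hub-and-spoke graph. The graph below is hypothetical and far too small to be scale-free; it only illustrates that closing a hub disconnects the network while closing a peripheral airport does not. The node names and the `reachable` helper are assumptions for illustration.

```python
# Hypothetical hub-and-spoke map: two hubs in the middle, small airports around them.
NETWORK = {
    "WEST": {"HUB_A"},
    "HUB_A": {"WEST", "HUB_B", "SPOKE_1"},
    "HUB_B": {"HUB_A", "EAST", "SPOKE_2"},
    "EAST": {"HUB_B"},
    "SPOKE_1": {"HUB_A"},
    "SPOKE_2": {"HUB_B"},
}

def reachable(graph, start, removed=frozenset()):
    """Airports that can still be reached from `start` once `removed` are closed."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Closing a peripheral airport leaves the west-to-east journey intact.
print("EAST" in reachable(NETWORK, "WEST", removed={"SPOKE_1"}))
# Closing a single hub cuts the west off from the east.
print("EAST" in reachable(NETWORK, "WEST", removed={"HUB_A"}))
```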

Our Simulation

The Virus

As with all simulation studies, parameters have to be set. To design the virus we researched various outbreaks and decided our hypothetical virus would be based on the SARS virus from 2002. Our virus had a rapid onset time resulting in immediate transferal upon contraction, an infection rate of 19% and a death rate of 9.6%.

The Model

Our model is based on a susceptible-infected-removed (SIR) model with each day representing 1 time period. This model calculates each node's population on a daily basis and is made up of S (susceptible), I (infected) and R (removed). Susceptible represents the population susceptible to infection from its neighbors, infected represents the population who carry the virus and removed represents the population that has been removed from the model, either due to recovery after contracting the virus or death as a result of the virus. For more detailed information on our model, such as our random implementation of the disease outbreak, please see the download button at the bottom of this post for our full paper.
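For readers who haven't met an SIR model before, here is a minimal single-airport sketch of the daily update. It is a textbook discrete-time SIR step, not the networked model from our paper: there is no travel between nodes, the removal rate `gamma` and the starting population are assumptions we made up for the example, and only the 19% infection rate comes from our virus parameters.

```python
def sir_step(s, i, r, beta=0.19, gamma=0.1):
    """Advance one day. beta: infection rate; gamma: ASSUMED daily removal rate."""
    n = s + i + r
    new_infected = beta * s * i / n   # susceptible people who catch the virus today
    new_removed = gamma * i           # infected people who recover or die today
    return s - new_infected, i + new_infected - new_removed, r + new_removed

def simulate(s, i, r, days, **rates):
    """Run the daily update and keep the (S, I, R) state for every day."""
    history = [(s, i, r)]
    for _ in range(days):
        s, i, r = sir_step(s, i, r, **rates)
        history.append((s, i, r))
    return history

# Hypothetical node: 10,000 people, 10 of them infected on day 0.
history = simulate(9990, 10, 0, days=30)
for day in (0, 10, 20, 30):
    s, i, r = history[day]
    print(day, round(s), round(i), round(r))
```

The total S + I + R stays constant from day to day, which is what we mean by a closed population.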

The Simulation

We proposed that we could engineer a vaccine within 30 days of the virus outbreak (very unrealistic in real life!) so we ran the model for 30 days without interruption. We implemented several vaccine strategies after this point that provided an immediate 25% reduction of infection with diminishing returns once introduced. Most of our strategies were resource allocation calculations, meaning we decided how to allocate resources based on the importance of nodes as calculated by measures of centrality like degree and closeness. We also decided to attempt to split our resources evenly across the network as a comparison.

The Results

In total there were over 7 million flights transporting 534 million passengers between 516 airports. When left unchecked, the hypothetical virus infected 2.2 million passengers. While 2.2 million passengers doesn’t sound like a lot, our population is closed meaning we don’t consider people who aren’t travelling and how the disease spreads among them.

By distributing resources evenly we showed that the number of infections drops to 1.1 million, but when we used SNA measures to allocate resources we found that a greedy algorithm was most effective. We call it a greedy algorithm because it targets the node with the most infected population at each iteration and allocates all resources to that node for that day. Using the greedy algorithm, only 370,000 passengers were infected – roughly a third of the total under even distribution.
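The difference between the two allocation rules fits in a few lines. This is only a sketch of the daily decision, assuming a single pool of resources per day; the infected counts, the budget and the function names are hypothetical, and the effect of the vaccine on the infection rate is left out.

```python
def allocate_evenly(infected, budget):
    """Split the day's resources equally across every airport."""
    share = budget / len(infected)
    return {airport: share for airport in infected}

def allocate_greedy(infected, budget):
    """Give the whole day's resources to the airport with the most infected people."""
    worst = max(infected, key=infected.get)
    return {airport: (budget if airport == worst else 0.0) for airport in infected}

# Hypothetical infected counts for one day of the simulation.
infected_today = {"ATL": 1200, "ORD": 950, "MEM": 300}

print(allocate_evenly(infected_today, budget=900.0))
print(allocate_greedy(infected_today, budget=900.0))
```

In a full run the greedy choice would be recomputed every day from that day's infected counts, so the targeted airport can change as the outbreak moves.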

Concluding Remarks

While some parameters of our hypothetical model are unrealistic, the results are concerning. They show that the rapid spread of contagious viruses is a genuine threat. Taking another look at the infection rate graph, we can see that the virus grows exponentially in the first 30 days. This information is readily available to all epidemic specialists, and it is with this lens that we can perhaps conclude that China's radical response to 2019-nCoV is the correct one.

References

1. https://www.ecdc.europa.eu/en/novel-coronavirus-china

2. https://www.livescience.com/new-china-coronavirus-faq.html

3. https://www.linkedin.com/in/eoghan-keegan-76486310b/

4. https://www.linkedin.com/in/isabelsicardi/

A Disease Epidemic Study