The stock market. Buy low, sell high. Unfortunately, it’s not that easy; if it were, we’d all be silly rich and probably not writing blog articles on personal websites about it! Fortunately, there are a variety of tools that traders can use to gain an advantage in the stock market. One of those tools is social network analysis.
what is social network analysis?
Social network analysis (SNA) is the study of relationships between people, entities or objects that can be measured by graph properties and visualised as a network graph. The nodes (people, entities or objects) are connected by edges, and each edge can carry a weight that measures the strength of the relationship between the two nodes it joins. The larger the weight, the stronger the relationship.
A simple example of an undirected network graph is a friendship circle. Person A knows person B and C but person B and C don’t know each other. Person C knows person D, who knows person E. Person E also knows person G and A. Finding it difficult to keep track? Wondering what the best way to visualise these relationships is? You’re probably thinking of drawing it out. You’re thinking of a network graph.

By drawing the connections between people we can understand the relationships between them. For example, who do you think is the most connected person in this graph? Persons A and E. Why? Because each is connected to three others: A to B, C & E, and E to A, D & G. The least connected are persons B & G (they are only connected to 1 other person).
This is known as an undirected graph because the relationship goes both ways. Person A cannot be a friend of Person B if Person B isn’t their friend back. In a directed graph, the relationship is one-way. An example of this is a direct flight from Dublin to Knock.
By applying the same logic to any dataset we can view complex relationships in compact network graphs.
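As a rough sketch, the friendship circle described above can be built with the Python package networkx and each person's connections counted (their degree). The variable names `friends` and `flights` are just illustrative, and the edge list is taken from the description in the text.

```python
import networkx as nx

# Undirected friendship circle from the description above.
friends = nx.Graph()
friends.add_edges_from([
    ("A", "B"), ("A", "C"),
    ("C", "D"), ("D", "E"),
    ("E", "G"), ("E", "A"),
])

# Degree = number of people each person is directly connected to.
degrees = dict(friends.degree())
for person, degree in sorted(degrees.items()):
    print(person, degree)

# Directed graph: a one-way relationship such as a direct flight.
flights = nx.DiGraph()
flights.add_edge("Dublin", "Knock")
print(flights.has_edge("Dublin", "Knock"))  # True
print(flights.has_edge("Knock", "Dublin"))  # False
```

In the undirected graph an edge counts for both people it joins, while the directed edge only exists in the direction it was added.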
applying SNA to the Irish stock market
How can we create a network graph with the returns of the Irish stock market? First we need to define our nodes and in this case, our nodes are the companies listed in the stock market. Second we need our edges – this is how we connect nodes together. For our example we can use the correlation of daily share price returns for each pair of companies. By using the correlation we can determine the strength of the relationship between the nodes.
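To make the edge definition concrete, here is a minimal sketch of going from prices to daily returns to pairwise correlations with pandas. It assumes you start from closing prices rather than ready-made returns; the companies `X`, `Y` and `Z` and their prices are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical closing prices for three made-up companies.
prices = pd.DataFrame({
    "X": [10.0, 11.0, 10.5, 12.0],
    "Y": [20.0, 22.0, 21.0, 24.0],   # always twice X, so it moves in lockstep
    "Z": [5.0, 4.8, 5.1, 4.9],       # tends to move against X
})

# Daily return = percentage change from the previous day's price.
returns = prices.pct_change().dropna()

# Pearson correlation between every pair of return series.
corr = returns.corr()
print(corr.round(2))
```

Each off-diagonal entry of `corr` is a candidate edge weight between two companies.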
To keep things simple we can include a threshold for correlation. If a pair of companies is at least moderately correlated (absolute correlation > 0.3) then the pair will be included in our dataset. This is more commonly known as the ‘winner takes all’ approach.
This can all be done in Python so the next part of this article focuses on how we can achieve this using extracts from a script I recently developed. The dataset we’re working with here consists of share price returns for January 2009 and is available at the bottom of this article.
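The thresholding step can be sketched on a toy example. This is not the script used for the article, just one way to do it: the returns below are invented, `THRESHOLD` is an illustrative name, and the upper-triangle mask is my own choice for skipping self-pairs and mirrored duplicates.

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.3

# Invented daily returns for three made-up companies.
returns = pd.DataFrame({
    "A": [0.01, -0.02, 0.03, -0.01, 0.02],
    "B": [0.02, -0.04, 0.06, -0.02, 0.04],   # moves exactly with A
    "C": [0.01, 0.01, -0.02, -0.02, 0.02],   # barely related to A or B
})

# Absolute correlation, so strong negative relationships count too.
corr = returns.corr().abs()

# Keep only the upper triangle (k=1 also drops the diagonal), so each
# pair appears once and no company is paired with itself.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs above the threshold become the edges of the network.
edges = pairs[pairs > THRESHOLD]
edge_list = list(edges.index)
print(edge_list)
```

Only the A and B pair survives the cut here; C is left without any edges.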
our trustworthy friend, Python
First we need to calculate the correlation matrix, taking absolute values so that strong negative correlations count as well:
Jan_2009_dropped_NAN_corr = Jan_2009_dropped_NAN_floats.corr().abs()
To manipulate the data so that we have a list of pairs of companies and their correlation coefficients we can use .stack() along with some dataframe adjustments:
df = Jan_2009_dropped_NAN_corr.stack()
df1 = pd.DataFrame(df)
df1 = df1.reset_index(level=[0])
df1 = df1.rename(columns = {'Company_Name':'Company_Name_2', 0:'Correlation_Value'})
df1 = df1.reset_index().rename(columns = {'Company_Name':'Company_Name_1'})
To finalise our new dataframe we isolate all company pairings where the correlation is greater than 0.3:
df2_reduced = df1.loc[(df1['Correlation_Value'] > 0.3)]
df2_reduced_company_pairs = df2_reduced[['Company_Name_1', 'Company_Name_2']]
Initializing and visualizing our graph with the Python package networkx is quite simple:
import networkx as nx
import matplotlib.pyplot as plt
# Build an undirected graph with one edge per company pair
G = nx.from_pandas_edgelist(df2_reduced_company_pairs, 'Company_Name_1', 'Company_Name_2')
plt.figure(figsize=(10, 7))
nx.draw_shell(G, with_labels=True, width=1)
plt.show()
network graph of the Irish stock market

Et voila. A network of companies on the Irish stock market that share a correlation of > 0.3. We can see companies such as Tullow Oil and FBD Holdings are connected with many companies in the graph, suggesting that their returns move with those of many other companies.
Companies that aren’t connected have a correlation of 0.3 or less. For instance, Ovoca Bio is connected to PetroNeft and no other company, suggesting its share price moved largely independently of the other companies in the market.

As an alternative to Python, to the left is the same network graph as above but created using Gephi (software specifically developed to create network graphs). Not only does it make aesthetically pleasing graphs, it allows drag and drop customization that Python does not offer.
One example is that we can color the nodes on a gradient according to their importance to the overall network, with importance measured by the number of nodes each one is connected to.
In this instance, dark green indicates most important while dark purple indicates less important – white indicates being in the middle.
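Gephi handles this through its interface, but a similar colouring can be approximated in Python. The sketch below is an assumption-laden illustration: the tickers and edges are hypothetical stand-ins for the thresholded company pairs, `draw_by_degree` is a helper name I made up, and matplotlib's `PRGn` colormap is used because it runs from purple through white to green.

```python
import networkx as nx

# Hypothetical edge list standing in for the thresholded company pairs.
G = nx.Graph([
    ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),
    ("CCC", "DDD"), ("EEE", "FFF"),
])

# Importance = degree, i.e. the number of nodes each node is connected to.
degree = dict(G.degree())
node_colors = [degree[node] for node in G.nodes()]
most_connected = max(degree, key=degree.get)
print(most_connected, degree[most_connected])


def draw_by_degree(graph, colors):
    """Draw the graph with nodes shaded from purple (low) to green (high)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    plt.figure(figsize=(10, 7))
    nx.draw_shell(graph, with_labels=True, node_color=colors,
                  cmap=plt.cm.PRGn, width=1)
    plt.show()
```

Calling `draw_by_degree(G, node_colors)` produces the plot. Degree is only one measure of importance; it is used here because it matches the definition in the text.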
so what?
When working with data it’s easy to get carried away with the doing and lose sight of the reason why data science exists – the so what? What value does this piece of work bring to the table? In our case, by applying SNA to stock market price returns we can infer how stock prices move together and how this changes over time.
As a result of SNA, we know that for January 2009 Tullow Oil was the most connected company in the network. One tactic to reduce risk taking in trading is diversification. This involves buying multiple stocks in different industries so that when the value of certain stocks decreases, the whole portfolio isn’t impacted. However, if we had a limited amount of capital to invest we could use our analysis and invest only in Tullow Oil as it’s closely related to the highest number of stocks in the market and therefore a safer bet (theoretically – this is not financial advice!).
Thanks for reading. The dataset and code used to create the above graph have been included below. Feel free to reach out if you’ve any questions.