Setting up

In the last tutorial, we looked at network centrality and centralisation. Nodal centrality is often thought to be an expression of structural inequality. But nodes don’t always seek to differentiate themselves in a network. Sometimes they are interested in being part of a group. In this tutorial, we’re going to consider various ways in which ‘groupness’ might be examined.

The data we’re going to use here, “ison_algebra”, is included in the {manynet} package. Do you remember how to call the data? Can you find out some more information about it?

# Let's call and load the 'ison_algebra' dataset
data("ison_algebra", package = "manynet")
# Or you can retrieve like this:
ison_algebra <- manynet::ison_algebra
# If you want to learn more about the 'ison_algebra' dataset, use the following function (below)
?manynet::ison_algebra
# If you want to see the network object, you can run the name of the object
ison_algebra
# or wrap the assignment in parentheses to assign and print in one step
(ison_algebra <- manynet::ison_algebra)

We can see after printing the object that the dataset is multiplex, meaning that it contains several different types of ties: friendship (friends), social (social) and task interactions (tasks).

Adding names

The network is also anonymous, but I think it would be nice to add some names, even if it’s just pretend. Luckily, {manynet} has a function for this, to_named(). This makes plotting the network just a wee bit more accessible and interpretable. Let’s try adding names and graphing the network now:

ison_algebra <- to_named(ison_algebra)
graphr(ison_algebra)

Note that you will likely get a different set of names, as they are assigned randomly from a pool of (American) first names.

Separating multiplex networks

As a multiplex network, there are actually three different types of ties (friends, social, and tasks) in this network. We can extract them and graph them separately using to_uniplex():

# to_uniplex extracts ties of a single type,
# focusing on the 'friends' tie attribute here
friends <- to_uniplex(ison_algebra, "friends")
gfriend <- graphr(friends) + ggtitle("Friendship")
# now let's focus on the 'social' tie attribute
social <- to_uniplex(ison_algebra, "social")
gsocial <- graphr(social) + ggtitle("Social")
# and the 'tasks' tie attribute
tasks <- to_uniplex(ison_algebra, "tasks")
gtask <- graphr(tasks) + ggtitle("Task")
# now, let's compare each attribute's graph, side-by-side
gfriend + gsocial + gtask
# if you get an error here, you may need to install and load
# the package 'patchwork'.
# It's highly recommended for assembling multiple plots together.
# Otherwise you can just plot them separately on different lines.

Note also that these are weighted networks. graphr() automatically recognises these different weights and plots them. Where useful (less dense directed networks), graphr() also bends reciprocated arcs. What (else) can we say about these three networks?

Cohesion

Let’s concentrate on the task network for now and calculate a few basic measures of cohesion: density, reciprocity, transitivity, and components.

Density

Density represents a generalised measure of cohesion, characterising how cohesive the network is in terms of how many potential ties (i.e. dyads) are actualised. Recall that there are different equations depending on the type of network. Below are three equations:

\[A: \frac{|T|}{|N|(|N|-1)}\] \[B: \frac{2|T|}{|N|(|N|-1)}\] \[C: \frac{|T|}{|N||M|}\]

where \(|T|\) is the number of ties in the network, and \(|N|\) and \(|M|\) are the number of nodes in the first and second mode respectively.

One can calculate the density of the network by hand from the number of nodes and the number of ties, which are returned by the functions net_nodes() and net_ties(), respectively:

# calculating network density manually according to equation
net_ties(tasks)/(net_nodes(tasks)*(net_nodes(tasks)-1))

but we can also just use the {manynet} function for calculating the density, which always uses the equation appropriate for the type of network…

net_density(tasks)

Note that the various measures in {manynet} print results to three decimal places by default, but the underlying result retains its full precision. So it is the same result as the manual calculation…

Density offers an important baseline measure for characterising the network as a whole.
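To make the three equations a little more concrete, here is a minimal base R sketch that applies each of them to a small made-up matrix. The toy matrices and object names are assumptions for illustration only; they are not part of {manynet} or the ison_algebra data.

```r
# Toy directed network: 4 nodes, 5 arcs (made-up data)
directed <- matrix(c(0, 1, 1, 0,
                     1, 0, 0, 0,
                     0, 1, 0, 1,
                     0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(directed)
# Equation A: ties over all ordered pairs of distinct nodes
density_directed <- sum(directed) / (n * (n - 1))

# Toy undirected network: drop direction, then count each tie once
undirected <- (directed + t(directed)) > 0
ties_undirected <- sum(undirected) / 2
# Equation B: twice the ties over all ordered pairs
density_undirected <- 2 * ties_undirected / (n * (n - 1))

# Toy two-mode network: 3 first-mode nodes (rows), 2 second-mode nodes (columns)
twomode <- matrix(c(1, 0,
                    1, 1,
                    0, 1), nrow = 3, byrow = TRUE)
# Equation C: ties over all possible cross-mode pairs
density_twomode <- sum(twomode) / (nrow(twomode) * ncol(twomode))

c(directed = density_directed,
  undirected = density_undirected,
  twomode = density_twomode)
```

Which equation applies depends only on the type of network, which is why net_density() can pick the right one for us.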

Closure

In this section we’re going to move from generalised measures of cohesion, like density, to more localised measures of cohesion. These are not only measures of cohesion though, but are also often associated with certain mechanisms of closure. Closure involves ties being more likely because other ties are present. There are two common examples of this in the literature: reciprocity, where a directed tie is often likely to prompt a reciprocating tie, and transitivity, where a directed two-path is likely to be shortened by an additional arc connecting the first and third nodes on that path.

Reciprocity

First, let’s calculate reciprocity in the task network. While one could do this by hand, it’s more efficient to do this using the {manynet} package. Can you guess the correct name of the function?

net_reciprocity(tasks)
# this function calculates the amount of reciprocity in the whole network

Wow, this seems quite high based on what we observed visually! But if we look closer, this makes sense. We can use tie_is_reciprocated() to identify those ties that are reciprocated and not.

tasks %>% mutate_ties(rec = tie_is_reciprocated(tasks)) %>% graphr(edge_color = "rec")
net_indegree(tasks)

So we can see that indeed there are very few asymmetric ties, and yet node 16 is both the sender and receiver of most of the task activity. So our reciprocity measure has taught us something about this network that might not have been obvious visually.
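If you are curious what doing this "by hand" might look like, here is a minimal base R sketch on a made-up adjacency matrix. It assumes one common definition of reciprocity, the share of arcs that are reciprocated; net_reciprocity() may use a different variant, so treat this as an illustration rather than a re-implementation.

```r
# Toy directed network: 4 nodes, 5 arcs (made-up data)
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                0, 1, 0, 1,
                0, 0, 0, 0), nrow = 4, byrow = TRUE)

# An arc i -> j is reciprocated when j -> i also exists,
# i.e. when both adj[i, j] and adj[j, i] equal 1
reciprocated <- adj * t(adj)

# Share of all arcs that are reciprocated
reciprocity <- sum(reciprocated) / sum(adj)
reciprocity
```

Here only the pair of nodes 1 and 2 is mutually tied, so two of the five arcs are reciprocated.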

Transitivity

And let’s calculate transitivity in the task network. Again, can you guess the correct name of this function?

net_transitivity(tasks)
# this function calculates the amount of transitivity in the whole network
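As a rough illustration of what is being counted, here is a minimal base R sketch of transitivity for a small undirected toy network: the share of two-paths that are closed by a third tie. The data are made up, and the directed version that net_transitivity() applies to the task network may be defined somewhat differently.

```r
# Toy undirected network: a triangle (1, 2, 3) plus node 4 attached to node 3
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

# two_paths[i, j] counts the paths of length two from i to j
two_paths <- adj %*% adj

# All two-paths between distinct nodes (the diagonal counts i -> k -> i)
open_or_closed <- sum(two_paths) - sum(diag(two_paths))

# Two-paths whose end points are themselves tied, i.e. closed
closed <- sum(two_paths * adj)

transitivity <- closed / open_or_closed
transitivity
```

In this toy network there is one triangle and five connected triples, so three fifths of the two-paths are closed.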

Projection

A two-mode network

The next dataset, ‘ison_southern_women’, is also available in {manynet}. Let’s load and graph the data.

# let's load the data and analyze it
data("ison_southern_women")
ison_southern_women
graphr(ison_southern_women, node_color = "type")
graphr(ison_southern_women, "railway", node_color = "type")

Project two-mode network into two one-mode networks

Now what if we are only interested in one part of the network? For that, we can obtain a ‘projection’ of the two-mode network. There are two ways of doing this. The hard way…

twomode_matrix <- as_matrix(ison_southern_women)
women_matrix <- twomode_matrix %*% t(twomode_matrix)
event_matrix <- t(twomode_matrix) %*% twomode_matrix
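To see what this matrix multiplication is doing, here is a minimal sketch on a tiny made-up incidence matrix. The names A, B, C, E1 and E2 are hypothetical and not taken from the southern women data.

```r
# Toy two-mode network: three women (rows) attending two events (columns)
toy <- matrix(c(1, 0,
                1, 1,
                0, 1), nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("E1", "E2")))

# Rows times rows: how many events each pair of women shares
toy_women <- toy %*% t(toy)

# Columns times columns: how many women each pair of events shares
toy_events <- t(toy) %*% toy

toy_women
toy_events
```

Note that the diagonal of each projection holds a node's own number of ties (B attended two events, for example), which is usually ignored when treating the result as a one-mode network.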

Or the easy way:

# women-graph
# to_mode1(): Results in a weighted one-mode object that retains the row nodes from
# a two-mode object, and weights the ties between them on the basis of their joint
# ties to nodes in the second mode (columns)

women_graph <- to_mode1(ison_southern_women)
graphr(women_graph)

# note that projection `to_mode1` involves keeping one type of nodes
# this is different from to_uniplex above, which keeps one type of ties in the network

# event-graph
# to_mode2(): Results in a weighted one-mode object that retains the column nodes from
# a two-mode object, and weights the ties between them on the basis of their joint ties
# to nodes in the first mode (rows)

event_graph <- to_mode2(ison_southern_women)
graphr(event_graph)

{manynet} also includes several other options for how to construct the projection. The default (“count”) might be interpreted as indicating the degree of opportunity between nodes that comes from sharing ties to the other mode. “jaccard” divides this count by the number of nodes in the other mode to which either of the nodes is tied. It can thus be interpreted as opportunity weighted by participation. “rand” instead counts both shared ties and shared absences, and can thus be interpreted as the degree of behavioural mirroring between the nodes. Lastly, “pearson” (Pearson’s coefficient) and “yule” (Yule’s Q) produce correlations in ties for valued and binary data respectively.

to_mode2(ison_southern_women, similarity = "jaccard")
to_mode2(ison_southern_women, similarity = "rand")
to_mode2(ison_southern_women, similarity = "pearson")
to_mode2(ison_southern_women, similarity = "yule")
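To give a feel for how the “count” and “jaccard” options differ, here is a minimal base R sketch on a tiny made-up incidence matrix. It follows the verbal description above (shared ties divided by ties held by either node); the exact computation inside {manynet} may differ in detail, so this is only an illustration.

```r
# Toy two-mode network: three women (rows) attending two events (columns)
toy <- matrix(c(1, 0,
                1, 1,
                0, 1), nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("E1", "E2")))

# "count": number of events each pair of women shares
shared <- toy %*% t(toy)

# Number of events attended by either woman in a pair:
# events of the first + events of the second - events counted twice
attended <- rowSums(toy)
either <- outer(attended, attended, "+") - shared

# "jaccard": shared events as a proportion of events attended by either
jaccard <- shared / either
diag(jaccard) <- 0  # ignore each node's similarity with itself
jaccard
```

A and B share one event, but B also attended a second one, so their Jaccard similarity is one half rather than one.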

Let’s return to the question of closure. First try one of the closure measures we have already treated that gives us a sense of shared partners for one-mode networks. Then compare this with net_equivalency(), which can be used on the original two-mode network.

# net_transitivity(): Calculate transitivity in a network

net_transitivity(women_graph)
net_transitivity(event_graph)
# net_equivalency(): Calculate equivalence or reinforcement in a (usually two-mode) network

net_equivalency(ison_southern_women)

Try to explain in no more than a paragraph why projection can lead to misleading transitivity measures and what some consequences of this might be.

Components

Now let’s look at the friendship network, ‘friends’. We’re interested here in how many components there are. By default, the net_components() function will return the number of strong components for directed networks. For weak components, you will need to first make the network undirected. Remember the difference between weak and strong components?

net_components(friends)
# note that friends is a directed network
# you can see this by calling the object 'friends'
# or by running `is_directed(friends)`
# Now let's look at the number of components for objects connected by an undirected edge
# Note: to_undirected() returns an object with all tie direction removed, 
# so any pair of nodes with at least one directed edge 
# will be connected by an undirected edge in the new network.
net_components(to_undirected(friends))
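If the difference between weak and strong components is a bit hazy, the following base R sketch may help. It counts both on a small made-up directed network using reachability. The helper functions are hypothetical and written for clarity, not speed; they are not how {manynet} computes components.

```r
# Toy directed network: 1 <-> 2, 2 -> 3, and 4 -> 5 (made-up data)
adj <- matrix(0, 5, 5)
adj[cbind(c(1, 2, 2, 4), c(2, 1, 3, 5))] <- 1

# reach[i, j] is TRUE when j can be reached from i along directed ties
# (every node is treated as reaching itself)
reachability <- function(adj) {
  n <- nrow(adj)
  reach <- ((adj > 0) | (diag(n) > 0)) * 1
  for (step in seq_len(n)) {
    reach <- ((reach %*% reach) > 0) * 1
  }
  reach > 0
}

count_components <- function(adj, strong = TRUE) {
  # Weak components ignore direction, so symmetrise first
  if (!strong) adj <- adj + t(adj)
  reach <- reachability(adj)
  # Two nodes share a component when each can reach the other
  mutual <- reach & t(reach)
  # Each distinct row of 'mutual' describes one component
  nrow(unique(mutual))
}

strong_n <- count_components(adj, strong = TRUE)
weak_n <- count_components(adj, strong = FALSE)
c(strong = strong_n, weak = weak_n)
```

Only nodes 1 and 2 can reach each other along directed ties, so there are four strong components, but ignoring direction leaves just two weak components.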

So we know how many components there are, but maybe we’re also interested in which nodes are members of which components? node_components() returns a membership vector that can be used to color nodes in graphr():

friends <- friends %>% 
  mutate(weak_comp = node_components(to_undirected(friends)),
         strong_comp = node_components(friends))
# node_components returns a vector of nodes' memberships to components in the network
# here, we are adding the nodes' membership to components as an attribute in the network
# alternatively, we can also use the function `add_node_attribute()`
# eg. `add_node_attribute(friends, "weak_comp", node_components(to_undirected(friends)))`
graphr(friends, node_color = "weak_comp") + ggtitle("Weak components") +
graphr(friends, node_color = "strong_comp") + ggtitle("Strong components")
# by using the 'node_color' argument, we are telling graphr to colour 
# the nodes in the graph according to the values of the 'weak_comp' attribute in the network 

Factions

Components offer a precise way of understanding groups in a network. However, they can also ignore some ‘groupiness’ that is obvious to even a cursory examination of the graph. The irps_blogs network concerns the url links between political blogs in the 2004 election. It is a big network (you can check below). In our experience, it can take a few seconds to plot, so we will work with a smaller sample of it.

# This is a large network
net_nodes(irps_blogs)
# Let's concentrate on just a sample of 490
blogs <- delete_nodes(irps_blogs, sample(1:1490, 1000))
graphr(blogs)

But are they all actually linked? Even among the smaller sample, there seems to be a number of isolates. We can calculate the number of isolates by simply summing node_is_isolate().

sum(node_is_isolate(blogs))
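The same idea can be sketched in base R on a made-up adjacency matrix: a node is an isolate when it neither sends nor receives any ties. This is only an illustration of the logic, not the code behind node_is_isolate().

```r
# Toy directed network: 1 -> 2 and 2 -> 3; nodes 4 and 5 have no ties
adj <- matrix(0, 5, 5)
adj[cbind(c(1, 2), c(2, 3))] <- 1

# TRUE for nodes with no outgoing (row) and no incoming (column) ties
is_isolate <- (rowSums(adj) + colSums(adj)) == 0

# Summing a logical vector counts the TRUE values
n_isolates <- sum(is_isolate)
n_isolates
```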

Since there are many isolates, there will be many components, even if we look at weak components and not just strong components.

net_components(blogs)
net_components(to_undirected(blogs))

Giant component

So, it looks like most of the (weak) components are due to isolates! How do we concentrate on the main component of this network? Well, the main/largest component in a network is called the giant component.

blogs <- blogs %>% to_giant()
sum(node_is_isolate(blogs))
graphr(blogs)

Finally, we have a single ‘giant’ component to examine. However, now we have a different kind of challenge: everything is one big hairball. And yet, if we think about what we might expect of the structure of a network of political blogs, we might not think it is so undifferentiated. We might hypothesise that, despite the graphical presentation of a hairball, there is actually a reasonable partition of the network into two factions.

Finding a partition

To find a partition in a network, we use the node_in_partition() function. All node_in_*() functions return a character vector with one element per node in the network. It is a character vector because group membership is a categorical result. Because the vector has one element per node, we can assign it to the nodes as an attribute and colour the graph by it.

blogs %>% mutate_nodes(part = node_in_partition()) %>% 
  graphr(node_color = "part")

We see from this graph that indeed there seems to be an obvious separation between the left and right ‘hemispheres’ of the network.

Modularity

But what is the ‘fit’ of this assignment of the blog nodes into two partitions? The most common measure of the fit of a community assignment in a network is modularity.

net_modularity(blogs, membership = node_in_partition(blogs))

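Remember that modularity can be at most 1, and that values at or below 0 mean ties fall within groups no more often than we would expect by chance (the theoretical minimum is -1/2). How can we interpret this result?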
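To help with interpretation, here is a minimal base R sketch of the standard (Newman-Girvan) modularity formula for an undirected, unweighted network, applied to a made-up network of two triangles joined by a single tie. The function name modularity_q() is hypothetical, and net_modularity() may handle directed or weighted networks differently.

```r
# Toy undirected network: triangles (1, 2, 3) and (4, 5, 6) joined by tie 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4),
               c(4, 5), c(4, 6), c(5, 6))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # add the reverse direction to make it symmetric

modularity_q <- function(adj, membership) {
  two_m <- sum(adj)                        # twice the number of ties
  degree <- rowSums(adj)
  expected <- outer(degree, degree) / two_m  # ties expected by chance
  same_group <- outer(membership, membership, "==")
  # Observed minus expected ties, summed over pairs in the same group
  sum((adj - expected) * same_group) / two_m
}

# A sensible split: one group per triangle
q_good <- modularity_q(adj, c(1, 1, 1, 2, 2, 2))
# A poor split: alternate nodes between groups
q_poor <- modularity_q(adj, c(1, 2, 1, 2, 1, 2))
c(good = q_good, poor = q_poor)
```

The split that follows the triangles gets a positive score, while the alternating split scores below zero, because it places fewer ties within groups than chance alone would.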

While the partition algorithm is useful for deriving a partition of the network into the number of factions assigned, it is still an algorithm that tries to maximise modularity. Other times we might instead have an empirically collected grouping, and we are keen to see how ‘modular’ the network is around this attribute. This only works on categorical attributes, of course, but is otherwise quite flexible.

graphr(blogs, node_color = "Leaning")
net_modularity(blogs, membership = node_attribute(blogs, "Leaning"))

How interesting. Perhaps the partitioning algorithm is not the algorithm that maximises modularity after all… Perhaps we need to look further and see whether there is another solution here that returns an even greater modularity criterion.

Communities

Ok, the friendship network has 3-4 components, but how many ‘groups’ are there? Visually, it looks like there are two denser clusters within the main component.

Today we’ll use the ‘friends’ subgraph for exploring community detection methods. For clarity and simplicity, we will concentrate on the main component (the so-called ‘giant’ component) and consider friendship undirected.

# to_giant() returns an object that includes only the main component without any smaller components or isolates
(friends <- to_giant(friends))
(friends <- to_undirected(friends))
graphr(friends)

Comparing friends before and after these operations, you’ll notice the number of ties decreases as reciprocated directed ties are consolidated into single undirected ties, and the number of nodes decreases as two isolates are removed.

There is no single best community detection algorithm. Instead there are several, each with their strengths and weaknesses. Since this is a rather small network, we’ll focus on the following methods: walktrap, edge betweenness, and fast greedy. (Others are included in {manynet}/{igraph}.) As you use them, consider how they portray communities and consider which one(s) afford a sensible view of the social world as cohesively organized.

Walktrap

This algorithm detects communities through a series of short random walks, with the idea that nodes encountered on any given random walk are more likely to be within a community than not. It was proposed by Pons and Latapy (2005).

The algorithm initially treats all nodes as communities of their own, then merges them into larger communities, still larger communities, and so on. In each step a new community is created from two other communities, and its ID will be one larger than the largest community ID so far. This means that before the first merge we have n communities (the number of vertices in the graph) numbered from zero to n-1. The first merge creates community n, the second community n+1, etc. This merge history is returned by the underlying {igraph} function; see ?igraph::cluster_walktrap.

Note the “steps=” argument that specifies the length of the random walks. While {igraph} sets this to 4 by default, which is what is recommended by Pons and Latapy, Waugh et al (2009) found that for many groups (Congresses), these lengths did not provide the maximum modularity score. To be thorough in their attempts to optimize modularity, they ran the walktrap algorithm 50 times for each group (using random walks of lengths 1–50) and selected the network partition with the highest modularity value from those 50. They call this the “maximum modularity partition” and insert the parenthetical “(though, strictly speaking, this cannot be proven to be the optimum without computationally-prohibitive exhaustive enumeration (Brandes et al. 2008)).”
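The procedure described by Waugh et al. can be sketched directly with {igraph}. The example below is a minimal illustration, assuming {igraph} is installed and using Zachary's karate club network as a stand-in (it is not one of the networks from this tutorial): it runs walktrap with random walks of length 1 to 50 and keeps the partition with the highest modularity.

```r
library(igraph)

# A small, well-known stand-in network shipped with igraph
g <- make_graph("Zachary")

# Run walktrap once for each walk length and record the modularity
steps_to_try <- 1:50
scores <- sapply(steps_to_try, function(s) {
  modularity(cluster_walktrap(g, steps = s))
})

# Keep the walk length (and partition) with the highest modularity
best_steps <- steps_to_try[which.max(scores)]
best_partition <- membership(cluster_walktrap(g, steps = best_steps))

best_steps
max(scores)
table(best_partition)
```

As the quoted parenthetical warns, this only finds the best of the 50 partitions tried, not a proven optimum.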

So let’s try and get a community classification using the walktrap algorithm, node_in_walktrap(), with path lengths of the random walks specified to be 50.

friend_wt <- node_in_walktrap(friends, times=50)
friend_wt # note that it prints prettily, but underneath it's just a vector:
c(friend_wt)
# This says that dividing the graph into 2 communities maximises modularity,
# one with the nodes 
which(friend_wt == 1)
# and the other 
which(friend_wt == 2)
# resulting in a modularity of 
net_modularity(friends, friend_wt)

We can also visualise the clusters on the original network. How does the following look? Plausible?

# plot 1: groups by node color

friends <- friends %>% 
  mutate(walk_comm = friend_wt)
graphr(friends, node_color = "walk_comm")
#plot 2: groups by borders

# to be fancy, we could even draw the group borders around the nodes using the node_group argument
graphr(friends, node_group = "walk_comm")
# plot 3: group and node colors

# or both!
graphr(friends,
       node_color = "walk_comm",
       node_group = "walk_comm") +
  ggtitle("Walktrap",
    subtitle = round(net_modularity(friends, friend_wt), 3))
# the function `round()` rounds the values to a specified number of decimal places
# here, we are telling it to round the net_modularity score to 3 decimal places,
# but the score is exactly 0.27 so only two decimal places are printed.

This can be helpful when polygons overlap, to better identify membership. Or you can use node color and size to indicate other attributes…

Edge Betweenness

Edge betweenness is like betweenness centrality but for ties not nodes. The edge-betweenness score of an edge measures the number of shortest paths from one vertex to another that go through it.

The idea of the edge-betweenness based community structure detection is that it is likely that edges connecting separate clusters have high edge-betweenness, as all the shortest paths from one cluster to another must traverse through them. So if we iteratively remove the edge with the highest edge-betweenness score we will get a hierarchical map (dendrogram) of the communities in the graph.
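The intuition can be checked on a toy example. The sketch below assumes {igraph} is installed and uses a made-up network of two triangles joined by a single tie. It shows only the first step of the procedure: find the tie with the highest edge betweenness and remove it.

```r
library(igraph)

# Toy undirected network: triangles (1, 2, 3) and (4, 5, 6) joined by tie 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4),
               c(4, 5), c(4, 6), c(5, 6))
g <- graph_from_edgelist(edges, directed = FALSE)

# Number of shortest paths running through each tie (in edge-list order)
eb <- edge_betweenness(g)
eb

# The tie with the highest score is the bridge between the two triangles
bridge <- which.max(eb)
ends(g, bridge)

# Removing it splits the network into two components
g_cut <- delete_edges(g, bridge)
n_groups <- components(g_cut)$no
n_groups
```

The full algorithm repeats this step, recalculating edge betweenness after each removal, to build up the dendrogram.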

The following works similarly to walktrap, but there is no need to set a step length.

friend_eb <- node_in_betweenness(friends)
friend_eb

How does community membership differ here from that found by walktrap?

We can see how the edge betweenness community detection method works here: http://jfaganuk.github.io/2015/01/24/basic-network-analysis/

To visualise the result:

# create an object

friends <- friends %>% 
  mutate(eb_comm = friend_eb)
# create a graph with a title and subtitle returning the modularity score

graphr(friends,
       node_color = "eb_comm",
       node_group = "eb_comm") +
  ggtitle("Edge-betweenness",
    subtitle = round(net_modularity(friends, friend_eb), 3))

For more on this algorithm, see M Newman and M Girvan: Finding and evaluating community structure in networks, Physical Review E 69, 026113 (2004), https://arxiv.org/abs/cond-mat/0308217.

Fast Greedy

This algorithm is the Clauset-Newman-Moore algorithm. Whereas edge betweenness was divisive (top-down), the fast greedy algorithm is agglomerative (bottom-up).

At each step, the algorithm seeks a merge that would most increase modularity. This is very fast, but has the disadvantage of being a greedy algorithm, so it might not produce the best overall community partitioning, although I personally find it both useful and in many cases quite “accurate”.
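The greedy logic can be sketched in a few lines of base R. This is a naive illustration on a made-up network, with hypothetical helper functions; it recomputes modularity from scratch for every candidate merge, whereas the actual Clauset-Newman-Moore algorithm uses much more efficient bookkeeping.

```r
# Toy undirected network: triangles (1, 2, 3) and (4, 5, 6) joined by tie 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4),
               c(4, 5), c(4, 6), c(5, 6))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Modularity of a given membership vector (undirected, unweighted)
modularity_q <- function(adj, membership) {
  two_m <- sum(adj)
  degree <- rowSums(adj)
  same_group <- outer(membership, membership, "==")
  sum((adj - outer(degree, degree) / two_m) * same_group) / two_m
}

greedy_merge <- function(adj) {
  membership <- seq_len(nrow(adj))  # start with every node on its own
  best <- list(q = modularity_q(adj, membership), membership = membership)
  while (length(unique(membership)) > 1) {
    # Score every possible merge of two current groups
    pairs <- utils::combn(unique(membership), 2)
    scores <- apply(pairs, 2, function(p) {
      trial <- membership
      trial[trial == p[2]] <- p[1]
      modularity_q(adj, trial)
    })
    # Greedy step: take the merge with the highest resulting modularity
    chosen <- pairs[, which.max(scores)]
    membership[membership == chosen[2]] <- chosen[1]
    if (max(scores) > best$q) {
      best <- list(q = max(scores), membership = membership)
    }
  }
  best  # the best partition seen along the merge sequence
}

result <- greedy_merge(adj)
result
```

On this toy network the greedy merges recover the two triangles, but on larger networks an early merge can lock in a partition that is not the best one overall.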

friend_fg <- node_in_greedy(friends)
friend_fg # Does this result in a different community partition?
net_modularity(friends, friend_fg) # Compare this to the edge betweenness procedure
# Again, we can visualise these communities in different ways:
friends <- friends %>% 
  mutate(fg_comm = friend_fg)
graphr(friends,
       node_color = "fg_comm",
       node_group = "fg_comm") +
  ggtitle("Fast-greedy",
    subtitle = round(net_modularity(friends, friend_fg), 3))

See A Clauset, MEJ Newman, C Moore: Finding community structure in very large networks, https://arxiv.org/abs/cond-mat/0408187

Free play

We’ve looked here at the irps_blogs dataset. Now have a go at the irps_books dataset. What is the density? Does it make sense to investigate reciprocity, transitivity, or equivalence? How can we interpret the results? How many components are there in the network? Is there a strong factional structure? Which community detection algorithm returns the highest modularity score, or corresponds best to what is in the data or what you see in the graph?

irps_books

Glossary

Here are some of the terms that we have covered in this module:

Component
A component is a connected subgraph not part of a larger connected subgraph.
Giant
The giant component is the component that includes the most nodes in the network.
Reciprocity
A measure of how often nodes in a directed network are mutually linked.
Transitivity
Triadic closure is where if the connections A-B and A-C exist among three nodes, there is a tendency for B-C also to be formed.
Undirected
An undirected network is one in which tie direction is undefined.

Cohesion and Community

by James Hollway