Today, I am going to re-implement in clojure a part of a networking analysis done by John R. Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart titled “Exploring and Analyzing Network Data with Python” at Programming Historian. The original analysis is done with python using networkx. I am going to re-implement the part of the analysis about finding maximum cliques within the network. I am going to use JGraphT which is as extensive as networkx in its functionality. The rest of the analysis can too be repeated within clojure.
The network data is from the Six Degrees of Francis Bacon project. Project aims to show and analyze the social networks of early modern Britain (1500-1700). The dataset composed of pairs of important Quaker figures that know each other. The data is culled from the Oxford Dictionary of National Biography. I collected two files: a list of nodes and a list of edges from Programming Historian.
Here is a small sample from the nodes list:
Name, Historical Significance, Gender, Birthdate, Deathdate, ID
Joseph Wyeth, religious writer, male, 1663, 1731, 10013191
Alexander Skene of Newtyle, local politician and author, male, 1621, 1694, 10011149
James Logan, colonial official and scholar, male, 1674, 1751, 10007567
Dorcas Erbery, Quaker preacher, female, 1656, 1659, 10003983
Lilias Skene, Quaker preacher and poet, male, 1626, 1697, 10011152
and a sample from the edge list:
Source, Target
George Keith, Robert Barclay
George Keith, Benjamin Furly
George Keith, Anne Conway Viscountess Conway and Killultagh
George Keith, Franciscus Mercurius van Helmont
George Keith, William Penn
George Keith, George Fox
George Keith, George Whitehead
George Keith, William Bradford
James Parnel, Benjamin Furly
James Parnel, Stephen Crisp
Peter Collinson, John Bartram
Peter Collinson, James Logan
I am going to use JGraphT,
and it needs to declared in the list of dependencies. Here’s my
deps.edn
dependencies:
{ :deps {org.jgrapht/jgrapht-ext {:mvn/version "1.5.1"}
org.jgrapht/jgrapht-opt {:mvn/version "1.5.1"}
org.jgrapht/jgrapht-io {:mvn/version "1.5.1"}
org.jgrapht/jgrapht-core {:mvn/version "1.5.1"}}
:mvn/repos { "central" {:url "https://repo1.maven.org/maven2/"}
"clojars" {:url "https://repo.clojars.org/"}}}
Next, the namespace:
ns quakers
(:require [clojure.set :as s])
(:require [clojure.java.io :as io])
(:import [java.io File])
(:import [org.jgrapht.alg.clique BronKerboschCliqueFinder])
(:import [org.jgrapht.alg.scoring PageRank])
(:import [org.jgrapht.graph DefaultEdge SimpleGraph DefaultEdge])) (
Let me start by a utility code that reads the necessary file and converts into a usable sequence:
defn from-file [file processor]
(->> file
(
File.
io/reader line-seq
map processor)
(rest))
#'quakers/from-file
Let us first ingest the nodes. For this analysis, I care only about the names. So, I’ll ingest just those. Instead of showing the whole list, I’ll show the first 5 names.
def nodes (from-file "quakers_nodelist.csv"
(-> % (clojure.string/split #",") first)))
#(->> nodes (take 5) (into [])) (
#'quakers/nodes
"Joseph Wyeth" "Alexander Skene of Newtyle" "James Logan" "Dorcas Erbery" "Lilias Skene"] [
Next, I’ll ingest the edges which are pairs of nodes. As before, I’ll show only the first 5:
def edges (from-file "quakers_edgelist.csv"
(-> % (clojure.string/split #","))))
#(->> edges (take 5) (into [])) (
#'quakers/edges
"George Keith" "Robert Barclay"] ["George Keith" "Benjamin Furly"] ["George Keith" "Anne Conway Viscountess Conway and Killultagh"] ["George Keith" "Franciscus Mercurius van Helmont"] ["George Keith" "William Penn"]] [[
Now, I’ll construct the graph.
def graph (SimpleGraph. DefaultEdge))
(doseq [x nodes] (.addVertex graph x))
(doseq [[x y] edges] (.addEdge graph x y)) (
#'quakers/graph
The next step is to calculate the maximal cliques within the network and list them:
def result (->> graph
(
BronKerboschCliqueFinder.
.maximumIteratoriterator-seq
map seq)
(into [])))
( result
#'quakers/result
"Edward Burrough" "George Fox" "John Crook" "William Dewsbury") ("Edward Burrough" "George Fox" "John Crook" "John Perrot") ("James Nayler" "Edward Burrough" "George Fox" "Francis Howgill") ("James Nayler" "Edward Burrough" "George Fox" "Thomas Ellwood") ("James Nayler" "Edward Burrough" "George Fox" "John Perrot") ("William Penn" "George Fox" "George Keith" "George Whitehead") ("Benjamin Furly" "William Penn" "George Fox" "George Keith") ("James Nayler" "Richard Farnworth" "Margaret Fell" "Anthony Pearson")] [(
There are 8 maximal cliques each with 4 names in them:
count result) (map count result)] [(
8 (4 4 4 4 4 4 4 4)] [
In this part of the post, I’ll do something different. Let me apply the PageRank algorithm to this network to find the important nodes. As before, I’ll display the top 5 nodes.
def result2 (->> graph
(
PageRank.
.getScoresinto {})
(sort-by second)
(reverse))
->> result2 (map first) (take 5) (into [])) (
#'quakers/result2
"George Fox" "William Penn" "James Nayler" "George Whitehead" "Margaret Fell"] [