The Kitchen Sink and Other Oddities

Atabey Kaygun

A Clique Analysis of Quakers in early modern Britain (1500-1700)

A description of the problem

Today, I am going to re-implement in clojure a part of a networking analysis done by John R. Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart titled “Exploring and Analyzing Network Data with Python” at Programming Historian. The original analysis is done with python using networkx. I am going to re-implement the part of the analysis about finding maximum cliques within the network. I am going to use JGraphT which is as extensive as networkx in its functionality. The rest of the analysis can too be repeated within clojure.

Data

The network data is from the Six Degrees of Francis Bacon project. Project aims to show and analyze the social networks of early modern Britain (1500-1700). The dataset composed of pairs of important Quaker figures that know each other. The data is culled from the Oxford Dictionary of National Biography. I collected two files: a list of nodes and a list of edges from Programming Historian.

Here is a small sample from the nodes list:

 Name, Historical Significance, Gender, Birthdate, Deathdate, ID
 Joseph Wyeth, religious writer, male, 1663, 1731, 10013191
 Alexander Skene of Newtyle, local politician and author, male, 1621, 1694, 10011149
 James Logan, colonial official and scholar, male, 1674, 1751, 10007567
 Dorcas Erbery, Quaker preacher, female, 1656, 1659, 10003983
 Lilias Skene, Quaker preacher and poet, male, 1626, 1697, 10011152

and a sample from the edge list:

 Source, Target
 George Keith, Robert Barclay
 George Keith, Benjamin Furly
 George Keith, Anne Conway Viscountess Conway and Killultagh
 George Keith, Franciscus Mercurius van Helmont
 George Keith, William Penn
 George Keith, George Fox
 George Keith, George Whitehead
 George Keith, William Bradford
 James Parnel, Benjamin Furly
 James Parnel, Stephen Crisp
 Peter Collinson, John Bartram
 Peter Collinson, James Logan

The Code

Preamble

I am going to use JGraphT, and it needs to declared in the list of dependencies. Here’s my deps.edn dependencies:

 { :deps {org.jgrapht/jgrapht-ext  {:mvn/version "1.5.1"}
          org.jgrapht/jgrapht-opt  {:mvn/version "1.5.1"}
          org.jgrapht/jgrapht-io   {:mvn/version "1.5.1"}
          org.jgrapht/jgrapht-core {:mvn/version "1.5.1"}}
   :mvn/repos { "central" {:url "https://repo1.maven.org/maven2/"}
                "clojars" {:url "https://repo.clojars.org/"}}}

Next, the namespace:

(ns quakers
  (:require [clojure.set :as s])
  (:require [clojure.java.io :as io])
  (:import [java.io File])
  (:import [org.jgrapht.alg.clique BronKerboschCliqueFinder])
  (:import [org.jgrapht.alg.scoring PageRank])
  (:import [org.jgrapht.graph DefaultEdge SimpleGraph DefaultEdge]))

Let me start by a utility code that reads the necessary file and converts into a usable sequence:

(defn from-file [file processor]
    (->> file
         File. 
         io/reader 
         line-seq 
         (map processor)
         rest))
#'quakers/from-file

Let us first ingest the nodes. For this analysis, I care only about the names. So, I’ll ingest just those. Instead of showing the whole list, I’ll show the first 5 names.

(def nodes (from-file "quakers_nodelist.csv" 
                      #(-> % (clojure.string/split #",") first)))
(->> nodes (take 5) (into []))
#'quakers/nodes
["Joseph Wyeth" "Alexander Skene of Newtyle" "James Logan" "Dorcas Erbery" "Lilias Skene"]

Next, I’ll ingest the edges which are pairs of nodes. As before, I’ll show only the first 5:

(def edges (from-file "quakers_edgelist.csv"
                      #(-> % (clojure.string/split #","))))
(->> edges (take 5) (into []))
#'quakers/edges
[["George Keith" "Robert Barclay"] ["George Keith" "Benjamin Furly"] ["George Keith" "Anne Conway Viscountess Conway and Killultagh"] ["George Keith" "Franciscus Mercurius van Helmont"] ["George Keith" "William Penn"]]

Now, I’ll construct the graph.

(def graph (SimpleGraph. DefaultEdge))
(doseq [x nodes] (.addVertex graph x))
(doseq [[x y] edges] (.addEdge graph x y))
#'quakers/graph

Maximal cliques

The next step is to calculate the maximal cliques within the network and list them:

(def result (->> graph
                 BronKerboschCliqueFinder.
                 .maximumIterator
                 iterator-seq
                 (map seq)
                 (into [])))
result
#'quakers/result
[("Edward Burrough" "George Fox" "John Crook" "William Dewsbury") ("Edward Burrough" "George Fox" "John Crook" "John Perrot") ("James Nayler" "Edward Burrough" "George Fox" "Francis Howgill") ("James Nayler" "Edward Burrough" "George Fox" "Thomas Ellwood") ("James Nayler" "Edward Burrough" "George Fox" "John Perrot") ("William Penn" "George Fox" "George Keith" "George Whitehead") ("Benjamin Furly" "William Penn" "George Fox" "George Keith") ("James Nayler" "Richard Farnworth" "Margaret Fell" "Anthony Pearson")]

There are 8 maximal cliques each with 4 names in them:

[(count result) (map count result)]
[8 (4 4 4 4 4 4 4 4)]

PageRank

In this part of the post, I’ll do something different. Let me apply the PageRank algorithm to this network to find the important nodes. As before, I’ll display the top 5 nodes.

(def result2 (->> graph
                 PageRank.
                 .getScores
                 (into {})
                 (sort-by second)
                 reverse))
(->> result2 (map first) (take 5) (into []))
#'quakers/result2
["George Fox" "William Penn" "James Nayler" "George Whitehead" "Margaret Fell"]