After playing with faculty hiring data yesterday, I wanted to play with faculty hiring networks for math, but the AMS does not release its data for public use. Shame! There is, however, a proxy dataset that anybody can play with: the Math Genealogy database.
The Math Genealogy Project doesn’t disclose its data either, but you can scrape it from one of its several mirrors. There are some scrapers out there; I used Konstantin Kefaloukos’ code for that part of the job.
Next, I generated two tab-separated (TSV) files: vertex data and edge data. The vertex data consists of 5 columns: the person id of the PhD recipient, the name of the PhD-granting school, the school’s country, the school’s id, and the year of the PhD. The edge data consists of 3 columns: the advisor’s id, the PhD recipient’s id, and the year of the PhD.
Using the edge data together with the vertex data, I can now form a link between two schools: the school where the PhD advisor got their own PhD, and the school that granted the student’s PhD.
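Here is a minimal sketch of that linking step in Python. It is not the code I actually ran: the sample rows are made up, the helper names `read_tsv` and `school_links` are hypothetical, and it assumes the files have no header row and use the column order described above.

```python
import csv
import io
from collections import Counter

# Made-up sample rows, in the column order described above.
# Vertex: person id, school name, country, school id, PhD year.
VERTEX_TSV = (
    "1\tSchool A\tCountryX\t10\t1950\n"
    "2\tSchool B\tCountryX\t20\t1975\n"
    "3\tSchool B\tCountryX\t20\t1980\n"
    "4\tSchool C\tCountryY\t30\t2001\n"
)
# Edge: advisor id, PhD recipient id, PhD year.
EDGE_TSV = (
    "1\t2\t1975\n"
    "1\t3\t1980\n"
    "2\t4\t2001\n"
    "9\t4\t2001\n"  # advisor 9 has no vertex row, so this edge is skipped
)


def read_tsv(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def school_links(vertex_rows, edge_rows):
    """Count links (advisor's PhD school id, student's PhD school id)."""
    # Simplification: if a person appears in several vertex rows,
    # the last row wins.
    school_of = {
        person: school_id
        for person, _name, _country, school_id, _year in vertex_rows
    }
    links = Counter()
    for advisor, student, _year in edge_rows:
        if advisor in school_of and student in school_of:
            links[(school_of[advisor], school_of[student])] += 1
    return links


links = school_links(read_tsv(VERTEX_TSV), read_tsv(EDGE_TSV))
for (src, dst), count in sorted(links.items()):
    print(src, "->", dst, count)
```

Each key is a directed pair of school ids, and the count is the number of advisor-student pairs behind that link. Edges whose advisor or student is missing from the vertex data are dropped, which is one of the many holes in the model.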
There are millions of holes in this model, but in the absence of publicly available (yes AMS, I mean you!) anonymized hiring data for math PhDs, this will do.
The unadulterated graph is way too complicated, so I decided to split it historically. For the last two graphs, I have 2 versions: a full version and a version from which the US schools are removed (too many connections obscuring the vertices, too noisy). I chose time frames that made sense to me, but you are free to play with the data as you wish.
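If you want to try your own split, here is a rough sketch of the filtering in Python. Everything in it is an assumption for illustration: the `Link` record, the `subgraph` helper, the sample links, the year ranges, and the spelling "United States" for the country label are all made up, and the time frames are not the ones I used.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One school-to-school link, tagged with countries and PhD year."""
    src_school: str
    src_country: str
    dst_school: str
    dst_country: str
    year: int


def subgraph(links, start, end, exclude_country=None):
    """Keep links with start <= year <= end.

    If exclude_country is given, also drop every link that touches
    a school in that country on either end.
    """
    kept = []
    for link in links:
        if not (start <= link.year <= end):
            continue
        if exclude_country is not None and exclude_country in (
            link.src_country,
            link.dst_country,
        ):
            continue
        kept.append(link)
    return kept


# Made-up sample links; the school names are placeholders.
sample = [
    Link("A", "United States", "B", "United States", 1960),
    Link("C", "Germany", "D", "France", 1965),
    Link("C", "Germany", "B", "United States", 1990),
    Link("D", "France", "E", "Italy", 1995),
]

early_full = subgraph(sample, 1950, 1970)
late_no_us = subgraph(sample, 1971, 2000, exclude_country="United States")
print(len(early_full), len(late_no_us))
```

Dropping a link when either endpoint is in the excluded country is one possible reading of "US schools are removed"; keeping links that merely start or end in the US would give a different, denser picture.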