I realized that I haven’t posted anything on data analysis lately, or anything with Python. When I found that Civic Impulse has been collecting data on the voting records of the US Congress, I thought I would type something up.
Polarization of the House and the Senate has been described as a big problem in US politics. Could we actually see or quantify this polarization in the voting behaviour of the US Congress?
First, I am going to need some libraries:
import glob as g
import json
I downloaded the data from GovTrack.us and arranged it in directories separated by year.
Next, I will define a function that loads the data for one year:
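def process(year):
    # Map each vote directory under <year>/h* to its recorded votes,
    # keeping only the votes whose category is 'passage'.
    data = {}
    bills = g.glob(str(year)+'/h*')
    for x in bills:
        # 'infile' rather than 'input' to avoid shadowing the builtin
        with open(x+'/data.json') as infile:
            raw = json.load(infile)
        if raw[u'category'] == u'passage':
            data.update({x: raw[u'votes']})
    return data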
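The function above assumes a particular layout on disk and inside each `data.json`. As far as I can infer from the code (this is an assumption, not a description of the full GovTrack schema), each vote lives in `<year>/h<number>/data.json`, with a `category` field and a `votes` dictionary that maps each option to a list of voter records. The sketch below writes one made-up record to a temporary directory and loads it the same way; the directory name `h1` and the voters are hypothetical.

```python
import glob
import json
import os
import tempfile

# A made-up roll call with only the fields the code in this post reads.
# Real files contain more fields; these key names are taken from process().
fake_vote = {
    "category": "passage",
    "votes": {
        "Yea": [{"party": "D"}, {"party": "R"}],
        "Nay": [{"party": "R"}],
        "Not Voting": [],
    },
}

root = tempfile.mkdtemp()
vote_dir = os.path.join(root, "1990", "h1")  # hypothetical directory name
os.makedirs(vote_dir)
with open(os.path.join(vote_dir, "data.json"), "w") as out:
    json.dump(fake_vote, out)

# Same idea as process(): glob the year's h* directories, keep 'passage' votes.
loaded = {}
for path in glob.glob(os.path.join(root, "1990", "h*")):
    with open(os.path.join(path, "data.json")) as infile:
        raw = json.load(infile)
    if raw["category"] == "passage":
        loaded[path] = raw["votes"]

print(len(loaded))                      # number of passage votes found
print(sorted(loaded[vote_dir].keys()))  # the vote options in that record
```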
Next, I will count the Yea, Nay and Not Voting tallies per party as follows:
def extract(data):
    # Short keys for the three outcomes: yes, no, not voting
    z = ['y','n','x']
    votes = {}
    for x in data.keys():
        # Some votes are recorded as Aye/No, others as Yea/Nay
        if u'Aye' in data[x]:
            y = ['Aye','No','Not Voting']
        else:
            y = ['Yea','Nay','Not Voting']
        res = {}
        for i in range(3):
            # Count the voters of each party for this outcome
            temp = {u'D':0, u'R':0, u'I':0}
            for j in data[x][y[i]]:
                temp[j[u'party']] += 1
            res.update({z[i]: temp})
        votes.update({x: res})
    return votes
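To see what `extract` produces, here is a compact sketch of the same tally for a single made-up vote, written with `collections.Counter` instead of explicit loops. The helper name `tally_by_party` and the sample voters are hypothetical, and the sketch assumes every voter record is a dictionary with a `party` key.

```python
from collections import Counter

def tally_by_party(vote):
    """Count voters per party for each option of one vote (hypothetical helper)."""
    # Some roll calls use Aye/No, others Yea/Nay, as handled in extract().
    if "Aye" in vote:
        options = ["Aye", "No", "Not Voting"]
    else:
        options = ["Yea", "Nay", "Not Voting"]
    result = {}
    for short, option in zip(["y", "n", "x"], options):
        counts = Counter(voter["party"] for voter in vote.get(option, []))
        # Report D, R and I explicitly so absent parties show up as 0.
        result[short] = {party: counts[party] for party in ["D", "R", "I"]}
    return result

# Made-up voters, for illustration only.
sample_vote = {
    "Yea": [{"party": "D"}, {"party": "D"}, {"party": "R"}],
    "Nay": [{"party": "R"}, {"party": "I"}],
    "Not Voting": [],
}
print(tally_by_party(sample_vote))
```

One difference from `extract`: this version silently ignores any party label other than D, R and I, whereas the original raises a `KeyError` on an unexpected label.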
And finally a function that calculates polarization. I used the following idea: if \(a\) Democrats and \(b\) Republicans vote a particular way on an issue, then the polarization for that choice is \(\frac{|a-b|}{a+b}\). In the code below I apply this to the vote as a whole: I sum \(|a-b|\) over Yea, Nay and Not Voting, and divide by the total number of votes counted (Independents included).
The following function returns the percentage of bipartisan bills from a given year’s voting data. I will consider a specific vote bipartisan if the polarization measure is less than or equal to 0.5.
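def polarization(votes):
    # Number of votes that fall at or below the 0.5 threshold
    result = 0
    for x in votes.keys():
        local = votes[x]
        # Sum of Democrat/Republican gaps over the three outcomes
        measure = abs(local['y']['D']-local['y']['R'])
        measure += abs(local['n']['D']-local['n']['R'])
        measure += abs(local['x']['D']-local['x']['R'])
        # Total head count, Independents included
        total = 0
        for k in ['y','n','x']:
            for l in ['D','R','I']:
                total += local[k][l]
        measure /= (1.0*total)
        if measure <= 0.5:
            result += 1
    # Percentage of bipartisan votes for the year
    return 100.0*result/len(votes)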
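To make the measure concrete, the sketch below computes it for a single vote, following the arithmetic in `polarization` above. The helper name `vote_measure` and both sample tallies are made up for illustration.

```python
def vote_measure(tally):
    """Sum of |D - R| over Yea/Nay/Not Voting, divided by all votes counted."""
    gap = sum(abs(tally[k]["D"] - tally[k]["R"]) for k in ["y", "n", "x"])
    total = sum(tally[k][p] for k in ["y", "n", "x"] for p in ["D", "R", "I"])
    return gap / float(total)

# Hypothetical party-line vote: most Democrats vote Yea, most Republicans Nay.
party_line = {"y": {"D": 200, "R": 5, "I": 0},
              "n": {"D": 10, "R": 210, "I": 0},
              "x": {"D": 5, "R": 5, "I": 0}}

# Hypothetical broad-agreement vote: both parties mostly vote Yea.
broad = {"y": {"D": 190, "R": 180, "I": 0},
         "n": {"D": 20, "R": 35, "I": 0},
         "x": {"D": 5, "R": 5, "I": 0}}

print(vote_measure(party_line))  # (195 + 200 + 0) / 435
print(vote_measure(broad))       # (10 + 15 + 0) / 435
```

With the 0.5 threshold, the first vote counts as polarized and the second as bipartisan.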
Let us run this over our data:
Year | Bipartisan (%) |
---|---|
1990 | 61.38 |
1991 | 59.06 |
1992 | 47.40 |
1993 | 24.29 |
1994 | 40.83 |
1995 | 37.63 |
1996 | 48.74 |
1997 | 53.24 |
1998 | 45.58 |
1999 | 50.61 |
2000 | 58.44 |
2001 | 54.96 |
2002 | 50.00 |
2003 | 44.52 |
2004 | 50.00 |
2005 | 46.10 |
2006 | 33.33 |
2007 | 20.52 |
2008 | 15.33 |
2009 | 10.12 |
2010 | 10.16 |
2011 | 14.62 |
2012 | 12.38 |
2013 | 11.63 |
2014 | 17.05 |
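A table like the one above could be produced with a small driver loop. The sketch below is an assumption about how to wire things together, not the code actually used: `score_year` stands for any function that maps a year to a percentage, for example `lambda y: polarization(extract(process(y)))` with the functions defined earlier. The helper name `table_rows` is hypothetical.

```python
def table_rows(years, score_year):
    """Format one 'year | percent |' row per year (hypothetical helper)."""
    return ["%d | %.2f |" % (year, score_year(year)) for year in years]

# Stand-in scorer so the sketch runs without the data files; the two values
# are copied from the table above.
known = {1990: 61.38, 1991: 59.06}
for row in table_rows(sorted(known), known.get):
    print(row)
```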
Here is the plot of the data:
Something must have happened in 1993. My guess is that in the run-up to 1994, when the Republicans gained control of both chambers for the first time in 40 years, tensions in Congress were high. But of course, this is a wild guess. I am not a political scientist. One can also see that the situation got worse during the second term of Bush the Second, and has not improved since.
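The drop is easy to check from the table itself. As a rough sketch, the snippet below averages the yearly percentages before 2005 and from 2005 on. The cut-off at 2005 (the start of Bush's second term) is an arbitrary choice made for this illustration.

```python
# Yearly percentages copied from the table above.
percent = {
    1990: 61.38, 1991: 59.06, 1992: 47.40, 1993: 24.29, 1994: 40.83,
    1995: 37.63, 1996: 48.74, 1997: 53.24, 1998: 45.58, 1999: 50.61,
    2000: 58.44, 2001: 54.96, 2002: 50.00, 2003: 44.52, 2004: 50.00,
    2005: 46.10, 2006: 33.33, 2007: 20.52, 2008: 15.33, 2009: 10.12,
    2010: 10.16, 2011: 14.62, 2012: 12.38, 2013: 11.63, 2014: 17.05,
}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / float(len(values))

# Split at 2005; this cut-off is an assumption, not part of the analysis above.
before = mean(v for y, v in percent.items() if y < 2005)
after = mean(v for y, v in percent.items() if y >= 2005)
lowest_year = min(percent, key=percent.get)

print("1990-2004 average: %.2f" % before)
print("2005-2014 average: %.2f" % after)
print("lowest year: %d" % lowest_year)
```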