The Kitchen Sink and Other Oddities

Atabey Kaygun

Using Quandl with kixi.stats on Clojure

Description of the problem

Incanter was such a letdown for various reasons, and I’ve been in the market for a sensible data analysis solution on Clojure. Then I saw the kixi.stats package at Elise Huard’s talk at EuroClojure. This specific package does only descriptive statistics, but it has a nice, sensible API, it is fast, and it relies on standard data structures as opposed to cooking up its own. The latter was one of my big problems with Incanter.

So, today I am going to write some Clojure code to fetch data from Quandl and process it with kixi.stats.

Implementation

I am writing this in gorilla-repl, so in my project.clj file I listed my dependencies as

(defproject quandl-clojure "0.1.0-SNAPSHOT"
  :description "Process data from quandl"
  :dependencies [[org.clojure/data.json "0.2.6"]
                 [kixi/stats "0.3.9"]
                 [clj-http "3.7.0"]]
  :main ^:skip-aot gorilla-test.core
  :target-path "target/%s"
  :plugins [[lein-gorilla "0.4.0"]]
  :profiles {:uberjar {:aot :all}})

One other option is to use boot and define your dependencies dynamically.  But not today.

Let me start with my namespace.

(ns quandl-clj
  (:require [clojure.data.json :as json]
            [kixi.stats.core :as ks]
            [clj-http.client :as client]))

nil

Below is the function that talks with the quandl API to download the data and convert it into a convenient map.

(defn quandl-get [api-key data-set start end]
  ;; Build the request URL, fetch it, and parse the JSON body.
  (-> (str "https://www.quandl.com/api/v3/datasets/"
           data-set
           ".json"
           "?api_key=" api-key
           "&start_date=" start
           "&end_date=" end)
      client/get
      (get :body)
      json/read-str
      ;; The part we care about sits under the "dataset" key.
      (get "dataset")))

#'quandl-clj/quandl-get
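
If you want to check the request before spending an API call, the URL construction can be pulled out into its own function. The following is only a sketch: quandl-url is a hypothetical helper that is not part of the code above, and the URL-encoding step is my own addition under the assumption that a key or date might someday contain a character that needs escaping.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper (not in the original code): builds the request URL
;; only, so it can be inspected at the REPL without touching the network.
(defn quandl-url [api-key data-set start end]
  (let [params [["api_key" api-key]
                ["start_date" start]
                ["end_date" end]]
        ;; URL-encode each value so an unusual character cannot
        ;; break the query string.
        encode (fn [v] (java.net.URLEncoder/encode (str v) "UTF-8"))]
    (str "https://www.quandl.com/api/v3/datasets/"
         data-set
         ".json?"
         (string/join "&" (for [[k v] params]
                            (str k "=" (encode v)))))))

;; "MY-KEY" is a placeholder, not a real key.
(quandl-url "MY-KEY" "ECB/EURCAD" "2017-01-01" "2017-12-31")
```

With such a helper, quandl-get would reduce to threading the URL through client/get, (get :body), json/read-str and (get "dataset"), exactly as before.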

Now that we can fetch data from quandl, let us do something with it. First, let me download some data from quandl.  In order to run the code below, you will need to get your own api-key from quandl.  I am going to assume you already have one:

(def raw-data (map #(quandl-get *YOUR-API-KEY* % "2017-01-01" "2017-12-31")
                   '("ECB/EURCAD" "ECB/EURUSD" "ML/AAATRI" "ML/AAAEY" "NASDAQOMX/NQGI")))

#'quandl-clj/raw-data

I need to convert the Quandl data to a sequence of maps, one map per row. Quandl’s JSON lists the column names, but I wrote the function so that you can also provide your own column names.

(defn process
  ;; Without explicit names, use the column names from Quandl's JSON.
  ([data] (process data (map keyword (get data "column_names"))))
  ;; Turn every row into a map from column name to value.
  ([data col-names]
   (map (fn [xs] (zipmap col-names xs)) (get data "data"))))

#'quandl-clj/process
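
To see what process produces without calling the API, here is a small sketch on a hand-made map. The sample map is made up: it only mimics the two keys that process reads ("column_names" and "data"), and the numbers are not real exchange rates. The function is repeated so that the snippet runs on its own.

```clojure
;; A copy of process, repeated so that this snippet is self-contained.
(defn process
  ([data] (process data (map keyword (get data "column_names"))))
  ([data col-names]
   (map (fn [xs] (zipmap col-names xs)) (get data "data"))))

;; Made-up sample with the same shape as the parsed "dataset" map.
(def sample
  {"column_names" ["Date" "Value"]
   "data"         [["2017-01-03" 1.4]
                   ["2017-01-02" 1.41]]})

;; Column names taken from the data itself.
(process sample)
;; => ({:Date "2017-01-03", :Value 1.4} {:Date "2017-01-02", :Value 1.41})

;; Column names supplied by the caller.
(process sample [:Date :CAD])
;; => ({:Date "2017-01-03", :CAD 1.4} {:Date "2017-01-02", :CAD 1.41})
```

Supplying my own column names is what lets me give each series a distinct key later on, when I put two series side by side.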

OK. Now, let me calculate the mean and the variance of EUR/CAD exchange rates.

(let [CAD (process (nth raw-data 0) '(:Date :Value))]
   {:mean     (transduce (map :Value) ks/mean CAD)
    :variance (transduce (map :Value) ks/variance CAD)})

{:mean 1.4544239361702123, :variance 0.0015927670710546838}
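
For reference, here is roughly what those two numbers mean, written out in plain Clojure on a tiny made-up vector. This is a sketch, not the kixi.stats implementation: mean and sample-variance are my own names, and I am assuming that ks/variance is the unbiased sample variance (division by n-1 rather than n).

```clojure
;; Arithmetic mean: sum divided by count.
(defn mean [xs]
  (/ (reduce + xs) (count xs)))

;; Sample variance: sum of squared deviations from the mean, divided by n-1.
;; Assumption: this matches ks/variance; it is not taken from kixi.stats itself.
(defn sample-variance [xs]
  (let [m (mean xs)
        n (count xs)]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (dec n))))

;; Made-up values, chosen so the arithmetic is easy to check by hand.
(def values [1.0 2.0 3.0 4.0 5.0])

(mean values)
;; => 3.0

(sample-variance values)
;; squared deviations 4 + 1 + 0 + 1 + 4 = 10, divided by 4
;; => 2.5
```

Unlike these two functions, the reducing functions in kixi.stats work as the reducing step of transduce, which is why the (map :Value) transducer can feed them directly.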

And, let me calculate the correlation between EUR/CAD and EUR/USD exchange rates.

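;; Pair the two series row by row (this assumes both cover the same dates),
;; then compute the correlation between the paired values.
(transduce identity
           (ks/correlation :CAD :USD)
           (map merge (process (nth raw-data 0) '(:Date :CAD))
                      (process (nth raw-data 1) '(:Date :USD))))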

0.6493154811299391
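
The pairing above goes row by row, so it silently relies on both series having exactly the same dates in the same order. If that is in doubt, one could join the rows on :Date first. The following is a sketch under that assumption: join-on-date is a hypothetical helper of mine, and the rows are made up.

```clojure
;; Hypothetical helper: keep only the dates present in both series and
;; put the two values for each such date into a single map.
(defn join-on-date [cad-rows usd-rows]
  (let [usd-by-date (into {} (map (juxt :Date :USD)) usd-rows)]
    (for [{:keys [Date CAD]} cad-rows
          :let  [usd (get usd-by-date Date)]
          ;; Skip rows where either value is missing.
          :when (and (some? CAD) (some? usd))]
      {:Date Date :CAD CAD :USD usd})))

;; Made-up rows; the second series has no entry for 2017-01-03.
(def cad [{:Date "2017-01-04" :CAD 1.39}
          {:Date "2017-01-03" :CAD 1.40}
          {:Date "2017-01-02" :CAD 1.41}])

(def usd [{:Date "2017-01-04" :USD 1.04}
          {:Date "2017-01-02" :USD 1.05}])

;; Only the two dates present in both series survive the join.
(join-on-date cad usd)
```

The joined sequence can then go through the same transduce call with (ks/correlation :CAD :USD) as above.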

An analysis

R does all of these statistical analyses and much, much more, readily and easily. If you’d like to do exploratory data analysis, then I would strongly suggest that you learn R and use it. One can do what I have done above in Python with SciPy, pandas and matplotlib, but personally I find Python a little awkward to use. I mainly use it as a teaching tool.

I wrote the same code in Common Lisp a few days ago. I like Lisp: any vanilla Lisp code is rock solid, and the code I wrote is going to run as is many years from now. But I used some external libraries, and in order to run the code as is in the future, I would need those libraries to still be around by then. Not likely.

I chose Clojure because it is fun to write Clojure code, even though the code (because of its external dependencies) most likely won’t run as is a year or two from now. Scala is fun too, but I really didn’t want to go through the painful process of explicitly writing out the column types of each dataset I am going to fetch from Quandl.