Today, I am going to write an implementation of multivariate regression in clojure with all statistical bells and whistles.
Here are the dependencies I have for this implementation:
:deps { net.mikera/core.matrix {:mvn/version "0.62.0"}
net.mikera/vectorz-clj {:mvn/version "0.48.0"}
org.apache.commons/commons-math3 {:mvn/version "3.5"}
}
With the dependencies above, I will put everything I need into my namespace.
(ns regression
(:use clojure.core.matrix)
(:use clojure.core.matrix.operators)
(:import [org.apache.commons.math3.distribution
NormalDistribution
TDistribution
FDistribution]))Here is my implementation. The first term in the coefficients is the intercept.
(defn regression [xs y]
(let [[k n] (shape xs)
nk (- n k 1)
tdist (TDistribution. nk)
fdist (FDistribution. k nk)
sig (fn [dist x] (* 2.0 (- 1.0 (.cumulativeProbability dist x))))
X (join [(repeat n 1.0)] xs)
XXt (mmul X (transpose X))
beta (-> XXt inverse (mmul X y))
yhat (mmul beta X)
res (sub y yhat)
ybar (/ (reduce + y) n)
y-cen (sub y (repeat n ybar))
yhat-cen (sub yhat (repeat n ybar))
SSR (dot yhat-cen yhat-cen)
SST (dot y-cen y-cen)
SSRes (dot res res)
se (map sqrt (scale (/ SSRes n) (diagonal XXt)))
tscore (abs (map / beta se))
F (/ (/ SSR k) (/ SSRes nk))
R2 (/ SSR SST)]
{:coeffs (seq beta)
:se (seq se)
:t0 (seq tscore)
:tsig (map #(sig tdist %) tscore)
:F F
:Fsig (sig fdist F)
:RSquare R2}))#'regression/regressionLet us test it on random data. The coefficients are -2 and 1 with intercept being 4.
(let [normal (NormalDistribution. 4.0 0.2)
random-vector (fn [n] (take n (repeatedly rand)))
X (array (join [(random-vector 10)]
[(random-vector 10)]))
Y (add (mmul [-2 1] X)
(seq (.sample normal 10)))]
(regression X Y)){:coeffs (4.0169543977679725 -1.9653087520525703 0.8539922022675653), :se (0.5189380093158233 0.30494977845244536 0.3768881235684564), :t0 (7.740721098969014 6.444696441578315 2.2659037227859207), :tsig (1.1242165296532214E-4 3.520298919033049E-4 0.05782572705551492), :F 45.39319256356398, :Fsig 1.9629026632062008E-4, :RSquare 0.9284153924812029}