Today, I am going to write an implementation of multivariate regression in clojure with all statistical bells and whistles.
Here are the dependencies I have for this implementation:
:deps { net.mikera/core.matrix {:mvn/version "0.62.0"}
net.mikera/vectorz-clj {:mvn/version "0.48.0"}
org.apache.commons/commons-math3 {:mvn/version "3.5"}
}
With the dependencies above, I will put everything I need into my namespace.
ns regression
(:use clojure.core.matrix)
(:use clojure.core.matrix.operators)
(:import [org.apache.commons.math3.distribution
(
NormalDistribution
TDistribution FDistribution]))
Here is my implementation. The first term in the coefficients is the intercept.
defn regression [xs y]
(let [[k n] (shape xs)
(- n k 1)
nk (
tdist (TDistribution. nk)
fdist (FDistribution. k nk)fn [dist x] (* 2.0 (- 1.0 (.cumulativeProbability dist x))))
sig (join [(repeat n 1.0)] xs)
X (
XXt (mmul X (transpose X))-> XXt inverse (mmul X y))
beta (
yhat (mmul beta X)
res (sub y yhat)/ (reduce + y) n)
ybar (repeat n ybar))
y-cen (sub y (repeat n ybar))
yhat-cen (sub yhat (
SSR (dot yhat-cen yhat-cen)
SST (dot y-cen y-cen)
SSRes (dot res res)map sqrt (scale (/ SSRes n) (diagonal XXt)))
se (map / beta se))
tscore (abs (/ (/ SSR k) (/ SSRes nk))
F (/ SSR SST)]
R2 (:coeffs (seq beta)
{:se (seq se)
:t0 (seq tscore)
:tsig (map #(sig tdist %) tscore)
:F F
:Fsig (sig fdist F)
:RSquare R2}))
#'regression/regression
Let us test it on random data. The coefficients are -2 and 1 with intercept being 4.
let [normal (NormalDistribution. 4.0 0.2)
(fn [n] (take n (repeatedly rand)))
random-vector (join [(random-vector 10)]
X (array (10)]))
[(random-vector 2 1] X)
Y (add (mmul [-seq (.sample normal 10)))]
( (regression X Y))
:coeffs (4.0169543977679725 -1.9653087520525703 0.8539922022675653), :se (0.5189380093158233 0.30494977845244536 0.3768881235684564), :t0 (7.740721098969014 6.444696441578315 2.2659037227859207), :tsig (1.1242165296532214E-4 3.520298919033049E-4 0.05782572705551492), :F 45.39319256356398, :Fsig 1.9629026632062008E-4, :RSquare 0.9284153924812029} {