The Kitchen Sink and Other Oddities

Atabey Kaygun

Statistical Distributions using Apache Commons Math in Clojure

Description of the problem

More than often, I need random numbers from standard statistical distributions such as Gaussian, Poisson, T, F, or Beta. Since I like using Clojure so much, I have been looking for a satisfactory solution with as little friction as possible. Then I remember, Apache Commons Math has an excellent statistical sub-library that has all the distributions I need (and more) already implemented. All I need is to form a bridge between Clojure and Apache Commons Math.

Those who are interested in a pure Clojure solution should check out kixi.stats.

Implementation

Let me start with my dependencies in my deps.edn:

{:deps {org.apache.commons/commons-math3 {:mvn/version "3.6.1"}}}

Next, my namespace:

(ns distributions.core
  (:import [org.apache.commons.math3.distribution
            BetaDistribution 
            BinomialDistribution 
            CauchyDistribution 
            ChiSquaredDistribution 
            ExponentialDistribution 
            FDistribution 
            GammaDistribution 
            GeometricDistribution 
            GumbelDistribution 
            HypergeometricDistribution 
            IntegerDistribution 
            KolmogorovSmirnovDistribution 
            LaplaceDistribution 
            LevyDistribution 
            LogNormalDistribution 
            LogisticDistribution 
            MixtureMultivariateNormalDistribution 
            MixtureMultivariateRealDistribution 
            MultivariateNormalDistribution 
            MultivariateRealDistribution 
            NakagamiDistribution 
            NormalDistribution 
            ParetoDistribution 
            PascalDistribution 
            PoissonDistribution 
            RealDistribution
            TDistribution
            TriangularDistribution 
            UniformIntegerDistribution 
            UniformRealDistribution 
            WeibullDistribution 
            ZipfDistribution]))

I included all distributions in the library in case you’d like to experiment with them later on. Next, some utility functions. I am following R’s naming conventions:

(defn quartile [dist x]
  (.inverseCumulativeProbability dist x))

(defn pdf [dist x]
  (.density dist x))

(defn cdf [dist x]
  (.cumulativeProbability dist x))

(defn sample [dist N]
  (seq (.sample dist N)))

Some tests

Let us test the code. First, the normal distribution with 0 mean and 1.0 variance:

(def ndist (NormalDistribution. 0.0 1.0))
(quartile ndist 0.975)
(pdf ndist 12.0)
(sample ndist 10)
#'distributions.core/ndist
1.959963984540054
2.1463837356630728E-32
(0.7328831550323387 -1.7564646871411873 0.3036384595319081 -0.6552956779199097 0.09651672484712626 2.100967417766368 -0.3455198591244783 0.4937778023356781 0.44520438525164974 -0.33095042749939735)

Next, the Poisson distribution with p=0.05:

(def pdist (PoissonDistribution. 0.05))
(sample pdist 100)
#'distributions.core/pdist
(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)

And, finally the T-distribution with 2 degrees of freedom

(def tdist (TDistribution. 2))
(quartile tdist 0.975)
(pdf tdist 12.0)
(sample tdist 10)
#'distributions.core/tdist
4.302652729698313
5.668533483577864E-4
(0.9571878079111966 0.22544989371671462 -1.4840659295754255 1.1260852156451235 0.38857568503851414 -1.4375968836484603 0.15007647730820867 -1.1068773851595448 0.0021123655126886205 -0.2345292434886456)