I was looking at the CLML Tutorial. CLML seems to be a well-designed and comprehensive machine learning library that handles everything from time-series analysis to neural networks.
I am against using heavy frameworks for anything. I think libraries, too, should confine themselves to doing one thing well. If your language is well designed, like Common Lisp, then combining specialized libraries makes even the most difficult and complicated tasks possible with careful design.
Today’s task, then, is to use Common Lisp and relatively familiar Common Lisp libraries to achieve what has been done in the aforementioned tutorial. Instead of using Babel and Org mode, I will do this post in cl-jupyter.
First, let us define our package and load our libraries:
(mapc #'require '(:drakma :cl-ppcre :lla :gsll))
(defpackage #:my-test
  (:use #:cl)
  (:import-from #:lla #:svd #:svd-u #:svd-d #:svd-vt #:dot)
  (:import-from #:grid #:column #:row)
  (:import-from #:cl-ppcre #:split)
  (:import-from #:drakma #:http-request))
(in-package #:my-test)
#<PACKAGE "MY-TEST">
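If these systems are not already visible to require, Quicklisp can fetch and load them instead (a small aside, assuming Quicklisp is available in the image):

;; Alternative to the REQUIRE call above; assumes Quicklisp is loaded.
(ql:quickload '(:drakma :cl-ppcre :lla :gsll))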
Now, let me define a function to get the data from the web from a given URL:
(defun slurp-data (url &optional (separator ","))
  "Download URL with Drakma and parse it as SEPARATOR-delimited numeric rows."
  (mapcar (lambda (line) (mapcar #'read-from-string (split separator line)))
          (split "\n+" (http-request url))))
SLURP-DATA
OK. Let us get the data:
(defparameter raw (slurp-data "http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"))
(list (car raw) (cadr raw))
((1 14.23 1.71 2.43 15.6 127 2.8 3.06 0.28 2.29 5.64 1.04 3.92 1065)
(1 13.2 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05 3.4 1050))
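As a quick sanity check (output omitted), the UCI wine data should come down as 178 rows of 14 columns each:

(list (length raw) (length (car raw)))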
We don’t need the first column since it tells us the class.
(defparameter wine (mapcar #'cdr raw))
WINE
I am going to apply singular value decomposition (SVD) to the data. For that I am using the LLA library, which requires the data to be in the form of a 2D array:
(defparameter matrix
  (let ((n (length wine))
        (m (length (car wine))))
    (make-array (list n m) :initial-contents wine)))
(defparameter svd-result (svd matrix))
SVD-RESULT
Let us check the singular values:
(slot-value (svd-d svd-result) 'cl-num-utils.matrix::elements)
#(10886.66990656052d0 493.56204760867354d0 57.1488432147479d0
30.100125524021887d0 18.54281551828445d0 14.46302059358436d0
11.036037653318322d0 5.289890187623465d0 4.456588316877591d0
3.575271453862534d0 2.6012217367795545d0 1.9868082509451046d0
1.2139139647789066d0)
The result indicates that the first two singular vectors should be enough. In order to project our 13-dimensional vectors down to 2D, I am going to need a dot product. The LLA library already has one.
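A quick back-of-the-envelope check of that claim (a rough sketch, reusing the same elements slot access as above): the squared singular values are proportional to the variance captured along each direction, so the share of the first two components can be computed directly from the values printed above.

(let* ((d (slot-value (svd-d svd-result) 'cl-num-utils.matrix::elements))
       (squares (map 'vector (lambda (s) (* s s)) d))
       (total (reduce #'+ squares)))
  ;; Share of the squared singular-value mass in the first two components.
  (/ (+ (aref squares 0) (aref squares 1)) total))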
After I project the data points, I will combine the wine-class data with the projected vectors. The result will be written to a file which I will then plot with gnuplot.
(let* (;; The first two rows of U.
       (vecs (mapcar (lambda (i) (row (svd-u svd-result) i)) (list 0 1)))
       (data (mapcar (lambda (x) (coerce x 'vector)) wine))
       (class (mapcar #'car raw))
       ;; For each data point, prepend its class to its dot products with
       ;; the two chosen vectors.
       (projected-data
         (mapcar (lambda (x y) (cons y (mapcar (lambda (z) (dot z x)) vecs)))
                 data
                 class)))
  (with-open-file (out "data.csv" :direction :output
                                  :if-does-not-exist :create
                                  :if-exists :supersede)
    (format out "~{~{~5,4F~^, ~}~%~}" projected-data)))
NIL
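One way to render data.csv without leaving the REPL is to drive gnuplot through uiop:run-program (a rough sketch, assuming gnuplot is installed and on the PATH; the plot colors each point by its class in column 1):

(uiop:run-program
 (list "gnuplot" "-e"
       (concatenate 'string
                    "set datafile separator ','; "
                    "set terminal png; set output 'wine.png'; "
                    "plot 'data.csv' using 2:3:1 with points pt 7 lc palette"))
 :output t)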
There is good separation between class 1 and class 2 wines, while class 3 is scattered in between. Looking at the data, one might have used only the most dominant singular vector to achieve the same separation.
Unlike the CLML Tutorial, I didn’t normalize the data, because the separation was much worse after normalization.
So, there you have it. “Look ma! No frameworks.”