The Kitchen Sink and Other Oddities

Atabey Kaygun

Online Regression

Description of the problem

Assume that we have a functional relation between two variables of the form

yf(αx+β)+ϵ

where ϵ is our error term, f is a known function, and x is a numerical vector. The parameters α and β are unknown. Classical numerical multivariate regression uses f(x)=x. If we have a collection of data points (y(i),x(i))iI, can we find the best approximating α and β?

Let me add another constraint: assume we do not have access to the whole data set, and we are allowed to make only one pass. Can we design an online algorithm to solve this problem?

A stochastic solution

Assume we have a data point (y(i),x(i)), and let us assume we already have an estimate (α(i),β(i)). The error in our computation is

Δy(i)=y(i)f(α(i)x(i)+β(i))

so that we have

y(i)Δy(i)=f(α(i)x(i)+β(i))

Assume for now that f is invertible with an inverse g(x). Then

g(y(i)Δy(i))α(i)x(i)+β(i)g(y(i))Δy(i)

On the other hand, the derivative of the inverse function g(x) can be calculated from f(x)

=α(i)x(i)+β(i)Δy(i)f(α(i)x(i)+β(i))=α(i+1)x(i)+β(i+1)

This means we get update rules of the form

β(i+1)β(i)ηΔy(i)f(α(i)x(i)+β(i))

and

α(i+1)α(i)ηΔy(i)f(α(i)x(i)+β(i))x(i)

where η is the learning rate. Since this is an iterative solution, we can use this method to process the whole data without keeping the whole dataset in memory.

An implementation

Fist I am going to need two utility functions: the first is an implementation of the ordinary dot product of two vectors, the second is an implementation of numerical differentiation of a function at a given point.

(defun dot (xs ys) (reduce #'+ (map 'list #'* xs ys)))

(defun diff (fn x &optional (epsilon 1d-2))
  (let ((delta (/ epsilon 2)))
    (*  (- (funcall fn (+ x delta))
           (funcall fn (- x delta)))
        (/ epsilon))))
DOT
DIFF

For the regression function, I am going to assume the dataset is a stream, i.e. given by a function (with an internal state) that accepts no arguments and every time this function is called it returns a new data point until it exhaust all points and starts returning nil.

(defun regression (data fn &optional (eta 1d-2))
  (let* ((alpha (loop repeat (length (funcall data)) collect (- 1.0 (random 2.0)))))
    (do ((point (funcall data) (funcall data)))
    ((null point) alpha)
      (let* ((x (cons 1.0 (cdr point)))
         (y (car point))
         (calculated (dot alpha x))
         (derivative (diff fn calculated))
         (delta (- (funcall fn calculated) y))
         (update (/ delta derivative)))
    (loop for i from 0 below (length alpha) do
         (decf (elt alpha i) (* eta update (elt x i))))))))
REGRESSION

The following function creates an experiment dataset:

(defun generate-data (n m a b fn delta)
  (let ((res nil)
    (len (- b a)))
    (dotimes (i n res)
      (let* ((x (loop repeat m collect (+ (random len) a)))
         (y (+ (funcall fn x) (- (random delta) (/ delta 2.0)))))
    (push (cons y x) res)))))
GENERATE-DATA

Here n stands for the number of points created, m stands for number of features, a and b are the bounds for the features, fn is our function, and δ is for (uniform) error term. However, the dataset is a list. We must turn this list into a stream:

(defun streamify (data)
  (let ((xs (copy-list data)))
    (lambda ()
      (let ((x (car xs)))
        (setf xs (cdr xs))
        x))))
STREAMIFY

Now, we are ready to write the experiment code:

(defun experiment (param a b n fn eta delta)
  (let* ((data (generate-data n (1- (length param)) a b (lambda (x) (funcall fn (dot param (cons 1.0 x)))) delta))
     (xs (streamify data))
     (alpha (regression xs fn eta))
     (n 0)
     (r-square 0))
    (dolist (point data)
      (let* ((x (cdr point))
         (y (car point))
         (z (funcall fn (dot alpha (cons 1.0 x)))))
    (incf n)
    (incf r-square (expt (- z y) 2))))
    (format nil "param = ~{~4,4f~^, ~}~%alpha = ~{~4,4f~^, ~}~%R2=~4,4f N=~d~%" param alpha r-square n)))
EXPERIMENT

Our first experiment is a standard regression:

(experiment '(1.0 -1.0 2.0 3.2 -1.2 0.8) -3.0 2.0 250000 (lambda (x) x) 0.025 0.05)
param = 1.0000, -1.0000, 2.0000, 3.2000, -1.2000, .8000
alpha = 1.0003, -.9984, 1.9983, 3.1977, -1.2001, .8003
R2=58.1951 N=250000

Pretty good. Now, with logistic function:

(defun logit (x) (/ 1.0 (+ 1.0 (exp (- x)))))

(experiment '(1.2 -1.8 2.1 3.4) 0.0 1.0 12000 #'logit 0.08 0.03)
LOGIT
param = 1.2000, -1.8000, 2.1000, 3.4000
alpha = 1.0552, -1.6287, 2.0517, 3.0794
R2=2.8681 N=12000

Still, pretty good.