Assume that we have a functional relation between two variables of the form
\[ y \sim f(\alpha\cdot x + \beta) + \epsilon \]
where \(\epsilon\) is our error term, \(f\) is a known function, and \(x\) is a numerical vector. The parameters \(\alpha\) and \(\beta\) are unknown. Classical numerical multivariate regression corresponds to the case \(f(x)=x\). If we have a collection of data points \(\{(y^{(i)},x^{(i)})\}_{i\in I}\), can we find the best approximating \(\alpha\) and \(\beta\)?
Let me add another constraint: assume we do not have access to the whole data set, and we are allowed to make only one pass. Can we design an online algorithm to solve this problem?
Assume we have a data point \((y^{(i)},x^{(i)})\), and let us assume we already have an estimate \((\alpha^{(i)},\beta^{(i)})\). The error in our computation is
\[ \Delta y^{(i)} = y^{(i)} - f(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)}) \]
so that we have
\[ y^{(i)} - \Delta y^{(i)} = f(\alpha^{(i)}\cdot x^{(i)} +\beta^{(i)}) \]
Assume for now that \(f\) is invertible with an inverse \(g(x)\). Since \(y^{(i)} - \Delta y^{(i)} = f(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})\), applying \(g\) and using the first-order Taylor expansion \(g(y - \Delta y) \approx g(y) - g'(y)\,\Delta y\) gives
\[ \alpha^{(i)}\cdot x^{(i)} + \beta^{(i)} = g(y^{(i)} - \Delta y^{(i)}) \approx g(y^{(i)}) - g'(y^{(i)})\,\Delta y^{(i)} \]
On the other hand, the derivative of the inverse function \(g(x)\) can be calculated from \(f(x)\) via \(g'(y) = 1/f'(g(y))\), so that
\[ g(y^{(i)}) \approx \alpha^{(i)}\cdot x^{(i)} + \beta^{(i)} + \frac{\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})} \]
The updated parameters should satisfy \(\alpha^{(i+1)}\cdot x^{(i)} + \beta^{(i+1)} \approx g(y^{(i)})\), which means we get update rules of the form
\[ \beta^{(i+1)} \leftarrow \beta^{(i)} + \frac{\eta\,\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})} \]
and
\[ \alpha^{(i+1)} \leftarrow \alpha^{(i)} + \frac{\eta\,\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})}\, x^{(i)} \]
where \(\eta\) is the learning rate. Since each update uses only the current data point, we can process the entire dataset in a single pass without keeping it in memory.
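As a sanity check, in classical linear regression we have \(f(x)=x\), hence \(f'\equiv 1\) and \(\Delta y^{(i)} = y^{(i)} - \alpha^{(i)}\cdot x^{(i)} - \beta^{(i)}\), and the update rules reduce to the familiar least-mean-squares (Widrow-Hoff) rule:
\[ \alpha^{(i+1)} \leftarrow \alpha^{(i)} + \eta\left(y^{(i)} - \alpha^{(i)}\cdot x^{(i)} - \beta^{(i)}\right) x^{(i)} \]
with the analogous update for \(\beta^{(i)}\).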
First, I am going to need two utility functions: the first is an implementation of the ordinary dot product of two vectors, and the second is a numerical differentiation of a function at a given point.
(defun dot (xs ys)
  (reduce #'+ (map 'list #'* xs ys)))

(defun diff (fn x &optional (epsilon 1d-2))
  (let ((delta (/ epsilon 2)))
    (* (- (funcall fn (+ x delta))
          (funcall fn (- x delta)))
       (/ epsilon))))

DOT
DIFF
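A couple of quick sanity checks (the derivative of \(\sin\) at \(0\) is \(\cos(0)=1\)):

(dot '(1 2 3) '(4 5 6)) ;; => 32
(diff #'sin 0d0)        ;; => approximately 1.0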
For the regression function, I am going to assume the dataset is a stream, i.e. it is given by a function (with an internal state) that accepts no arguments; each time this function is called, it returns a new data point until it exhausts all points, after which it returns nil.
(defun regression (data fn &optional (eta 1d-2))
  ;; The first data point is consumed only to determine the number of
  ;; parameters: each point is (y x1 ... xm), so alpha gets m+1 entries,
  ;; the extra one being the intercept.
  (let* ((alpha (loop repeat (length (funcall data))
                      collect (- 1.0 (random 2.0)))))
    (do ((point (funcall data) (funcall data)))
        ((null point) alpha)
      (let* ((x (cons 1.0 (cdr point)))           ; prepend 1.0 for the intercept
             (y (car point))
             (calculated (dot alpha x))
             (derivative (diff fn calculated))    ; numerical f'
             (delta (- (funcall fn calculated) y))
             (update (/ delta derivative)))
        (loop for i from 0 below (length alpha)
              do (decf (elt alpha i) (* eta update (elt x i))))))))

REGRESSION
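Before building a full experiment, here is a minimal smoke test (illustrative: the stream is hand-rolled and the link function is the identity). It should return a list close to (1.0 2.0):

;; Learn y = 2x + 1 from a stream of about 10,000 noise-free points.
(let ((i 0))
  (regression (lambda ()
                (when (< (incf i) 10000)
                  (let ((x (random 1.0)))
                    (list (+ 1.0 (* 2.0 x)) x))))
              (lambda (x) x)))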
The following function creates an experiment dataset:
(defun generate-data (n m a b fn delta)
  (let ((res nil)
        (len (- b a)))
    (dotimes (i n res)
      (let* ((x (loop repeat m collect (+ (random len) a)))
             (y (+ (funcall fn x) (- (random delta) (/ delta 2.0)))))
        (push (cons y x) res)))))

GENERATE-DATA
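For instance, a call like the following (illustrative) produces 4 points, each a list of the form (y x1 x2), with features drawn uniformly from [0,2) and noise from [-0.05,0.05):

(generate-data 4 2 0.0 2.0 (lambda (x) (reduce #'+ x)) 0.1)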
Here \(n\) stands for the number of points created, \(m\) for the number of features, \(a\) and \(b\) are the bounds for the features, \(fn\) is our function, and \(\delta\) is the width of the (uniform) error term. However, the dataset is a list. We must turn this list into a stream:
(defun streamify (data)
  (let ((xs (copy-list data)))
    (lambda ()
      (let ((x (car xs)))
        (setf xs (cdr xs))
        x))))

STREAMIFY
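A small illustration of the stream protocol: once the points run out, the stream keeps returning nil.

(let ((s (streamify '((1.0 0.5) (2.0 0.7)))))
  (list (funcall s) (funcall s) (funcall s)))
;; => ((1.0 0.5) (2.0 0.7) NIL)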
Now, we are ready to write the experiment code:
(defun experiment (param a b n fn eta delta)
  (let* ((data (generate-data n (1- (length param)) a b
                              (lambda (x) (funcall fn (dot param (cons 1.0 x))))
                              delta))
         (xs (streamify data))
         (alpha (regression xs fn eta))
         (r-square 0)
         (n 0))
    (dolist (point data)
      (let* ((x (cdr point))
             (y (car point))
             (z (funcall fn (dot alpha (cons 1.0 x)))))
        (incf n)
        (incf r-square (expt (- z y) 2)))) ; accumulate the sum of squared residuals
    (format nil "param = ~{~4,4f~^, ~}~%alpha = ~{~4,4f~^, ~}~%R2=~4,4f N=~d~%"
            param alpha r-square n)))

EXPERIMENT
Our first experiment is a standard regression:
(experiment '(1.0 -1.0 2.0 3.2 -1.2 0.8) -3.0 2.0 250000 (lambda (x) x) 0.025 0.05)

param = 1.0000, -1.0000, 2.0000, 3.2000, -1.2000, .8000
alpha = 1.0003, -.9984, 1.9983, 3.1977, -1.2001, .8003
R2=58.1951 N=250000
Pretty good. Now, with the logistic function:
(defun logit (x) (/ 1.0 (+ 1.0 (exp (- x)))))

LOGIT

(experiment '(1.2 -1.8 2.1 3.4) 0.0 1.0 12000 #'logit 0.08 0.03)

param = 1.2000, -1.8000, 2.1000, 3.4000
alpha = 1.0552, -1.6287, 2.0517, 3.0794
R2=2.8681 N=12000
Still, pretty good.