The Kitchen Sink and Other Oddities

Atabey Kaygun

Online Regression

Description of the problem

Assume that we have a functional relation between two variables of the form

\[ y \sim f(\alpha\cdot x + \beta) + \epsilon \]

where \(\epsilon\) is our error term, \(f\) is a known function, and \(x\) is a numerical vector. The parameters \(\alpha\) and \(\beta\) are unknown. Classical numerical multivariate regression uses \(f(x)=x\). If we have a collection of data points \(\{(y^{(i)},x^{(i)})\}_{i\in I}\), can we find the best approximating \(\alpha\) and \(\beta\)?

Let me add another constraint: assume we do not have access to the whole data set, and we are allowed to make only one pass. Can we design an online algorithm to solve this problem?

A stochastic solution

Assume we have a data point \((y^{(i)},x^{(i)})\) and that we already have an estimate \((\alpha^{(i)},\beta^{(i)})\). The error in our computation is

\[ \Delta y^{(i)} = y^{(i)} - f(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)}) \]

so that we have

\[ y^{(i)} - \Delta y^{(i)} = f(\alpha^{(i)}\cdot x^{(i)} +\beta^{(i)}) \]

Assume for now that \(f\) is invertible with inverse \(g\). Applying \(g\) to both sides gives \(g(y^{(i)} - \Delta y^{(i)}) = \alpha^{(i)}\cdot x^{(i)} + \beta^{(i)}\), and a first-order Taylor expansion of \(g\) around \(y^{(i)}\) then yields

\[ g(y^{(i)}) \approx \alpha^{(i)}\cdot x^{(i)} + \beta^{(i)} + g'(y^{(i)})\,\Delta y^{(i)} \]

On the other hand, the derivative of the inverse function can be calculated from \(f\): to first order, \(g'(y^{(i)}) = 1/f'(g(y^{(i)})) \approx 1/f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})\). Substituting, and taking the result as the point our updated parameters should map to, we get

\[ g(y^{(i)}) \approx \alpha^{(i)}\cdot x^{(i)} + \beta^{(i)} + \frac{\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})} = \alpha^{(i+1)}\cdot x^{(i)} + \beta^{(i+1)} \]

This means we get update rules of the form

\[ \beta^{(i+1)} \leftarrow \beta^{(i)} + \frac{\eta\,\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})} \]

and

\[ \alpha^{(i+1)} \leftarrow \alpha^{(i)} + \frac{\eta\,\Delta y^{(i)}}{f'(\alpha^{(i)}\cdot x^{(i)} + \beta^{(i)})}\, x^{(i)} \]

where \(\eta\) is the learning rate. Since the solution is iterative, we can process the data one point at a time without keeping the whole dataset in memory.
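
As a sanity check, in the classical case \(f(x)=x\) we have \(f'\equiv 1\), and the update rules reduce to the familiar least-mean-squares (Widrow-Hoff) rule:

\[ \alpha^{(i+1)} \leftarrow \alpha^{(i)} + \eta\,\Delta y^{(i)}\, x^{(i)}, \qquad \beta^{(i+1)} \leftarrow \beta^{(i)} + \eta\,\Delta y^{(i)} \]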

An implementation

First, I am going to need two utility functions: the first is an implementation of the ordinary dot product of two vectors; the second is a numerical differentiation of a function at a given point.

;; Ordinary dot product of two vectors.
(defun dot (xs ys) (reduce #'+ (map 'list #'* xs ys)))

;; Central-difference approximation to the derivative of fn at x.
(defun diff (fn x &optional (epsilon 1d-2))
  (let ((delta (/ epsilon 2)))
    (* (- (funcall fn (+ x delta))
          (funcall fn (- x delta)))
       (/ epsilon))))
DOT
DIFF
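
For instance (return values shown in comments; the derivative is only a central-difference approximation):

(dot '(1.0 2.0 3.0) '(4.0 5.0 6.0)) ;; => 32.0
(diff #'sin 0.0)                    ;; => ~1.0, approximating cos(0)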

For the regression function, I am going to assume the dataset is a stream, i.e., it is given by a function (with internal state) that takes no arguments and returns a new data point each time it is called, until it exhausts all points and starts returning nil.
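
For example, here is a toy stream (the name and the two data points are made up for illustration) that yields points of the form (y . features) and then nil:

(let ((state (list (cons 2.0 '(1.0))
                   (cons 4.0 '(2.0)))))
  (defun toy-stream ()
    (pop state)))

;; (toy-stream) => (2.0 1.0)
;; (toy-stream) => (4.0 2.0)
;; (toy-stream) => NIL

The regression function below reads such a stream one point at a time: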

(defun regression (data fn &optional (eta 1d-2))
  ;; The first point from the stream is consumed only to size alpha:
  ;; one weight per feature plus one for the intercept.
  (let ((alpha (loop repeat (length (funcall data))
                     collect (- 1.0 (random 2.0)))))
    (do ((point (funcall data) (funcall data)))
        ((null point) alpha)
      (let* ((x (cons 1.0 (cdr point)))              ; prepend 1.0 for the intercept
             (y (car point))
             (calculated (dot alpha x))
             (derivative (diff fn calculated))
             (delta (- (funcall fn calculated) y))   ; the negative of the Delta y above
             (update (/ delta derivative)))
        ;; Since delta is the negative of Delta y, decf implements the
        ;; update alpha <- alpha + eta * Delta y / f'(alpha.x + beta) * x.
        (loop for i from 0 below (length alpha)
              do (decf (elt alpha i) (* eta update (elt x i))))))))
REGRESSION

The following function creates an experiment dataset:

(defun generate-data (n m a b fn delta)
  (let ((res nil)
        (len (- b a)))
    (dotimes (i n res)
      (let* ((x (loop repeat m collect (+ (random len) a))) ; m features, uniform in [a,b)
             (y (+ (funcall fn x)
                   (- (random delta) (/ delta 2.0)))))      ; fn(x) plus uniform noise
        (push (cons y x) res)))))
GENERATE-DATA

Here \(n\) stands for the number of points created, \(m\) for the number of features, \(a\) and \(b\) are the bounds for the features, \(fn\) is our function, and \(\delta\) is the width of the (uniform) error term.
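
For instance, a call like the following (the exact values are random and will vary from run to run) produces two points, each a cons of \(y\) and a list of three features:

(generate-data 2 3 0.0 1.0 (lambda (x) (reduce #'+ x)) 0.1)
;; e.g. => ((1.4721 0.3102 0.6244 0.5402) (0.9836 0.1120 0.4507 0.4251))

However, the dataset is a list. We must turn this list into a stream: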

(defun streamify (data)
  ;; Wrap a list in a closure that returns one element per call, then nil.
  (let ((xs (copy-list data)))
    (lambda ()
      (let ((x (car xs)))
        (setf xs (cdr xs))
        x))))
STREAMIFY
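
For example (a quick illustration, with the return value shown as a comment):

(let ((s (streamify '(1 2 3))))
  (list (funcall s) (funcall s) (funcall s) (funcall s)))
;; => (1 2 3 NIL)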

Now, we are ready to write the experiment code:

(defun experiment (param a b n fn eta delta)
  (let* ((data (generate-data n (1- (length param)) a b
                              (lambda (x) (funcall fn (dot param (cons 1.0 x))))
                              delta))
         (xs (streamify data))
         (alpha (regression xs fn eta))
         (n 0)
         (r-square 0))
    ;; Despite its name, r-square accumulates the residual sum of squares.
    (dolist (point data)
      (let* ((x (cdr point))
             (y (car point))
             (z (funcall fn (dot alpha (cons 1.0 x)))))
        (incf n)
        (incf r-square (expt (- z y) 2))))
    (format nil "param = ~{~4,4f~^, ~}~%alpha = ~{~4,4f~^, ~}~%R2=~4,4f N=~d~%" param alpha r-square n)))
EXPERIMENT

Our first experiment is a standard regression:

(experiment '(1.0 -1.0 2.0 3.2 -1.2 0.8) -3.0 2.0 250000 (lambda (x) x) 0.025 0.05)
param = 1.0000, -1.0000, 2.0000, 3.2000, -1.2000, .8000
alpha = 1.0003, -.9984, 1.9983, 3.1977, -1.2001, .8003
R2=58.1951 N=250000

Pretty good. Now, with the logistic function:

(defun logit (x) (/ 1.0 (+ 1.0 (exp (- x))))) ; the logistic (sigmoid) function
LOGIT

(experiment '(1.2 -1.8 2.1 3.4) 0.0 1.0 12000 #'logit 0.08 0.03)
param = 1.2000, -1.8000, 2.1000, 3.4000
alpha = 1.0552, -1.6287, 2.0517, 3.0794
R2=2.8681 N=12000

Still, pretty good.