Reducers are Monoid Morphisms

Last week I wrote about Guy Steele’s talk “foldl and foldr are considered slightly harmful” which I first heard from Rich Hickey. Steele’s conclusion was that for truly parallel data structures and algorithms cons was not a good basis. I disagree, and still think that one can use cons in an effective way to do what Steele wants to do.

Yesterday, I was watching Rich Hickey’s talk on Transducers he gave at StrangeLoop. This again reminded of me his talk on Reducers. Being a mathematician, and not a programmer proper, I have to make sense of things in my own way. Writing is one of them, and I hope this might be of interest for some.

Reducing associative containers

An associative container is a linearly ordered collection of objects of a certain type, or mixed types, such as a sequence or a linked list or a vector. A reducing function is a unital associative binary operation. That is all we need to implement a serial fold:

(defun serial-fold (sequence reducer &optional accumulator)
   (cond ((null sequence) accumulator)
         ((atom sequence) (funcall reducer accumulator sequence))
         (t (serial-fold (cdr sequence) 
                          reducer 
                          (funcall reducer accumulator (car sequence))))))

SERIAL-FOLD

(serial-fold '(9 8 7 6 5 4 3 2 1) #'+ 0)

45

(serial-fold '(9 8 7 6 5 4 3 2 1) (lambda (x y) (cons y x)) nil)

(1 2 3 4 5 6 7 8 9)

I cheated in the second example: notice that we don’t need associativity because we processed the list sequentially and in order both in time and in space. If we were to process the container in parallel by splitting into sublists and put them together using the same reducer function, then even though the first example would still return 45, the second example could return something else. This is because the cons operator is neither unital nor associative.

In any case, for a mathematician, someone whose mind is bent towards making abstractions, an associative container can be thought as an element of a free monoid generated by a type of elements, or mixed types of elements. I am not the only, or not even the first, person making this connection. See for example Dan Piponi’s post on monoids in Haskell, or better yet Steele’s talk.

Truly parallel data structures and `map-reduce`

As I said earlier, I came to Steele’s talk on fold through Hickey’s talk on Reducers. Hickey has been thinking about concurrent programming for a long time, and you can see the effects of this in the solid design of Clojure. The problem he was elaborating on his talk on Reducers was concurrent folding on large collections. Steele, Hickey and others before them, observed that a well-designed map-reduce does actually solve that problem. By the way, the earliest version of the argument, that fold is universal in some sense, I could find was by Sheard and Feragas, “A Fold for all Seasons.” I highly recommend it.

In any case, Hickey and Steele got it right: one has to use a version of map-reduce that Steele describes instead of a sequential fold for a concurrent processing of data collected in a container.

Here is a simplified version of the implementation I gave for map-reduce last week

(defun map-reduce (tree reducer mapper)
   (if (atom tree) 
      (funcall mapper tree)
      (funcall reducer (map-reduce (car tree) reducer mapper)
                       (map-reduce (cdr tree) reducer mapper))))

MAP-REDUCE

In this version, map-reduce doesn’t care how deep the call is in the tree and both reducer and mapper work the same way regardless of the depth. Here, I also make the assumption that if you call the mapper function with nil it must return the unit element for the reducer function.

(map-reduce 
   '((1 . 2) ((3 . 4) . 5) . 6)
   (lambda (&optional x y)
      (cond ((null y) x)
            ((null x) y)
            (t (cons x y))))
   (lambda (x)
      (if (not (null x))
          (* x x))))

((1 . 4) ((9 . 16) . 25) . 36)

Notice that I used a modified version of cons as the reducer essentially by making this version unital with nil being the unit, i.e. if you ucons any object with nil you get the same object back. I am going to need this function quite a bit.

(defun ucons (&rest x) 
   (cond ((null x) nil)
         ((null (cadr x)) (car x))
         ((null (car x)) (cadr x))
         (t (cons (car x) (cadr x)))))

UCONS

Reducers are morphisms of monoids

Following Antoine de Saint Exupery’s dictum that “perfection is reached not when there is nothing left to add, but when there is nothing left to take away,” before we go any further, let me simplify the map-reduce function a bit more:

(defun map-reduce (tree reducer)
   (if (atom tree) 
      (funcall reducer tree)
      (funcall reducer (map-reduce (car tree) reducer)
                       (map-reduce (cdr tree) reducer))))

MAP-REDUCE

In this version we make the following assumptions:

If we call reducer on an empty container we get the unit for the reducer function. Remember reducer is a unital associative operation.
If we call reducer on a single object, we must get a mapped version of the object.
If we call reducer on two objects, it does combining using a unital associative operation.

A mathematician would say, what I described above is a monoid morphism from a free monoid (linked lists can be thought as elements of a free monoid) to a another monoid. For example consider the function

(defun a-monoid-morphism (&rest x)
   (cond ((null x) 0)                  
         ((null (cadr x)) (car x))     
         (t (+ (car x) (cadr x)))))

A-MONOID-MORPHISM

This is a monoid morphism from the free monoid generated by number objects where the monoid operation is concatenation, into the monoid of natural numbers where the associative operation is + and the unit is 0. Now, when we feed this function to map-reduce

(map-reduce 
   '(9 8 7 6 5 4 3 2 1)
   #'a-monoid-morphism)

45

Here the list (9 8 7 6 5 4 3 2 1) represents an element in the free monoid generated by integers and the reducer sends this element to the corresponding element in the monoid of natural numbers.

As a matter of fact, any foldable container, not just linked lists will do. These also include non-unital, even nonassociative containers. A non-unital and nonassociative binary operation is called a magma. We usually call containers with this shape as binary trees. The map-reduce I implemented above works with them too.

I need a random tree for testing purposes.

just-a-tree

(((5 . 1) ((3 . 5) . 4) 10 . 0) (9 1 1 . 3) . 7)

(map-reduce just-a-tree #'a-monoid-morphism)

49

Here is a proper morphism of magmas:

(defun a-magma-morphism (&rest x)
   (labels ((mapper (u) (if (listp u) u (* u u))))
      (if (null (cadr x)) 
         (mapper (car x))
         (ucons (car x) (cadr x)))))

A-MAGMA-MORPHISM

(map-reduce just-a-tree #'a-magma-morphism)

(((25 . 1) ((9 . 25) . 16) 100 . 0) (81 1 1 . 9) . 49)

Here we have a morphism of magmas from the free magma generated by integers into the same magma where we send each generator x to x².

Using intermediate collections vs composition

Consider the following two morphisms of magmas

(defun m-reduce (oper)
   (lambda (&rest x)
      (let ((unit (funcall oper)))
         (cond ((null x) unit)
               ((null (cadr x)) (if (car x) (funcall oper (car x)) unit))
               ((null (car x)) (if (cadr x) (funcall oper (cadr x)) unit))
               (t (funcall oper (car x) (cadr x)))))))

M-REDUCE

(map-reduce just-a-tree (m-reduce #'+))

49

and

(defun m-map (f)
   (lambda (&rest x)
      (if (cadr x)
         (ucons (car x) (cadr x))
         (if (listp (car x)) (car x) (funcall f (car x))))))

M-MAP

(map-reduce just-a-tree (m-map (lambda (x) (* x x))))

(((25 . 1) ((9 . 25) . 16) 100 . 0) (81 1 1 . 9) . 49)

We are going to use the first to convert binary operators into magma morphisms while second is for unary operators otherwise known as ordinary functions. Notice that m-map begets a filtering function if we pass the following

(defun collect-if (pred)
   (lambda (x) 
      (cond ((listp x) x)
            ((funcall pred x) x)
            (t nil))))

COLLECT-IF

(map-reduce 
   just-a-tree
   (m-map (collect-if #'oddp)))

(((5 . 1) 3 . 5) (9 1 1 . 3) . 7)

The main problem occurs when we want to compose these operators. Even after this conceptual simplification composition is still not efficient, because we still have to create unnecessary intermediate collections when we compose

(map-reduce 
   (map-reduce 
      just-a-tree
      (m-map (collect-if #'oddp)))
   (m-reduce #'+))

35

because we apply the filter first and we create a temporary collection, and then apply a reduce operation adding the resulting numbers. In the language of monoids, we perform the calculation \[ f(g(a\cdot b)) = f(g(a)\cdot g(b)) = f(g(a))\cdot f(g(b)) \] However, composition of two monoid or magma morphisms is another monoid or magma morphism. Hence we can skip the middle step.

I need a composition operator for monoid morphisms:

(defun m-compose (f &rest g)
   (if g
      (lambda (&rest x)
         (cond ((null x) (funcall f))
               ((null (cadr x)) (funcall f (funcall (apply #'m-compose g) (car x))))
               (t (funcall f (car x) (cadr x)))))
      f))

M-COMPOSE

Let me test this:

(map-reduce 
   just-a-tree
   (m-compose 
      (m-reduce #'+)
      (m-map (lambda (x) (* x x)))
      (m-map (collect-if (lambda (x) (not (zerop x)))))))

317

An analysis

I had two versions of the same workload on a large random binary tree with approximately 10 million leaves:

(map-reduce
   (map-reduce
      (map-reduce 
         large-tree
         (m-map (collect-if #'evenp)))
      (m-map (lambda (x) (* x x))))
   (m-reduce #'+))

and

(map-reduce
   large-tree
   (m-compose
      (m-reduce #'+)
      (m-map (lambda (x) (* x x)))
      (m-map (collect-if #'evenp))))

I ran the code 5 times each. The averages are as follows:

Lisp	Time (sec)	Memory (1e9 B)
SBCL	3.1 vs 2.3	0.9 vs 0.9
CCL	6.4 vs 8.2	0.8 vs 1.4
ECL	44.3 vs 32.9	3.4 vs 4.1

I can’t see a drastic difference neither in running time or memory allocation for SBCL or ECL, but CCL does something wierd. Then again, I am not a professional coder. There might be some tricks to coax better performance out of the composition as opposed to using intermediate collections. I’ll have to implement this in another language to compare performances.