The Kitchen Sink and Other Oddities

Atabey Kaygun

From Magmas and Catamorphisms to Coalgebras of Computation

In a previous post I described how the syntax of applicative calculi can be organized by magmas and catamorphisms. The guiding idea was simple: application is binary, so application syntax naturally gives binary trees. A program is built by constructors, and a syntax-directed transformation is a fold over that inductive structure.

That is the algebraic half of the story.

In this post I want to describe the coalgebraic half. Algebra builds syntax. Coalgebra unfolds behaviour. A catamorphism consumes an abstract syntax tree. A coalgebraic semantics observes a running program as a state and asks: what can this state do next?

The distinction matters. A lambda term, an SKI term, or an iota term can be viewed syntactically as a tree. But a program is not merely a tree. Once reduced to a computational state, the program can receive arguments, expose a head form, and continue to another state. That transition structure is not best described by a fold. It is best described by a coalgebra.

The goal is to make this precise enough to compare calculi — not just programs inside one calculus, but entire systems like the untyped lambda calculus, SKI, and the iota calculus — and to say in a clean structural sense when two such systems present the same observable behaviour.

From syntax to reduced states

A formal applicative language is a set of well-formed program trees together with a total binary application constructor. In the simplest combinatory cases, this is literally a free magma: SKI terms are the free magma on the three generators \(\{S, K, I\}\), and iota terms are the free magma on the single generator \(\{\iota\}\).

For example, an SKI term is generated by

M ::= S | K | I | M M

so the syntax is the free magma on the set \(\{S, K, I\}\).

In Scala one might model this as follows.

sealed trait SKI
object SKI {
  case object S extends SKI
  case object K extends SKI
  case object I extends SKI
  case class App(left: SKI, right: SKI) extends SKI
}
trait SKI
object SKI

The constructor App is the magmatic product. A fold over this syntax is a catamorphism.

def foldSKI[A](term: SKI)(s: A, k: A, i: A, app: (A, A) => A): A =
  term match {
    case SKI.S          => s
    case SKI.K          => k
    case SKI.I          => i
    case SKI.App(l, r)  => app(foldSKI(l)(s, k, i, app), foldSKI(r)(s, k, i, app))
  }
def foldSKI[A](term: SKI)(s: A, k: A, i: A, app: (A, A) => A): A

This is the right abstraction for syntax-directed operations: computing the size of a term, translating it, pretty-printing it, or compiling it.

For instance:

def size(term: SKI): Int =
  foldSKI(term)(1, 1, 1, (m: Int, n: Int) => 1 + m + n)

println(size(SKI.App(SKI.App(SKI.S, SKI.K), SKI.K)))
5
def size(term: SKI): Int

But execution is not merely a fold. To execute a term, we must reduce it. And reduction takes us out of the world of pure syntax into a different world: the world of reduced states.

For SKI, the reduction rules are:

I x       -> x
K x y     -> x
S x y z   -> x z (y z)

Formally, a reduced sublanguage is a subset \(R \subseteq L\) of the full language \(L\), together with a reduction retraction \(\beta : L \to R\) satisfying \(\beta(r) = r\) for every already-reduced \(r \in R\). The retraction is the chosen operational semantics: it sends any term to its reduced representative.

The retraction must be compatible with application in the sense that \(\beta(MN) = \beta(\beta(M)\,\beta(N))\) for all \(M, N \in L\). This condition says that reducing the pieces and then applying gives the same result as applying and then reducing. It makes \(\beta\) a magma morphism from the raw syntax to the reduced state space.

Given a compatible retraction, the reduced magma \((\mathcal{R}, \cdot_\beta)\) has the transported product \(r \cdot_\beta s = \beta(rs)\). This is reduced application: apply, then normalize.

In Scala:

trait ReducedCalculus[R] {
  def app(r: R, s: R): R
}
trait ReducedCalculus

Here app(r, s) means reduced application — \(r \cdot_\beta s\) — not raw syntax formation.

Programs as states and the head/currying coalgebra

Once we have reduced states, we can stop thinking of a program as an inert syntax tree. A reduced program is now a state of a machine.

From such a state, there are two things we can ask.

First, what is currently visible? This is the head observation: a map \(\mathrm{hd} : \mathcal{R} \to B\) from reduced states to a set \(B\) of head labels. In SKI the labels might be \(\{\mathtt{HS}, \mathtt{HK}, \mathtt{HI}, \mathtt{HApp}\}\), recording which combinator or application form sits at the top.

Second, what happens if I feed it an argument? This is the transition function \(s \mapsto r \cdot_\beta s\), which sends any reduced argument to the next reduced state.

Together these give the head/currying coalgebra

\[\gamma_{\mathcal{C}} : \mathcal{R} \to B \times \mathcal{R}^{\mathcal{R}}, \qquad \gamma_{\mathcal{C}}(r) = \bigl(\mathrm{hd}(r),\; s \mapsto r \cdot_\beta s\bigr).\]

This is a coalgebra for the Moore functor \(F(X) = B \times X^{\mathcal{R}}\). Each state exposes an observable output in \(B\) and a transition function indexed by reduced inputs in \(\mathcal{R}\). The coalgebraic story begins after reduction. A reduced program is a state. It exposes a head and determines how it evolves under each possible argument. The terminal semantics of the Moore coalgebra \(\Theta_{\mathcal{C}} : \mathcal{R} \to B^{\mathcal{R}^*}\) records the complete curried head behaviour of a program. Behavioural equivalence \(\equiv_{\mathrm{hd}}\) is the kernel of this map.

Head observed applicative calculi

A head-observed applicative calculus is a tuple \(\mathcal{C} = (\mathcal{L}, \mathcal{R}, \beta, B, \mathrm{hd})\) assembling all these pieces: a formal language, a reduced sublanguage, a compatible retraction, a set of head labels, and the head map.

The category \(\mathbf{Calc}_{\mathrm{hd}}\) of head-observed applicative calculi, equipped with the class \(\mathcal{W}_{\mathrm{hd}}\) of behavioural weak equivalences, admits a Gabriel–Zisman localization \(\mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc})\). Two calculi are isomorphic in this localized category precisely when they present the same terminal head behaviours. This gives a clean conceptual division:

That division is the basis for comparing calculi. Lambda calculus, SKI, and iota have different syntax, but their standard translations are behavioural weak equivalences, making them isomorphic objects of \(\mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc})\). The algebraic side explains how programs are built and transformed. The coalgebraic side explains how reduced programs are observed and compared.

In Scala the coalgebra structure becomes:

trait HeadCoalgebra[R, B] {
  def head(r: R): B
  def step(r: R, arg: R): R

  def observe(r: R): (B, R => R) =
    (head(r), (arg: R) => step(r, arg))
}
trait HeadCoalgebra

A catamorphism consumes finite syntax. This coalgebra unfolds future behaviour.

A very small SKI evaluator

To keep the example concrete, let us implement a tiny SKI reducer.

sealed trait Term
object Term {
  case object S extends Term
  case object K extends Term
  case object I extends Term
  case class App(left: Term, right: Term) extends Term
}

import Term._
trait Term
object Term
import Term._

We can reduce by repeatedly contracting the outermost SKI redexes.

def reduceOnce(t: Term): Option[Term] =
  t match {
    case App(I, x) =>
      Some(x)

    case App(App(K, x), _) =>
      Some(x)

    case App(App(App(S, x), y), z) =>
      Some(App(App(x, z), App(y, z)))

    case App(l, r) =>
      reduceOnce(l).map(App(_, r))
        .orElse(reduceOnce(r).map(App(l, _)))

    case _ =>
      None
  }
def reduceOnce(t: Term): Option[Term]

Then normalize with a fuel bound to avoid nontermination.

def normalize(t: Term, fuel: Int = 1000): Term =
  if (fuel <= 0) t
  else
    reduceOnce(t) match {
      case Some(u) => normalize(u, fuel - 1)
      case None    => t
    }
def normalize(t: Term, fuel: Int): Term

The reduced product \(r \cdot_\beta s = \beta(rs)\) is now:

def reducedApp(r: Term, s: Term): Term =
  normalize(App(r, s))
def reducedApp(r: Term, s: Term): Term

This is the transition part of the coalgebra: \(\mathtt{step}(r, s) = \mathtt{reducedApp}(r, s)\).

Head labels

We next choose what we want to observe. For SKI, a simple head observation records whether the reduced term begins with \(S\), \(K\), \(I\), or an application-headed form.

sealed trait Head
object Head {
  case object HS   extends Head
  case object HK   extends Head
  case object HI   extends Head
  case object HApp extends Head
}
trait Head
object Head

A simple head function is:

def termHead(t: Term): Head =
  normalize(t) match {
    case S        => Head.HS
    case K        => Head.HK
    case I        => Head.HI
    case App(_,_) => Head.HApp
  }
def termHead(t: Term): Head

Now the head/currying coalgebra is:

object SKICoalgebra extends HeadCoalgebra[Term, Head] {
  def head(r: Term): Head = termHead(r)
  def step(r: Term, arg: Term): Term = reducedApp(r, arg)
}
object SKICoalgebra

Given a reduced state \(r\), the coalgebra exposes its current head and a transition function. For example, applying \(SKK\) to \(I\):

val r0 = normalize(App(App(S, K), K))
val (h, next) = SKICoalgebra.observe(r0)
println(s"head of SKK: $h")
println(s"SKK applied to I: ${next(I)}")
head of SKK: HApp
SKK applied to I: I
val r0: Term = App(App(S,K),K)
val h: Head = HApp
val next: Term => Term = HeadCoalgebra$$Lambda$1129/0x00007fb51c578000@21e5feb

where h is the current head and next is the function sending an argument to the next state.

This is the computational meaning of currying. A multi-argument program is observed one argument at a time. For example, the combinator \(K\) has the behaviour \(K\,x\,y \to x\). Coalgebraically, this is not one ternary operation. It is a sequence of unary transitions:

K      --x-->   K x
K x    --y-->   x

The coalgebra unfolds this process.

Terminal head semantics

The one-step coalgebra only tells us the current head and the next-state function. To know the complete observable behaviour of a state, we unfold it along every finite list of arguments.

Formally, the terminal head semantics of a calculus \(\mathcal{C}\) is the unique coalgebra morphism

\[\Theta_{\mathcal{C}} : \mathcal{R} \to B^{\mathcal{R}^*}\]

into the terminal coalgebra for the Moore functor \(F(X) = B \times X^{\mathcal{R}}\). Here \(\mathcal{R}^*\) denotes the set of finite lists of reduced arguments, and \(B^{\mathcal{R}^*}\) is the set of all functions from such lists to head labels.

The terminal coalgebra \(B^{\mathcal{R}^*}\) is equipped with the structure that sends a function \(\theta : \mathcal{R}^* \to B\) to \((\theta(\varnothing),\; s \mapsto \partial_s\theta)\), where \((\partial_s\theta)(s_1, \ldots, s_n) = \theta(s, s_1, \ldots, s_n)\). The morphism \(\Theta_{\mathcal{C}}\) is given explicitly by

\[\Theta_{\mathcal{C}}(r)(s_1, \ldots, s_n) = \mathrm{hd}(r \cdot_\beta s_1 \cdot_\beta \cdots \cdot_\beta s_n)\]

with \(\Theta_{\mathcal{C}}(r)(\varnothing) = \mathrm{hd}(r)\). This records, for every finite argument list, the head observable after applying \(r\) to that list.

In Scala, for a fixed finite list, we can compute this finite observation.

def runArgs(r: Term, args: List[Term]): Term =
  args.foldLeft(normalize(r)) { case (state, arg) =>
    reducedApp(state, normalize(arg))
  }

def observeAfter(r: Term, args: List[Term]): Head =
  termHead(runArgs(r, args))
def runArgs(r: Term, args: List[Term]): Term
def observeAfter(r: Term, args: List[Term]): Head

For example:

val kxy = observeAfter(K, List(I, K))
println(s"K I K -> observed head: $kxy")
// K I K -> I, so the observed head is HI.
K I K -> observed head: HI
val kxy: Head = HI

The full terminal behaviour \(\Theta_{\mathcal{C}}(r)\) is not a finite object in general. It is the function that knows the answer for every finite list of arguments.

Computationally, we approximate it by testing finite argument lists.

def behaviourSample(r: Term, tests: List[List[Term]]): Map[List[Term], Head] =
  tests.map(args => args -> observeAfter(r, args)).toMap
def behaviourSample(r: Term, tests: List[List[Term]]): Map[List[Term],Head]

This gives a finite approximation to the terminal head profile.

Behavioural equivalence

Two reduced states are head-indistinguishable, written \(r \equiv_{\mathrm{hd}} t\), when they have the same complete head profile:

\[\Theta_{\mathcal{C}}(r) = \Theta_{\mathcal{C}}(t).\]

Unwound, this means \(\mathrm{hd}(r \cdot_\beta s_1 \cdot_\beta \cdots \cdot_\beta s_n) = \mathrm{hd}(t \cdot_\beta s_1 \cdot_\beta \cdots \cdot_\beta s_n)\) for every finite list of reduced arguments \(s_1, \ldots, s_n \in \mathcal{R}\), including the empty list, which gives \(\mathrm{hd}(r) = \mathrm{hd}(t)\).

The relation \(\equiv_{\mathrm{hd}}\) is the kernel of \(\Theta_{\mathcal{C}}\): it is an equivalence relation, and it is stable under right application by a fixed argument — if \(r \equiv_{\mathrm{hd}} t\) then \(r \cdot_\beta s \equiv_{\mathrm{hd}} t \cdot_\beta s\) for every \(s \in \mathcal{R}\). When the head observation is additionally applicatively extensional (meaning \(r \equiv_{\mathrm{hd}} r'\) and \(s \equiv_{\mathrm{hd}} s'\) together imply \(r \cdot_\beta s \equiv_{\mathrm{hd}} r' \cdot_\beta s'\)), the quotient \(\mathcal{R}/{\equiv_{\mathrm{hd}}}\) inherits a quotient magma structure. This quotient is the behavioural image of \(\Theta_{\mathcal{C}}\): the minimized behaviour space determined by the chosen head observations.

In Scala we cannot test all lists, but we can test finite samples.

def behaviourallyEqualOn(r: Term, t: Term, tests: List[List[Term]]): Boolean =
  tests.forall(args => observeAfter(r, args) == observeAfter(t, args))
def behaviourallyEqualOn(r: Term, t: Term, tests: List[List[Term]]): Boolean

Let us compare \(I\) with a more complicated term that behaves like \(I\).

In SKI, the term \(SKK\) behaves as identity: \(S\,K\,K\,x \to K\,x\,(K\,x) \to x\).

Define:

val SKK: Term = App(App(S, K), K)
val SKK: Term = App(App(S,K),K)

Note that \(I\) and \(SKK\) differ when given no arguments — \(I\) has head \(\mathtt{HI}\) while \(SKK\) has head \(\mathtt{HApp}\), since it has not yet reduced to a primitive form. They are head-indistinguishable only after at least one argument is supplied, where both reach the same state. So we test only with at least one argument.

val tests =
  List(
    List(I),
    List(K),
    List(S),
    List(App(K, I)),
    List(App(App(S, K), K)),
    List(I, K),
    List(K, I)
  )

val sameOnTests = behaviourallyEqualOn(I, SKK, tests)
println(s"I and SKK behaviourally equal on tests: $sameOnTests")
I and SKK behaviourally equal on tests: true
val tests: List[List[Product with Term with java.io.Serializable]] = List(List(I), List(K), List(S), List(App(K,I)), List(App(App(S,K),K)), List(I, K), List(K, I))
val sameOnTests: Boolean = true

The terms \(I\) and \(SKK\) are not the same syntax. They are not the same tree. But they have the same curried head profile on every argument list of length at least one. This is the key shift: syntax equality compares trees; reduction equality compares reduced representatives; behaviour equality compares all future observations.

A category of calculi and its localization

The point of this coalgebraic view is not only to compare two programs inside one calculus. It also lets us compare whole calculi. The goal is a category \(\mathbf{Calc}_{\mathrm{hd}}\) whose objects are head-observed applicative calculi and whose morphisms are structure-preserving translations — maps on raw syntax, reduced states, and head labels, compatible with application, reduction, and head observation. The behavioural quotient functor \(Q_{\mathrm{hd}} : \mathbf{Calc}_{\mathrm{hd}} \to \mathbf{Set}\) sends each calculus \(\mathcal{C}\) to its quotient \(\mathcal{R}/{\equiv_{\mathrm{hd}}}\).

A morphism \(F : \mathcal{C} \to \mathcal{D}\) is a behavioural weak equivalence when \(Q_{\mathrm{hd}}(F)\) is a bijection on behavioural quotients. The class \(\mathcal{W}_{\mathrm{hd}}\) of such morphisms makes \((\mathbf{Calc}_{\mathrm{hd}}, \mathcal{W}_{\mathrm{hd}})\) a relative category. Gabriel–Zisman localization then yields the homotopy category

\[\mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc}) = \mathbf{Calc}_{\mathrm{hd}}[\mathcal{W}_{\mathrm{hd}}^{-1}]\]

in which two calculi become isomorphic precisely when they present the same head-observable reduced behaviours. The localization satisfies the expected universal property: every functor out of \(\mathbf{Calc}_{\mathrm{hd}}\) that sends behavioural weak equivalences to isomorphisms factors uniquely through the localization functor \(L : \mathbf{Calc}_{\mathrm{hd}} \to \mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc})\).

The lambda calculus, SKI, and the iota calculus have different syntax. But they can present the same applicative behaviours. A translation between calculi should preserve application, reduction, head observation, and behavioural indistinguishability.

For example, bracket abstraction translates lambda terms to SKI terms. Conversely, SKI combinators have lambda interpretations: \(I := \lambda x.\, x\), \(K := \lambda x.\,\lambda y.\, x\), and \(S := \lambda x.\,\lambda y.\,\lambda z.\, xz(yz)\). With compatible reductions and head observations, bracket abstraction and its inverse induce bijections on behavioural quotients, making them behavioural weak equivalences. The same holds for the encoding of SKI into the iota calculus. Combining these gives

\[\mathcal{C}_\lambda \cong \mathcal{C}_{\mathrm{SKI}} \cong \mathcal{C}_\iota \quad \text{in } \mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc}).\]

This is stronger and cleaner than saying merely that one syntax can encode another. It says that these three syntactic presentations determine the same behaviour space after quotienting by head indistinguishability. Different syntax, same terminal behaviour.

Recursion as diagonal closure

There is another point where the coalgebraic perspective clarifies the structure.

Every reduced program \(a\) determines a left-application map \(L_a(x) = a \cdot_\beta x\). A fixed point of \(a\) is a program \(p\) such that \(a \cdot_\beta p = p\). This is the abstract form of recursion.

A reduced magma \((\mathcal{R}, \cdot_\beta)\) is diagonally closed if for every \(a \in \mathcal{R}\) there exists \(d_a \in \mathcal{R}\) satisfying

\[d_a \cdot_\beta x = a \cdot_\beta (x \cdot_\beta x)\]

for every \(x \in \mathcal{R}\). Given such a \(d_a\), define \(p_a = d_a \cdot_\beta d_a\). Then

\[p_a = d_a \cdot_\beta d_a = a \cdot_\beta (d_a \cdot_\beta d_a) = a \cdot_\beta p_a,\]

so \(p_a\) is a fixed point of \(a\). This is a magma-level formulation of Lawvere’s diagonal argument: in any diagonally closed reduced magma, every representable endomap has a fixed point. It is the abstract reason that untyped lambda calculus, SKI, and iota support general recursion — each of them admits a diagonal operator, and \(Y\)-like combinators are concrete instances of the construction \(p_a = d_a \cdot_\beta d_a\).

Coalgebraically, a fixed point equation \(a \cdot_\beta p = p\) turns into an equation of terminal behaviours. Writing \(\partial_p \theta\) for the derivative \((\partial_p \theta)(s_1, \ldots, s_n) = \theta(p, s_1, \ldots, s_n)\), the equation becomes

\[\Theta_{\mathcal{C}}(p) = \partial_p\, \Theta_{\mathcal{C}}(a).\]

This says that the complete head profile of \(p\) is obtained by feeding \(p\) itself back into the behaviour of \(a\). Recursion is an algebraic fixed point internally, but it becomes a self-referential behaviour externally.

Where catamorphisms stop and coalgebras begin

The previous post was about magmas and catamorphisms. Syntax is inductive; syntax-directed computation is a fold. That remains true. Parsing, pretty-printing, compilation, translation, and de Bruijn conversion are naturally catamorphic.

But execution is different. A reduced program is not merely a tree to fold. It is a state whose future is probed by arguments. The correct structure is a coalgebra: a map from states to observations and next-state functions.

So the two halves fit together as follows. On the algebraic side we have

while on the coalgebraic side we have

Or more compactly: catamorphisms consume syntax; anamorphisms unfold behaviour.

A final Scala sketch

Here is the whole idea in a compact interface. A head-observed applicative calculus packages the raw language, the embedding of reduced states back into raw terms, the reduction retraction, and the head observation.

trait ApplicativeCalculus[Raw, Red, B] {
  def embed(r: Red): Raw
  def reduce(raw: Raw): Red
  def rawApp(m: Raw, n: Raw): Raw

  def reducedApp(r: Red, s: Red): Red =
    reduce(rawApp(embed(r), embed(s)))

  def head(r: Red): B

  def step(r: Red, s: Red): Red =
    reducedApp(r, s)

  def observe(r: Red): (B, Red => Red) =
    (head(r), (s: Red) => step(r, s))

  def observeAfter(r: Red, args: List[Red]): B =
    head(args.foldLeft(r)(step))
}
trait ApplicativeCalculus

The method reducedApp computes \(r \cdot_\beta s = \mathtt{reduce}(\mathtt{rawApp}(\mathtt{embed}(r), \mathtt{embed}(s)))\), the transported product of the reduced magma. The method observe is the head/currying coalgebra \(\gamma_{\mathcal{C}} : \mathcal{R} \to B \times \mathcal{R}^{\mathcal{R}}\). The method observeAfter computes finite pieces of the terminal semantics \(\Theta_{\mathcal{C}} : \mathcal{R}^* \to B\).

Two calculi \(\mathcal{C}\) and \(\mathcal{D}\) can be compared through morphisms in \(\mathbf{Calc}_{\mathrm{hd}}\). A morphism carries maps on raw terms, reduced states, and head labels, preserving rawApp, reduce, and head. It is a behavioural weak equivalence when it induces a bijection on the quotients \(\mathcal{R}_{\mathcal{C}}/{\equiv_{\mathrm{hd}}}\) and \(\mathcal{R}_{\mathcal{D}}/{\equiv_{\mathrm{hd}}}\). In the localized category \(\mathbf{Ho}_{\mathrm{hd}}(\mathbf{Calc})\), behavioural weak equivalences become isomorphisms, and the chain \(\mathcal{C}_\lambda \cong \mathcal{C}_{\mathrm{SKI}} \cong \mathcal{C}_\iota\) expresses the fact that abstraction, a finite combinator basis, and a one-combinator basis are three syntactic presentations of the same untyped recursive applicative behaviours.

Of course, in an actual implementation one cannot compare all functions \(\mathcal{R}^* \to B\). One approximates by finite tests, or proves equivalence mathematically. But the abstraction is still useful: it tells us what the exact semantic object is.