The Kitchen Sink and Other Oddities

Atabey Kaygun

Applicative Languages, Behaviour, and Diagonal Closure

In the previous post I described applicative calculi through a coalgebraic lens. The central object was a reduced state together with a head observation and a transition function. A program exposed a head label, accepted an argument, moved to a new reduced state, and unfolded into a terminal behaviour tree.

That was a useful picture, but it mixed two things that should be separated.

Reduction is one thing. Observation is another. In the current version I will keep the reduction structure and derives behaviour from it. Head labels and Moore coalgebras are no longer part of the basic definition. They may be added later, if one wants a particular observation scheme, but they should not be built into the notion of calculus itself.

The new guiding idea is simpler:

A calculus is a language, a sublanguage of reduced terms, and a partial reduction map.

Everything else follows from that.

What changed

The old formulation started from something like \(\gamma : R \longrightarrow B \times R^R\), where \(B\) was a set of head labels. The component \(R \to R^R\) recorded how a reduced state responds to an argument, while the component \(R\to B\) recorded the current head observation.

The new formulation drops \(B\) and drops \(\gamma\) from the definition. A calculus is now a triple \(\mathcal C=(L,R,\beta)\).

Here \(L\subseteq A^*\) is a formal language over an alphabet \(A\), \(R\subseteq L\) is the sublanguage of reduced terms, and \(\beta:L\rightharpoonup R\) is a partial reduction map. The map \(\beta\) splits the inclusion \(R\subseteq L\), so \(\beta(r)=r\) for every reduced term \(r\in R\). It is also compatible with application: for every \(a,b\in L\), the four partial expressions

\[ \beta(ab),\qquad \beta(\beta(a)b),\qquad \beta(a\beta(b)),\qquad \beta(\beta(a)\beta(b)) \]

are strongly Kleene equal.

This condition is the whole point. It says that reducing the function part, reducing the argument part, or reducing both before applying does not change the reduced result whenever the expressions are defined.

The reduced application operation is therefore not extra structure. It is induced by \(\beta\): \(r\cdot_\beta s:=\beta(rs)\). Thus \(R\) becomes a partial applicative magma. The language \(L\) is raw syntax. The sublanguage \(R\) is the state space. The map \(\beta\) is the passage from raw syntax to reduced state.

This is the main difference from the older coalgebraic version. Behaviour is no longer an independent coalgebra. Behaviour is the family of derived right-application maps \(\rho_r(s):=\beta(rs)\). The notation \(\rho_r\) is still useful, but \(\rho\) is no longer part of the definition.

Formal languages instead of free magmas

Another change is that \(L\) is no longer required to be a free magma or even a submagma. In many examples, raw application syntax is tree-like, and a free magma is the right object. But lambda calculus, control calculi, thunk-force calculi, and encoded syntaxes all have more structure than a bare binary tree.

The new definition only asks for a formal language \(L\subseteq A^*\). Application is represented by concatenation where concatenation makes sense. If \(a,b\in L\), the word \(ab\) may or may not belong to \(L\). Reduction is partial, and application is partial at the reduced level.

This partiality is not a defect. It is exactly what one expects from real calculi. Some terms do not normalize. Some concatenations are ill-sorted. Some strategy-dependent evaluations are not defined. Some translations introduce administrative syntax. The framework is built to tolerate this.

In code, the new interface is closer to the following:

trait ApplicativeLanguageCalculus[L, R] {
  def reduce(x: L): Option[R]
  def embed(r: R): L
  def concat(x: L, y: L): Option[L]

  def app(r: R, s: R): Option[R] =
    concat(embed(r), embed(s)).flatMap(reduce)
}

The method app is not primitive. It is reduced application: concatenate first, reduce second. The compatibility axiom says that reduction commutes with this process in the expected partial sense.

Examples

The usual examples still fit.

For SKI, let \(L\) be the language of SKI terms, encoded as words or parsed trees. Let \(R\) be the chosen normal forms, and let \(\beta\) send a normalizing term to its normal form. The rules are

I x       -> x
K x y     -> x
S x y z   -> x z (y z)

On a domain where normal forms exist and are unique, this gives an applicative language calculus.

For the iota calculus, the language is generated by a single combinator \(\iota\), with the usual encoding of \(S\) and \(K\). Again, normalizing terms reduce to normal forms, and reduced application is obtained by applying and normalizing.

For the untyped lambda calculus, one chooses a word-coding of terms. To avoid alpha-conversion noise, one may use de Bruijn indices. The reduced terms may be beta-normal forms, weak-head normal forms, or another chosen class of canonical representatives. The reduction map is partial, because not every untyped lambda term normalizes.

This partiality is essential. The framework does not pretend that the untyped lambda calculus has a total normalizer. It records the normalizing or strategy-normalizing fragment one chooses.

The same applies to Parigot’s \(\lambda\mu\)-calculus. The syntax is richer: terms, commands, continuation names, and control constructs. But the formal-language viewpoint handles this by tags in the alphabet. Ill-sorted expressions simply fall outside the domain of \(\beta\). The lambda calculus embeds as the intuitionistic fragment, while the control constructs extend the behaviour rather than merely re-presenting it.

Lazy variants fit in two different ways. One may add explicit thunk and force constructors, say \(\theta\) and \(\varphi\), together with rules such as \(\varphi(\theta t)\to t\). Or one may encode forcing applicatively, for example by setting \(\tau(\omega)=\omega I\).

The first construction extends the syntax. The second uses only the existing applicative structure. They are behaviourally equivalent only under additional hypotheses. They should not be declared equivalent just because they are both called lazy.

Morphisms of calculi

A morphism of calculi should preserve application after reduction. There is no separate category of raw morphisms in the current version.

If \(\mathcal C=(L,R,\beta)\) and \(\mathcal D=(L',R',\beta')\), a morphism \(F:\mathcal C\to \mathcal D\) is a function \(F:L\to L'\) with \(F(R)\subseteq R'\), compatible with reduction, \(F(\beta(x))\simeq \beta'(F(x))\), and compatible with reduced application, \(F(\beta(xy))\simeq \beta'(F(x)F(y))\)$.

Thus the restriction \(F|_R:R\to R'\) is a homomorphism of partial applicative magmas.

This is another important simplification. In the older approach it was tempting to separate syntax maps, reduction maps, and coalgebra maps. The new category keeps only the maps that respect the reduced applicative structure.

Behaviour without head labels

Since \(\rho_r(s)=\beta(rs)\) is derived, behaviour can be defined directly.

A reduced term \(r\in R\) determines its one-step response function

\[ \rho_r:R\rightharpoonup R, \qquad \rho_r(s)=\beta(rs). \]

Two reduced terms are one-step behaviourally equivalent if these partial functions agree. But one-step equivalence is not enough. After applying an argument, one may apply another argument, and so on.

The full behavioural profile of \(r\) is the partial function \(\Theta_r:R^*\rightharpoonup R\) defined by iterated reduced application:

\[ \Theta_r(s_1,\ldots,s_n) =(((r\cdot_\beta s_1)\cdot_\beta s_2)\cdots)\cdot_\beta s_n \]

whenever the right side is defined.

Reduced terms \(r,t\in R\) are behaviourally equivalent, written \(r\equiv_{\mathrm{be}}t\), when their complete finite response profiles agree: \(\Theta_r=\Theta_t\).

This is the replacement for the old head-observed equivalence. Instead of comparing head-labelled trees, we compare all finite reduced-application experiments.

The old coalgebraic semantics asked:

What heads do I see after all possible argument lists?

The new behavioural semantics asks:

What reduced states do I reach after all possible argument lists?

The new relation is therefore more intrinsic. It does not depend on a chosen observation alphabet.

Localization

Once calculi have behavioural quotients, one can compare whole calculi by inverting the translations that preserve behaviour.

A behaviour-respecting morphism is a morphism that sends behaviourally equivalent reduced terms to behaviourally equivalent reduced terms. Such a morphism induces a map on behavioural quotients:

\[ R/{\equiv_{\mathrm{be}}}\longrightarrow R'/{\equiv_{\mathrm{be}}}. \]

A behavioural weak equivalence is a behaviour-respecting morphism that induces a bijection on behavioural quotients.

The Gabriel–Zisman localization

\[ \mathrm{Ho}_{\mathrm{be}}(\mathbf{Calc}) =\mathcal W_{\mathrm{be}}^{-1}\mathbf{Calc}_{\mathrm{be}} \]

is the category in which behavioural weak equivalences become isomorphisms.

This is the correct categorical home for statements such as \(\mathcal C_\lambda \simeq \mathcal C_{\mathrm{SKI}} \simeq \mathcal C_\iota\). These calculi are not literally the same. They do not have the same syntax. Their translations are not inverse as raw functions. But after one passes to behavioural quotients, bracket abstraction, lambda interpretation, and the iota encoding give inverse behaviour classes, provided compatible reductions are chosen.

The control calculus \(\lambda\mu\) is different. The lambda calculus embeds into it as a fragment, but \(\lambda\mu\) has continuation behaviour not present in the ordinary lambda calculus. Thus the inclusion of the lambda fragment should not be treated as a behavioural equivalence unless one restricts to a subcalculus whose behaviours are already represented by lambda terms.

This distinction is one of the main benefits of the localized category. It separates conservative changes of syntax from genuine extensions of behaviour.

Diagonal closure

The next structural property is diagonal closure. A calculus is strictly diagonally closed if every reduced term \(a\in R\) has a reduced term \(d_a\in R\) satisfying

\[ d_a\cdot_\beta x = a\cdot_\beta(x\cdot_\beta x) \]

wherever the expressions are defined. In words, \(d_a\) represents the map \(x\longmapsto a(xx)\). This is exactly the operation needed for the fixed-point construction. If \(d_a\) exists and \(d_a\cdot_\beta d_a\) is defined, then \(f_a:=d_a\cdot_\beta d_a\) satisfies \(f_a=a\cdot_\beta f_a\). So strict diagonal closure gives fixed points.

There is also a weaker behavioural version. A calculus is behaviourally diagonally closed if the equation above holds only after passing to behavioural equivalence:

\[ d_a\cdot_\beta x \equiv_{\mathrm{be}} a\cdot_\beta(x\cdot_\beta x). \]

Strict diagonal closure implies behavioural diagonal closure. The converse need not hold. Behavioural diagonal closure says that the duplicating context exists up to finite applicative testing, not necessarily on the nose as a reduced term identity.

SKI is strictly diagonally closed because it contains enough combinators to build the usual diagonal operator. With

\[ I=SKK, \qquad \Delta=SII, \qquad d_a=S(Ka)\Delta, \]

the term \(d_a\) acts as \(x\mapsto a(xx)\). The iota calculus inherits the same property through its encoding of SKI. Lambda calculus has the usual diagonal abstraction \(\lambda x.a(xx)\) whenever this abstraction belongs to the chosen reduced/normalizing fragment or to an appropriate partial domain.

This is the algebraic core of recursion.

Normal forms, partial magmas, and a hierarchy

The simplified definition makes it possible to place many calculi into a single hierarchy:

\[ \text{normal-form calculi} \supset \text{partial applicative calculi} \supset \text{behaviorally diagonal calculi} \supset \text{strictly diagonal calculi} \supset \text{combinatorially complete calculi}. \]

The outermost class consists of normal-form systems: languages with chosen representatives. Not every normal-form system is applicative. To be applicative, the normal-form map must be compatible with application.

A partial applicative calculus is one whose reduced states form a partial magma under reduced application. Behaviourally diagonal calculi have duplication up to behavioural equivalence. Strictly diagonal calculi have duplication on the nose. Combinatorially complete calculi contain enough combinators, such as \(S\) and \(K\), to represent all applicative contexts of the relevant kind.

The inclusions should not be read as equivalences. Each step adds real structure.

For example, a convergent string rewriting system gives a normal-form calculus. It becomes an applicative language calculus only if normal forms are stable under application in the sense required by \(\beta\). A group with a normal-form language gives a multiplication-and-reduce example. A partial combinatory algebra gives an example near the right-hand side of the hierarchy if it has enough combinators to support diagonalization.

This is another difference from the older post. The old coalgebraic presentation was tailored to computational states and observations. The new presentation also captures ordinary algebraic normal-form systems. It shows that applicative calculi are not merely lambda-like systems. They are normal-form presentations of partial applicative magmas.

Where the new version lands

The new picture is more economical than the old one. The old picture was:

\[ \text{syntax}\quad\to\quad\text{reduction}\quad\to\quad\text{head/currying coalgebra}\quad\to\quad\text{terminal behaviour}. \]

The new picture is:

\[ \text{language }L \quad\xrightarrow{\ \beta\ } \text{reduced states }R \quad\leadsto\quad \text{derived applicative behaviour}. \]

The coalgebraic language is not gone. It has been demoted to an optional semantics. If one wants head observations, one may add them. If one wants Böhm trees, one may add them. If one wants terminal coalgebras, one may build them. But the calculus itself is just the normal-form presentation \((L,R,\beta)\).

That simplification changes the emphasis of the theory. A calculus is now a way of presenting a partial applicative magma by raw syntax and reduction. Behaviour is what reduced terms do under all finite lists of reduced arguments. Localization identifies calculi that present the same behaviour. Diagonal closure detects whether the calculus can internalize duplication and therefore recursion. Resource invariants explain why some calculi cannot do this.

Different syntax, same behaviour, only after reduction. That is the point.