The previous post ended with a stripped-down picture: \(L \xrightarrow{\ \beta\ } R\) where \(L\) is raw syntax, \(R\) is the reduced state space, and \(\beta\) is the reduction map. Now, I’ll take this as the starting point and ask what structure is already forced by this minimal data. The answer is surprisingly rich.
Once reduction is compatible with application, the reduced language \(R\) becomes a partial applicative magma where we have \(r\cdot_\beta s := \beta(rs)\). This product is not assumed to be associative. It is not a monoid multiplication. It is simply: apply first, then reduce. The compatibility axiom for \(\beta\) says that this operation is referentially transparent: replacing a term by its reduced representative does not change the reduced outcome of an application. This is the first conceptual upgrade. Reduction is not merely normalization. It is the mechanism that makes raw syntax behave through reduced representatives.
For example, in the \(\lambda\)-calculus, \((\lambda x.x)u\) has a syntactic application shape, but its intended reduced behavior is \(u\). In our language this is simply \(\beta((\lambda x.x)u)=\beta(u)\). If \(u\in R\), this becomes \(\beta((\lambda x.x)u)=u\). That is not an additional semantic principle. It is exactly the kind of referential transparency that the definition of an applicative language calculus is designed to capture.
The second upgrade is that every reduced term determines a partial function on reduced terms. For each \(a\in R\), define \(\lambda_a(x):=\beta(ax)\). This gives a map \(\lambda:R\to R^R\) given by \(a\mapsto \lambda_a\). We call the functions in the image of \(\lambda\) the functions computable in the calculus:
\[ \mathrm{Comp}_{\mathcal C}(R):=\operatorname{im}(\lambda). \]
Thus a reduced term is not only a state. It is also a program that computes a partial function on reduced inputs. This gives a clean definition of behavioral equivalence:
\[ a\equiv_{\mathrm{be}}b \quad\Longleftrightarrow\quad \lambda_a=\lambda_b. \]
Two reduced terms are behaviorally equivalent precisely when they compute the same partial function on reduced inputs. This replaces the previous finite-response-profile definition. We can still recover finite response profiles by iterating reduced application, but the left regular map is the simpler primitive object.
The map \(\lambda:R\to R^R\) is the standard behavioral coalgebra of an applicative calculus. It observes a reduced term by asking how it acts on all reduced inputs. There are other natural coalgebras of the same type. The right regular coalgebra is
\[ \rho:R\to R^R, \qquad \rho_a(x)=\beta(xa). \]
It observes how \(a\) behaves as an argument. The diagonal coalgebra is
\[ \delta:R\to R^R, \qquad \delta_a(x)=\beta(a(xx)). \]
It observes what happens when the input is duplicated before being passed to \(a\). These coalgebras are not parsing devices. The magma already records syntax. They are behavioral tests on the reduced state space. The left regular coalgebra asks:
What function does this reduced term compute?
The right regular coalgebra asks:
How does this reduced term behave when placed on the right?
The diagonal coalgebra asks:
What happens when an input is duplicated before being fed to this term?
This is where the coalgebraic viewpoint becomes useful again, but in a more disciplined form. The state space is always \(R\). We are not adding an external observation alphabet. We are studying maps \(R\to R^R\) derived from reduced application.
Diagonal closure becomes especially transparent in this language. A calculus is strictly diagonally closed if every diagonal behavior is computable by some reduced term. In other words, we now ask \(\delta(R)\subseteq \operatorname{im}(\lambda)\). Equivalently, for every \(a\in R\), there is a reduced term \(d_a\in R\) such that \(\lambda_{d_a}(x)=\delta_a(x)\) for every reduced input \(x\). Expanding the definitions, this says \(\beta(d_a x)=\beta(a(xx))\). So \(d_a\) computes the function \(x\longmapsto \beta(a(xx))\).
This is the exact structural content of diagonal closure: the calculus can internalize the operation of duplicating an input and feeding the result to \(a\). The fixed-point construction then becomes a one-line argument. Suppose \(f=\lambda_a\) is computable and \(d\) computes the diagonal map \(x\mapsto f(\beta(xx))\). Set \(p:=\beta(dd)\). Then
\[ p=\lambda_d(d)=f(\beta(dd))=f(p). \]
Thus diagonal closure gives fixed points. This is the abstract form of the usual self-application argument behind fixed-point combinators. The point is that we do not need to assume a full SK basis. We only need the diagonal maps to be computable.
I will separate two transparency principles.
Strict referential transparency is the compatibility of \(\beta\) with application. If two raw terms have the same reduced representative, replacing one by the other does not change the reduced result of any application. This is built into the definition of applicative language calculus.
Behavioral referential transparency is weaker and subtler. Behavioral equivalence is defined by
\[ a\equiv_{\mathrm{be}}b \quad\Longleftrightarrow\quad \lambda_a=\lambda_b. \]
But this equivalence need not automatically be a congruence for the reduced product. Behavioral referential transparency is the condition that
\[ a\equiv_{\mathrm{be}}a', \qquad b\equiv_{\mathrm{be}}b' \]
implies
\[ \beta(ab)\equiv_{\mathrm{be}}\beta(a'b'). \]
When this holds, the behavioral quotient \(R/{\equiv_{\mathrm{be}}}\) again carries a reduced partial magma structure. This matters because many arguments should hold only up to behavior, not necessarily on the nose. Behavioral diagonal closure, for instance, says that the diagonal maps are computable after passing to behavioral equivalence.
In SKI, the usual rules are
I x -> x
K x y -> x
S x y z -> x z (y z)
Choose a reduced language \(R_{\mathrm{SKI}}\) of normal forms and let \(\beta\) send a normalizing term to its normal form. Then the reduced product is \(r\cdot_\beta s=\beta(rs)\). The left regular coalgebra sends a reduced SKI term \(a\) to the partial function \(x\longmapsto \beta(ax)\). SKI is strictly diagonally closed. Let \(I=SKK\) and \(\Delta=SII\). Then \(\Delta x\) reduces to \(xx\). For each \(a\), the term \(d_a:=S(Ka)\Delta\) computes \(x\longmapsto a(xx)\). Therefore the diagonal coalgebra factors through the left regular coalgebra. This is the clean reason SKI has fixed points in the present framework. It is not because the syntax has a special tree shape. It is because the diagonal behavior is internally computable.
For the \(\lambda\)-calculus, there are several possible choices of reduced language. One may take \(R\) to be a language of finite normal forms, when they exist. Then \(\beta\) is partial: non-normalizing terms are sent to the base point. Reduced application is still meaningful: \(r\cdot_\beta s:=\beta(rs)\). Even if \(r\) and \(s\) are reduced, the raw application \(rs\) may not be reduced. For example, if \(r=\lambda x.x\), then\(r\cdot_\beta s=\beta((\lambda x.x)s)=s\) whenever \(s\in R\).
One may instead take \(R\) to be a coinductive language of Böhm trees. In that presentation, \(\beta\) is Böhm reduction: \(\beta(M)=\mathrm{BT}(M)\). Then the Böhm tree is not an extra coalgebra floating outside the calculus. It is the reduced state itself. The ordinary tree destructor is native structure on \(R\), while the behavioral coalgebra remains
\[ \lambda:T\mapsto (U\mapsto \beta(TU)). \]
This resolves a possible confusion. Böhm trees are not a second parsing of finite \(\lambda\)-syntax. The magma already gives the finite application tree. A Böhm-tree presentation changes the reduced language: reduced states are now coinductive observational normal forms.
This framework also turns behavioral equivalence into a categorical comparison. A morphism of calculi should preserve reduced application. A behavioral weak equivalence is a morphism that induces an isomorphism on behavioral quotients:
\[ R/{\equiv_{\mathrm{be}}} \longrightarrow R'/{\equiv_{\mathrm{be}}}. \]
By inverting these weak equivalences one obtains a Gabriel–Zisman localization \(\mathrm{Ho}_{\mathrm{be}}(\mathbf{Calc})\). This is the category in which different presentations of the same behavior become isomorphic. For example, \(\lambda\) calculus, SKI, and iota should be compared in this localized category. Their raw syntaxes are not the same. Their translations are not inverse as functions on syntax. But under compatible reductions, they may present the same behavioral quotient.
By contrast, \(\lambda\mu\) is not merely another syntax for \(\lambda\) calculus. It adds continuation behavior. Its inclusion of the lambda fragment need not be a behavioral weak equivalence unless one restricts to a subcalculus whose behaviors are already represented in \(\lambda\) calculus. This is one of the main benefits of the localized viewpoint. It separates conservative changes of presentation from genuine extensions of behavior.
We can now organize types of calculi into a hierarchy of the form
\[ \mathrm{Calc}^{\mathrm{cc}} \subseteq \mathrm{Calc}^{\Delta} \subseteq \mathrm{Calc}^{\mathrm{be}\Delta} \subseteq \mathrm{Calc} \subseteq \mathbf{NFCalc}. \]
Reading from right to left:
The inclusions are not equivalences. A normal-form system need not be applicative. An applicative calculus need not internalize diagonal maps. Behavioral diagonal closure need not be strict diagonal closure. Strict diagonal closure need not provide a full SK basis. Thus the hierarchy separates normalization, application compatibility, behavioral quotienting, diagonal computability, and full combinatory completeness.
We can now see why some calculi cannot be diagonally closed. Suppose there is an additive invariant \(w:R\to M\) with values in a cancellative commutative monoid, such that \(w(r\cdot_\beta s)=w(r)+w(s)\) whenever the reduced product is defined. Then diagonal closure forces this invariant to collapse on the domain of a diagonalizer. Intuitively, the diagonal map duplicates an input: \(x\longmapsto a(xx)\). Any invariant that counts resources should see two copies of \(x\) on the right but only one application of a diagonalizer on the left. If the invariant is genuinely additive and nonconstant, this is impossible on a sufficiently large domain. This explains why free magmas and resource-sensitive calculi fail diagonal closure. Diagonal closure is a contraction principle. It is not a harmless closure property of syntax.
The simplified definition from the previous post remains as \(L\xrightarrow{\ \beta\ }R\). But from this data we get: \(r\cdot_\beta s:=\beta(rs)\) and \(\lambda_a(x):=\beta(ax)\) which gives us
\[ a\equiv_{\mathrm{be}}b \quad\Longleftrightarrow\quad \lambda_a=\lambda_b, \]
where \(\delta_a(x):=\beta(a(xx))\) and \(\delta(R)\subseteq\operatorname{im}(\lambda)\) as the coalgebraic formulation of diagonal closure.
So I will change the slogan changes slightly. In the previous post I said:
Different syntax, same behaviour, only after reduction.
Now, I will say
Behaviour is the left regular coalgebra of reduced application, and recursion appears exactly when the diagonal coalgebra is computable.