Typed lambda calculus

Notice: Experimental page format This is a page from the latest site version, rolled out only on a select few preview pages (as of late 2025). Some things many be out of place or broken; sorry about that!
type wiki
created 2025-03-16 22:42
modified 2025-09-10 02:34
Images
placeholder
Sketches
placeholder

TODO:

  • Dependent sum types

Typed lambda calculi are type systems with (anonymous) function abstraction from Lambda calculus.

Perhaps worth noting that both (untyped) lambda calculi and type systems are fundamentally isolated formal systems. The latter generally wrap up the notion of typing, and the former is canonically defined without the inclusion of any such mechanism (lambda terms are abstractions that apply to anything). Typed lambda calculi make up a class of type systems that “bring in” lambda terms axiomatically, along with the rules of introduction/application/reduction that we formalize in the general, untyped lambda calculus setting.

Lambda cube

The λ\lambda-cube is a framework that captures different binding behaviors in type systems as “movements” along dimensions over a 3D cube. Each vertex of the cube yields a particular type system with the respective binding behavior present as a typing rule, i.e., each system is associated with a point in {0,1}3\{0,1\}^3, and a 11 in a given dimension implies the presence of that typing rule.

Lambda cube, with the type systems 7

The dimensions correspond to different kinds of binding mechanisms between terms and types:

  • xx-axis: introduces dependent types, types that can depend on terms

  • yy-axis: introduces polymorphism, terms that can depend on types

  • zz-axis: introduces type constructors, types that can depend on (other) types

Starting from the simply typed lambda calculus λ\lambda_\rightarrow at (0,0,0)(0,0,0),

  • We move along the xx-axis to (1,0,0)(1,0,0) to get λΠ\lambda\Pi (also referred to as λP\lambda P), a first-order dependent type system.

  • We move along the yy-axis to (0,1,0)(0,1,0) to get λ2\lambda 2 (also referred to as System F), a polymorphic type system.

  • We move along the zz-axis to (0,0,1)(0,0,1) to get λω\lambda\underline{\omega}, a type system including type constructors.

These represent type systems that introduce each of the typing mechanisms in isolation. Systems at other vertices are those with combinations of these mechanisms. At (1,1,1)(1,1,1), we reach the calculus of constructions λC\lambda C, where types and terms can depend on types and terms. We dive into each of these systems and their respective typing mechanisms below.

Typing mechanisms

The lambda cube captures a few common typing mechanisms. Below we discuss these in greater detail. Simply put,

  • Dependent typing: types that depend on terms

  • Polymorphic typing: terms that depend on types

  • Type constructors: types that depend on types

Dependent typing (λΠ\lambda\Pi)

Π\Pi-type

The type system λΠ\lambda\Pi introduces dependent typing, facilitated primarily by the Π\Pi type (the dependent product type). This type captures the notion of a function whose return value’s type (as well as the value itself, of course) can vary with the argument value. That is to say, there’s no fixed codomain; both the output term and its type are effectively dynamically selected by the value of the input.

In particular, if B:AUB: A\rightarrow \mathcal{U}1, the type of a dependent function can be written x:AB(x)\prod_{x:A}B(x), capturing the fact such a term will map to a type determined by B(x)B(x). For example, if we write Vec(R,n)\text{Vec}(\mathbb{R}, n) to represent a real-valued nn-tuple, then n:NVec(R,n)\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n) is how we “parameterize” it as a dependent type.

It’s worth noting that when B:AUB:A\rightarrow\mathcal{U} is a constant function (all a:Aa:A map to the same type), the dependent product type acts exactly the same as the usual function type. While we don’t fundamentally use this notation when defining function types2, it clarifies (for me) the use of notation seen elsewhere in mathematics3. We’re saying here that the type x:AB\prod_{x:A}B (no dependence on xx in BB) is equivalent to ABA\rightarrow B; they’re two different ways to write the same thing. Further, this product type can be re-written explicitly in the usual sense as

B×B××B,B\times B\times\cdots\times B,

but how is this the “type” of a function (since we’re claiming x:AB\prod_{x:A}B and ABA\rightarrow B are the same)? We can think of a concrete function as a long tuple where each tuple position corresponds to a particular input value in the domain (captured as an indexed family, say) and “chooses” an output value corresponding to that input4. That is, we’re thinking of functions in the more explicit, extensional, binary relation sense. In any case, such a tuple can be seen as an element of the product space written above; the concrete tuple represents a concrete function term, and its type can be seen as a product of the types associated with each output. All this to say: this isn’t a product space serving as a domain or codomain for some class of functions (which is how I originally got confused here), but is itself the space/type of all functions, when we think of particular functions as tuples of output values.

This lends itself back to the original context within which the notation was introduced: to express dependent types. When BB does vary with x:Ax:A, we can explicitly write our dependent function type as

Ba1×Ba2×Ba3×,B_{a_1}\times B_{a_2}\times B_{a_3}\times\cdots,

where ai:Aa_i : A ranges over all terms of type AA. This is a dependent product, and captures the space of possible dependent functions: terms that themselves map from all inputs ai:Aa_i:A to terms bi:B(ai)b_i:B(a_i). Here we simply get the extra flexibility to specify (loosely) a particular domain (type) for each input value (abstraction-bound variable). We formalize the construction of such a term with the introduction rule

Γ,x:AB:Γ(x:AB):\frac{\Gamma,x:A\vdash B:*}{\Gamma\vdash\left(\prod_{x:A}B\right):*}

which is really a simple statement that formally recognizes x:AB\prod_{x:A}B as a type (with “::*” read as “is of type type”). To use the example from before, n:NVec(R,n)\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n) could be the type of a function that maps from an integer nn to the unit vector of length nn. Note how Vec(R,m)\text{Vec}(\mathbb{R}, m) is the type of a vector for a specific value of mm, but not any mm. So creating a function that produces a vector with a variable length (dependent on the function’s input) can be formed via the notion of a dependent type. Note how the introduction of this typing mechanism doesn’t formally increase the computational power of the type system (untyped lambda calculus is Turing complete already), but instead facilitates more logical expressiveness, e.g., how precisely we can encode logical constraints over terms. Without dependent types, for instance, we could still have a function that produces variable-length vectors, but from the perspective of statically verifying the logical correctness of further statements involving such functions, it becomes much more difficult (if not impossible) to make certain guarantees about the value or type of involved terms.

The type of dependent functions like this, in the general sense, is referred to as a Π\Pi type, often called the dependent function type (with expansion below on the use of the term “dependent product type” from before). But System λΠ\lambda\Pi also includes a type that’s dual to Π\Pi types: the Σ\Sigma type, also called the dependent pair type.

Σ\Sigma-type

Using similar notation from before, we write Σ\Sigma types as Σx:AB(x)\Sigma_{x:A}B(x). This captures the notion of ordered pairs where the second term’s type can depend on the value of the first. That is, if (a,b):x:AB(x)(a, b):\sum_{x:A}B(x), then a:Aa:A and b:B(a)b:B(a): the type of bb gets to depend on aa’s value. As before, we can expand this and try to make sense of its full syntactical implications:

x:AB(x)=Ba1+Ba2+Ba3+,\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,

where ai:Aa_i : A ranges over all terms of type AA. Right away I find this a little confusing: I see arbitrarily many types being summed over (or for at least as many terms x:Ax:A), and yet this is the type of a pair of terms. With our dependent function type, our terms themselves were functions that have to account for all possible inputs x:Ax:A, so it seemed justified that we’d need to have a representation for each x:Ax:A in the associated type definition. But here, it’s a little surprising to see we compose more than just two types in order to capture the pair type.

Intuitively, all that’s happening here is the ++ is behaving like an “or” operator. Since bb’s type can depend on any term a:Aa:A, it does make sense that we’d need every possible choice in our type definition; it’s just that any term must “commit” to only one of them. So we need them all like before, but while ×\times indicates joint involvement (our term will use them all), ++ captures the notion of a choice among them (our term will only pick one, but it could be any of them).

In some sense (and I do mean that vaguely), this is actually more general than the Π\Pi type. In that case, our type captures the idea of assigning an output type to every possible input term, giving us terms (functions) that must actually make such a decision for all inputs, i.e., assigning a B(a)B(a) for every x:Ax:A. But here, we’re doing that exact same thing…just without the requirement to do it for all inputs at once. In a sense we “free up” the input as well, and our terms get to pick whichever input (along with the associated, dependent output type) they want, rather than needing to do it for all of them (doing it for all inputs leaves the only freedom in the output side of things; the function doesn’t get to pick specific inputs it should be defined over). So our pair is like any individual input-output slice of a full dependent function.

Interpreting Σ\Sigma types as spaces

We can sometimes think of types extensionally, representing spaces of all their possible constituent terms. Given the extra “degree of freedom” we described above, it actually feels harder to shape that corresponding space for Σ\Sigma types, or at least to capture how we should think about it. In the Π\Pi type setting, I know my function terms pick a type B(a)B(a) for every input aa (for some choice of BB), and all my concrete function terms under that BB will map any given input aa to a term b:B(a)b : B(a). I basically “stretch out” all those B(a)B(a) and I’ve got my space’s structure: all terms in that space are maps with the same output types when queried at the same inputs. The presence of the “term conditioning” from aa sort of disappears in the implicit order of my B(a)B(a); the type AA is expressed implicitly through the family of B(a)B(a)’s. We actually have to deal with terms of the type AA in order to “see” what types of BB will be, so it makes sense that we can’t exactly have a term a:Aa : A show up in the type’s definition. For Σ\Sigma types, we do a similar thing by packing in BB types in such a way that the dependence on terms a:Aa : A is made implicit. But our space is effectively more locally variable, and all of our terms don’t share the same array of output types like we saw with Π\Pi types (which acts as kind of a common thread, like a “stitch” in the space that captures the common structure of terms). Instead, our terms are even more “tightly dependent” and sort of stubbornly dissimilar, making it hard to capture the structure of the common space all terms share. So I resign myself to taking just that very structure: the (implicit) coupling of a term a:Aa:A with its paired type B(a)B(a), and we bag all the possible concrete couplings up into a set like {(a,b)a:A,b:B(a)}\{(a, b)|a:A,b:B(a)\}.

In general, I think I let this be more a sticking point than it needed to be, but my inability to let go of little confusions or frustrations sometimes leads to greater insight. I may end up reusing this outside this scope, but the point is that writing the “sum”

x:AB(x)=Ba1+Ba2+Ba3+,\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,

is not really all that accurate. It should be thought of as a disjoint union, which explicitly lets us tag those types while bagging them all up like we want. That is, it just preserves the identity of the types B(a)B(a) using the aa that produced them, and those are the pairs that can make up our type space. So more like

x:AB(x)=({a1}×B(a1))+({a2}×B(a2))+,\sum_{x:A}B(x) = (\{a_1\}\times B(a_1)) + (\{a_2\}\times B(a_2))+ \cdots,

This still leaves B(ai)B(a_i) as a type, but we can interpret that as a set in the generic way, i.e., {bb:B(ai)}\{b | b:B(a_i)\}. That gives you a formal set product {(ai,b)b:B(ai)}\{(a_i, b) | b:B(a_i)\} for each ai:Aa_i:A, and when you let ++ mean \cup, you get your big, fully expanded disjoint union. That is your formal set-based interpretation; it’s clear how we can dynamically expand only the properly dependent pairs of values.

So when I said this space feels like has less structure, I actually still feel that’s an accurate depiction. Even if the little set-based jumps we make to expand the space are easy to follow, it’s harder to hook into common structure between these objects. You have the paired part of them, yes, but beyond that we literally just tag dependent types explicitly. It’s not some tightly wound, reduced structure, or even as like rigid as the functions with Π\Pi types where we at least have all objects stretching across the input space. Nope: we just slap a name tag on the types and put them in the bag, sort of the bare minimum to ensure they can be distinguished. And that very process of tagging is really the only structure we can even grab onto afterward, to group up “stubbornly dissimilar” terms (to reuse the phrase from earlier, which I again feel is accurate). That, I suppose, is really just a consequence of great flexibility: we’re literally defining a compositional type that is more or less an unimposing bag of other types. That’s as unstructured as you get when it comes to wrangling collections of objects: you’re adding no new structure or transforming things into a common shape. Perhaps that’s why it’s so slimy to me, so hard to just let be. It’s like I’m expecting more to be there, more structure to understand, when there in actual fact is none, on purpose.

Note the syntactic analogies to addition/multiplication/exponentiation, when the dependent function BB is constant:

  • The dependent pair Σx:AB\Sigma_{x:A}B can also be written A×BA \times B, which can be interpreted in the usual product space sense, where to each point in AA we attach an instance of BB. To be precise:
    • When B(x)B(x) varies with x:Ax:A, we can treat as as an indexed family of types/sets, indexed by AA

    • The dependent pair is then analogous to a disjoint union:

      Σa:AB(a)=a:AB(a)=a:A{(a,b)b:B(a)}\Sigma_{a:A}B(a) = \bigsqcup_{a:A}B(a) = \bigcup_{a:A}\{(a,b)|b:B(a)\}

    • When B(x)=BB(x)=B (i.e., is constant, no dependence on AA), the disjoint union is simply to the usual Cartesian product:

      Σa:AB=a:AB=A×B\Sigma_{a:A}B = \bigsqcup_{a:A}B = A\times B

    • Note how Σ\Sigma or \bigsqcup actually do some work in “breaking up” and “rearranging” BB; a set like {(a,B(a))a:A}\{(a, B(a)) | a:A\} that might attempt to attach to each aAa\in A the full type/space B(a)B(a) is only half way to something useful (and is just the graph of BB).5

    Accompanying visual for coproducts/dependent pair types
  • The dependent product Πx:AB\Pi_{x:A}B can also be written BAB^A, which can be interpreted as “multiplying” instances of BB for each element/term of AA. As mentioned above, when BB is static, this simply recovers the notion of an infinite product space housing functions f:ABf: A \rightarrow B. Otherwise, the product is composed of conditioned spaces produced by invoking B(x)B(x) for each x:Ax: A, and we usually don’t employ the compact representation BAB^A (it doesn’t really capture the BB’s dependence on AA, but could be implicitly understood).

    When we write out the product explicitly, we have a sequence of concrete products like we saw with the dependent pair, where each B(a)B(a) is a fixed set making up one part of a nested product:

    Πx:aB(x)=B(a1)×B(a2)×=Σb1:B(a1)[Σb2:B(a2)[Σbn1:B(an1)B(an)]]\begin{aligned} \Pi_{x:a}B(x) &= B(a_1) \times B(a_2) \times \cdots \\ &= \Sigma_{b_1:B(a_1)}\left[\Sigma_{b_2:B(a_2)}\cdots\left[\Sigma_{b_{n-1}:B(a_{n-1})}B(a_{n})\right]\right] \end{aligned}

    Here we see increasing levels of abstraction taking place with repeated application, again analogous to multiplication representing repeated addition, exponentiation representing repeated multiplication, etc.

Polymorphic typing (λ2\lambda 2)

Polymorphic types allow term definitions to depend on a specific type. For example, instead of saying something like

λx.x:αα,\lambda x.x:\alpha\rightarrow\alpha,

which can be interpreted as an identity function mapping from/to terms of type α\alpha (for a particular choice of of α\alpha), we instead have

λx.x:α.αα,\lambda x.x:\forall\alpha.\alpha\rightarrow\alpha,

which defines an identity function polymorphic in α\alpha, meaning any (every) choice of α\alpha is allowed. For the former, the type αα\alpha\rightarrow\alpha describes a monomorphic “family” of types since α\alpha can be anything. But the term will always involve a particular choice, e.g., be the identity function from/to terms of int or bool types. In the polymorphic case, we never require such specificity: the identity function must work for all types (as in, we could pick any one of them). This is kind of like saying we get one term that is “type aware,” whereas in the monomorphic case we’d actually have to define multiple (separate) terms if we wanted to work with different choices of α\alpha.6

System F is a second-order typed lambda calculus, formalizing parametric polymorphism. In simply typed lambda calculus, this isn’t exactly a native element of the type theory, so universally quantified type variables are formalized. For example, in the case of our identity function, we have

Λα.λxα.x:α.αα,\vdash \Lambda\alpha. \lambda x^\alpha. x : \forall\alpha.\alpha\rightarrow\alpha,

where α\alpha is a type variable, and Λ\Lambda refers to a type-level function. This is a judgment that says, as a function of type α\alpha, the identity function of a bound variable xx of type α\alpha has type αα\alpha\rightarrow\alpha. The entire left-hand expression, including the outer Λ\Lambda term, is of the single “most general type,” involving a universally quantified type variable, α.αα\forall\alpha.\alpha\rightarrow\alpha. It is the most general in that every other type we might feasibly assign to the term can be arrived at via substitution of this one, i.e., are less general.

In System F, we actually formalize the universally quantified type variable as a single type by a typing rule:

Γ,α  typeM:σΓΛα.M:α.  σ\frac{\Gamma,\alpha\;\text{type}\vdash M:\sigma}{\Gamma\vdash\Lambda\alpha.M:\forall\alpha.\;\sigma}

Here we’re saying, when MM is of type σ\sigma and we take α\alpha to be a type variable, then Λα.M\Lambda\alpha.M is of type α.  σ\forall\alpha.\;\sigma. Notice how it’s not fundamentally a function type or anything; this is us actually defining a new type and its syntax, axiomatically. The quantifier \forall is not purely symbolic, however; it is actually signifying quantification over all types

We also have the rule describing the application of Λ\Lambda terms:

ΓM:α.  σΓMτ:σ[α:=τ]\frac{\Gamma\vdash M:\forall\alpha.\;\sigma}{\Gamma\vdash M\tau : \sigma[\alpha:=\tau]}

which just says how when a term τ\tau is applied to a polymorphic term MM, it’s as if we’re instantiating the whole template under that type, and replace any dependence on α\alpha in σ\sigma with τ\tau. Bringing back the identity example from above, we have

(Λα.λxα.x)(Bool):BoolBool\vdash (\Lambda\alpha. \lambda x^\alpha. x)(\text{Bool}) : \text{Bool}\rightarrow\text{Bool}

Bool\text{Bool} is applied to the Λ\Lambda term, effectively:

  • Stripping off the outer Λα\Lambda\alpha

  • Binding xx in λ\lambda to type Bool\text{Bool} (although the RHS type signature really does this as well)

  • By the application rule, for the type side we strip off α\forall\alpha and replace the references in term (which was αα\alpha\rightarrow\alpha) with Bool\text{Bool}

So we effectively end up with the term λx.x:BoolBool\lambda x.x:\text{Bool}\rightarrow\text{Bool}. We can think of the polymorphic term as a producer of a family of such concrete terms; the polymorphic term itself is of a second-order, producing a first-order, monomorphic term after application.

(See also: Type variable)

Type constructors (λω\lambda\underline{\omega})

In System FωF \underline{\omega}, we introduce type construction. This is a mechanism that effectively brings types into the term space, loosely speaking: types are allowed to be part of terms, e.g., taken as inputs to a function. Such terms themselves have types, which we call kinds.

Type constructors facilitate the creation of new types from existing ones. They can be formally seen as nn-ary type operators, returning a type from nn types passed as arguments. Note that we typically write this in a curried manner as needed, like other functions of many variables. As such, type operators can be seen as functions in a higher-order type theory (namely a simply typed lambda calculus) with one basic type *, representing the type of all types in the underlying language.

For instance, if s:σs: \sigma, then σ:\sigma: *. Both σ\sigma and * are types, albeit at different “levels;” we generally distinguish them as “proper types” and “kinds,” respectively.

Kinds are effectively declarations of type constructor arity (rather than the “type of a type”). Given just a single type in the “kind system” is allowed, first-order type operators are simply curried functions of proper types and look like * \rightarrow \cdots \rightarrow *:

  • * is the kind of all proper types (also called “nullary types,” i.e., the constant result of a type constructor with no inputs). This is pronounced “type,” e.g., “σ:\sigma:* indicates σ\sigma is of type type.”

    To be clear, even though “*” means “type,” it is part of a kind system. We say “nat\text{nat} is a type” (judgment “nat Type\vdash \text{nat Type}” in the base-level type theory) and “* is a kind” (judgment “  Type\vdash *\;\text{Type}” in the higher-level type/kind theory).

  • *\rightarrow * is the kind of a unary type constructor, such as that of a list type, e.g., lista:\text{list} a: *\rightarrow * where aa is the type of the list elements.

  • *\rightarrow *\rightarrow * is the kind of a binary type constructor, e.g., the function type constructor. That is, something like (λστ.στ):(\lambda\sigma\tau.\sigma\rightarrow\tau) : *\rightarrow *\rightarrow * as a judgment the kind system (although I don’t know if we really ever embrace such an explicit formation given functions are usually primitives, but I think it demonstrates the point).

A higher-order type constructor is one that itself maps from a type constructor to a proper type. This might have the kind ()(*\rightarrow *)\rightarrow *, for instance, and helps describe the kind of more complex constructs like monads.

Note that we used λω\lambda\omega to denote the type system including both type constructors and polymorphism.

Pure type systems

Pure type systems are typed lambda calculi that include an arbitrary number of sorts and dependencies between them. This generalizes the lambda cube, from which perspective we can view its “corners” as instances of pure type systems with just two sorts.

In particular, a pure type system is a triple (S,A,R)(\mathcal{S}, \mathcal{A}, \mathcal{R}):

  • S\mathcal{S} is the set of sorts

  • AS2\mathcal{A}\subseteq\mathcal{S}^2 is the set of axioms; an axiom (s1,s2)(s_1, s_2) is a foundational statement that s1:s2s_1:s_2

  • RS3\mathcal{R}\subseteq\mathcal{S}^3 is the set of rules

Pure type systems “break down” a wall that previously wasn’t really visible; simply typed lambda calculus, for instance, is fundamentally constructed around terms and types, and we don’t really question those two “classes” of items. Pure type systems open that up, extending the notion of typing (allowing lower-order terms to be classified by higher-order terms) to more expansive hierarchies7.

System U

Subtyping

Subtyping captures a notion of substitutability between types. If SS is a subtype of TT, we typically write this S<:TS <: T, implying that SS can be used in place of TT in any context where TT is expected. Type systems like System F<:F_{<:} formalize this properly (e.g., with subtype judgments and associated rules), but from what I can tell, most resources stay away from the weeds and discuss record types and LSP.

More formally, subtype applicability is determined by subsumption. For a type T\mathrm{T}, we generally adopt an extensional view T\mathbf{T} (the set of all and an intensional view T\mathit{T}

…leaving off here. Taking some issue with the int/ext views of types, since it’s more set-theoretic. Want to get this right, mix in LSP, contrast with inheritance, and maybe get a clear link out to a relevant page on func prog.

-> Then do universally/exist. quant types (below); I feel like I still have yet to nail this down. Then do the semantics bit, equational theory. This last piece should wrap us around to algebraic DTs. Then maybe hit combinators, finally monoids and some Haskell examples.

-> Now back to types as sets with the paper

Quantification

Quantified types facilitate greater expressive power, appealing to higher level typing mechanisms. Universal quantification enables polymorphism through generic types, and existential quantification yields abstract data types (through information hiding). Together, they facilitate parametric data abstraction.

Generally speaking, quantification is a means of abstraction, and is facilitated by binders. Binding operators act on variables of a certain kind, and place (bind) them inside a particular context. That context is made abstract as a result: an expression is produced that isn’t tied to any particular concrete value. The resulting abstraction can therefore be seen as a generic template, and is accompanied by a notion of application, wherein concrete values can be supplied to “fill in” the template and make the entire expression concrete. Fundamentally we’re introducing a higher level means of operation, a construction above terms, a producer of terms. Lambda calculus is so powerful because we let this higher level construct be a term itself; it doesn’t exist strictly in a higher plane but is “flattened back out” to the same level as concrete terms. This enables abstractions to be arbitrarily nested (terms can be abstractions, and abstractions are defined over terms), and ultimately represent all computable functions (lambda calculus is Turing complete).

In the case of function abstraction, λ\lambda binds term variables: λx.c(x):AB\lambda x. c(x) : A\rightarrow B binds a term x:Ax:A in a context c(x):Bc(x):B. As we’ve already seen with polymorphic typing in λ2\lambda 2, type-level functions are introduced with the Λ\Lambda operator, which binds type variables in term space. Resulting terms are generic functions Λα.λxα.c(x)\Lambda\alpha. \lambda x^\alpha. c(x), and have a universally quantified type α.αα\forall\alpha.\alpha\rightarrow\alpha. This is fundamentally a new kind of term, i.e., functions that can effectively be parameterized by a type, and we’ve got a new type definition to go along with it.

Existentially quantified types are a bit more nuanced. For starters, existential quantification isn’t accompanied by a fundamentally new kind of binding operator in term space. Instead, it’s purely a mechanism for abstraction in type space, typically used to hide certain type-level specifics. The statement x:α.t(α)x : \exists\alpha. t(\alpha) allows us to specify xx only in terms of an exposed interface tt, i.e., some higher level type structure that, when parameterized by some type α\alpha, will represent xx’s real type. The whole point is that we can get away with not knowing or specifying α\alpha, and the binding operator \exists encapsulates that detail behind the visible structure seen in tt. Such a type is clearly amenable to high-level specifications of abstract data types: we can declare terms that verifiably observe certain type signatures and are invariant to any involved concrete types. Note that terms with existentially quantified types are introduced and eliminated with pack and unpack operators, respectively. These are fundamentally different than an operator like Λ\Lambda; the latter introduces an irreducible new primitive (type abstraction), whereas pack/unpack are more like rules for working with existentially quantified terms (and can be fundamentally represented via universal quantification).

While the notions of universal and existential quantification are similar in that types are being abstracted over, they say very different things. Informally, x:α.t(α)x : \exists\alpha. t(\alpha) lets us say that xx has type t(α)t(\alpha) for some particular α\alpha. This is quite different from x:α.t(α)x : \forall\alpha. t(\alpha), which suggests that xx is an abstraction that can handle all types α\alpha. In the former, xx has no such “awareness” of choosing a type for α\alpha; a specific choice of α\alpha is involved but we’re blind to its actual value, suggesting it could be any one choice. In the latter, xx is explicitly a generic construct that can handle all choices of α\alpha. There’s no notion of “plugging in” some particular choice of α\alpha to get xx’s “true” type; instead, xx is explicitly the thing (a generic function) that can handle any choice for α\alpha.

More formally, it’s important to recognize that both universal and existential quantification do ultimately represent new, singular types. They’re not just loose characterizations of families of types or possible type choices (as we might be inclined to interpret them), but formal types with typing rules, just like everything else. I often find myself forgetting this since \exists and \forall look to be loose wrappers on other types; it’s tempting to just think of them extensionally, as possible values for the types they “wrap.” This is perhaps a perfectly reasonable and intuitive thing to do (encouraged, even), but they are nevertheless still themselves considered formal types.

Bounded quantification

Bounded quantification is quantification (universal or existential) with subtyping.

Encoding datatypes

The power of lambda calculus becomes quickly apparent when we discuss repeated application of higher-order abstractions. One way to begin formalizing this is via Church encoding, and the Church-Turing thesis shows that any computable operator can be represented under this scheme.

Recursion

When defining recursive functions, we might reach for a familiar self-referential form. For example, with the factorial function:

fact = fun (n: Int) if n=0 then 1 else n * fact(n-1)

As a term, this is improperly defined: fact refers to itself before it is fully defined. We can imagine needing to parse a definition left to right before the LHS identifier is considered valid, but we encounter fact before we get all the way through, at which point fact is not a valid alias. In any case, we do this all the time in most programming languages, it’s just that it’s a convenient expression for something more formal. That something “lifts” the function into a functional:

fact = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)

This is not intrinsically recursive: it is a function that takes another function f as input, embeds it in another function, and returns that. That is, something like the general form λf. λx. t(f(s(x))), which says f is a function that operates on a term s(x), and t is some transformation of f after it’s applied to s(x). For a quick example,

-- let s just be the identity function
s = λx. x

-- let t(f, x) mirror the factorial setup
t = λf. λx. x*f(x-1)

-- so our general term:
λf. λx. t(f(s(x)), x)

-- turns into
λf. λx. x*f(x-1)

-- and then if we define some function g
g = λy. y*y

-- we can apply our form to it like
( λf. λx. x*f(x-1) )( g )
-> λx. x*g(x-1)
-> λx. x*((x-1)*(x-1))

-- this final term is a function which we can reapply as a "new g"
h = λx. x*((x-1)*(x-1))

-- on the left is our general term, apply to h
( λf. λx. x*f(x-1) )( h )
-> λx. x*( (x-1)*( (x-2)*(x-2) ) )

What we’re seeing here is that each repeated application of the functional – using the last output function as input to the functional, producing the next function – approximates part of the full recursive behavior. The last term above just accounts for two recursive steps. That is, it’s like we just iteratively pack a snapshot of the function into itself one step at a time, and once we “call” the outer thing for some input x, we recursively apply the function as usual (by virtual of the fact we’ve already packed in all the nested calls). The point is that such a term actually explicitly packs the function unwrapping into a term so we don’t run into the definition issues, i.e., the usual implicit syntactic sugar we get away with in most programming languages. All of that function composition happens at runtime, dynamically, rather than being one huge compositional thing we’ve pre-expanded and can evaluate in one go. The last term in the above code block represents such an explicit term after two manual composition steps, which is clearly only the beginning of the lengths we’d need to go to get the full recursive thing (spoiler: we need to do this arbitrarily many times).

Note that this isn’t even the factorial function (I chose g(x)=x*x), but if we map this onto that setting, the equivalent would be a final two-step function λx. x*(x-1)*(x-2). You can see how x is allowed to be any value (assuming x: Int) which doesn’t guarantee we actually “recurse to a base case.” If we plug in x=10, for instance, we just get 10*9*8, which is not 10! (actually we do not get a convergent output whatsoever if we don’t reach the base case…more on that below). Typically, full recursive behavior is allowed to recurse arbitrarily many times until a base case is reached. But to ensure we can even do that for arbitrary inputs x, we need to have packed in arbitrarily many recursive steps in our expanded term. How can we possibly do that, given x itself may be arbitrarily large?

Our full recursive function is therefore equivalent to the limit of this process, i.e., the thing we approach as we repeat composition an arbitrary number of times. With more and more nested composition, we get closer to the thing, and in the limit, we embody the notion of infinite nested composition. Any further composition therefore changes nothing: that thing is already a fixed point. More on this below.

Because I’ve lost myself several times as I work this out, I think it may be helpful to spell things out slowly while I’m here in the weeds. This is almost entirely to help me now (the concept really is that slippery), but will hopefully be good starting point if I find myself back in this place in the future. Here’s how I’m seeing things now, step by step, rehashing many of the points above:

  • You’re trying to define the factorial function, say in a usual, “intuitive,” implicitly recursive way. You start with

    λ(n: Int). if n=0 then 1 else n * f(n-1)

    which establishes your base case and the recursive dependence on the subproblem for n-1, captured by calling yourself as f. But here we’ve merely used f to stand in for our function…how are we meant to actually get ourselves squeezed into f? This sort of already feels like the whole recursive function as we’d want to define it (in a common programming language). The problem is I can’t formally refer to this term in itself in the usual way. I explained this pretty straightforwardly above: there’s no way for f, the name we want to use to refer to the whole outer term, to be made available inside its own scope before that scope is even defined. It simply doesn’t make sense; there is no f at the time we want to use it.

  • So what do we do? Again, this signature already feels kind of right: we want our recursive function to take an integer as input. So we want to preserve that element of the structure without bulking up the term per se, but we need to add something in order to facilitate a meaningful self-reference as f inside the term. We can do this by parametrizing f like this:

    λf. λ(n: Int). if n=0 then 1 else n * f(n-1)

    This gives us a functional term where we now need to first supply a function f in order to “get back” to our expected signature. So this thing is not our desired function, but it’s a mechanism to build it. To be clear, the question we now are trying to answer is how we can even take a “snapshot” of our function and use it as f. This is where I really encountered that first feeling of limbo: I didn’t have a great sense for what we could even grab onto to put there.

  • Where to go from here? How to set f? We take the above form and some first, “lowest” function \bot (a canonical undefined that effectively “nukes” the output if reached rather than the base case) and apply it, giving us

    f0 = λ(n: Int). if n=0 then 1 else n * ⊥(n-1)

    So f0 is back to the direct function signature we’re after, having chosen a particular f to “embed.” We can then repeat this:

    f1 = λ(n: Int). if n=0 then 1 else n * f0(n-1)

    and so on, where f<n> represents a partial factorial function that includes n nested compositions. Note that f<n> is only well-defined for integers 0 to n given our function; for inputs greater than n, we never reach the base case and diverge (becoming undefined; we’ll basically end up making a call like ⊥(x) for some x > 0, blowing everything up). Additionally note that the n in f<n> only controls the depth of composition rather than directly restricting the input to the underlying function (I’ve had the tendency to interpret it as like improving the “coverage” of our input space, and while in this case it sort of does that indirectly, in general it does not determine the kind of inputs that produce concrete outputs).

  • With the above mechanism, we can let n tend toward infinity, which will step us ever closer to recovering an arbitrarily deep capacity for recursion. With some finite n, however, we can always supply inputs >n that cause our output to diverge in the function f<n> (i.e., never hit the base case), which differs from a true, fully dynamic recursive definition. We therefore look toward the limiting term, the term that can recurse dynamically. Such a term therefore cannot change under any further applications of composition (it already embodies a notion of “infinite composition;” to do it again would be like trying to take +1\infty+1. Put another way, the two mirrors are fully snapped parallel, and you have a term that simply never deals in finite levels of composition: you can’t “unravel” it with a finite number of applications), and is thus a fixed point.
    • We explain a bit more explicitly below, but it’s as if we’ve produced f∞. The internal reference to itself can’t be something finite; if you tried, similar to our above point, you’d have something like f<∞-1>, which is just f∞ again. So it literally has its full self inside itself: the internal thing isn’t smaller, and the outer composition isn’t larger. As confusing as it is (certainly given our construction from increasingly large finite composition steps, where f<n> is indeed larger than its internal use of f<n-1>), they’re the same.

  • To be clear about what we mean by fixed point: we’re saying that our true, fully recursive factorial function is the fixed point of the functional term

    G = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)

    That is, this “builder term” G has our target recursive function as a fixed point. If our target function is called fact, we are therefore saying that G(fact) = fact: fact is the thing that already fully embeds/references itself. You cannot “squeeze” another level of composition into that f reference by calling G again (like we did above, in sequence, for finite levels).

    If this still feels slimy (and it certainly is for me; I literally find myself able to get it one second and lose it the next), go ahead and actually perform the application:

    G(fact) = λ(n: Int). if n=0 then 1 else n * fact(n-1)

    What we’re saying is: when we embed our fully recursive function fact into G, the function we get out is just fact again. You can’t “outsmart” it or “outwrap” it, as counterintuitive as it may be, since it feels you could always take another composition that the term you’re using can’t be “aware of,” that it can’t anticipate that you’re going to do it again. But no, it can be aware of further composition and it’ll have no effect: it literally includes its full self. That final point is really the best characterization in my opinion, and if you just can’t be satisfied with the formal argument, sit with that phrase until it sinks in.8

  • So how can we systematically get to this fixed point “right away,” as if we just defined things implicitly in the first place? The fixed-point, or Y, combinator. This is a higher-order functional, i.e., a function taking a functional, and it returns the fixed point function of that functional. We refer to this as fix\text{fix} or YY. In our above example, we’d say

    fix G = fact
    
    -- or equivalently
    Y G = fact
    
    -- and we have
    fix G = G(fix G) = G(G(fix G)) = ... = fact

    We actually define YY as

    Y=λf.(λx.f(x  x))(λx.f(x  x))Y = \lambda f. (\lambda x. f(x\;x)) (\lambda x. f(x\;x))

    When we apply YY to some functional gg, we can expand to find Yg=g(Yg)Y g = g (Y g). YY is a construct that literally builds a fully recursive function out of the wrapper gg. Intuitively, it basically unpacks the internal logic from gg and puts it back into itself (although I don’t feel that confident about it becoming anything too concrete; reducing the thing still looks like YgY g).

Combinators

Combinators are simply closed λ\lambda-expressions. In combinatory logic, we capture the full power of lambda calculus, but without the notion of abstraction (or at least the ability to construct new abstractions). Instead, we can take a few closed, axiomatic abstractions (called combinators), and application can be used to construct certain kinds of new functions. Composition of these combinators can then replicate any function from lambda calculus. This simplification removes much of the complexity (although also the convenience) of abstraction, and was originally introduced to eliminate quantified variables from other logic systems, effectively presenting an alternative means of capturing the same functionality from even more primitive operations. This makes it more of an interesting demonstration rather than a practical choice for a language due to its verbosity; here the SKI system is a prime example.

Semantics

Curry-Howard correspondence

The Curry-Howard correspondence relates computer programs to mathematical proofs. It is a formal link between typed lambda expressions and statements in mathematical logic, an isomorphism between the proof systems. The logic-to-type theory analogs are as follows:

  • Formula     \iff Type

  • Proof     \iff Term

  • Implication (    \implies)     \iff Function (\rightarrow)

  • Conjunction (\land)     \iff Product type (×\times)

  • Disjunction (\lor)     \iff Sum type (++)

  • Universal quantification (\forall)     \iff Dependent product type (Π\Pi)

  • Existential quantification (\exists)     \iff Dependent sum type (Σ\Sigma)

So we can craft types that are analogous to logical formulas/statements, and type inhabitance (i.e., creating a term of the type) corresponds to a notion of proof for that logical statement. Expanding this for the specific analogs above:

  • t:Tt:T; the proposition/type TT holds due to the proof/term tt

  • f:STf:S\rightarrow T; the function ff inhabits the function type STS\rightarrow T, which can be thought as a map from any proof of proposition SS to a proof of proposition TT. This embodies the notion that TT can be shown to hold if SS holds, precisely what we mean by implication.

  • (a,b):Σx:AB(x)(a,b):\Sigma_{x:A}B(x); the pair term (a,b)(a,b) provides some “witness” a:Aa:A such that B(a)B(a) holds. To be clear, there are two elements here: a proof in the usual sense above, with b:B(a)b:B(a) (where B(a)B(a) is a concrete proposition for a fixed aa that is inhabited by a term bb), as well as a provided a:Aa:A such that we actually inhabit B(a)B(a). The pair (a,b)(a,b) is therefore evidence that there exists some xx such that B(x)B(x) holds, precisely what we mean by aA,B(a)\exists a\in A, B(a).

  • f:Πx:AB(x)f:\Pi_{x:A}B(x); the dependent function term ff provides witness terms x:Ax:A such that every B(x)B(x) holds. We naturally think of dependent functions as maps with output terms with types that can depend on the input value. Therefore, the existence of some inhabiting function ff implies we’ve mapped from all input terms x:Ax:A to output terms y:B(x)y:B(x). That is, we effectively have a collection of pairs, each of which is similar to what we saw with Σ\Sigma types, with witnesses and proofs for all parameterizations of B(x)B(x). This is precisely what we mean by aA,B(a)\forall a\in A, B(a).

Order

  • Lattice of types: from Top to Bot

  • Order theory and hierarchies of types

  • Notion of least general type:
    • Note how, for a given term, we can say things like c : a -> a and c : b. The former is more specific, characterizing it as a function from/to a particular type, while the latter…actually I don’t get this, not really. For starters, we’re letting a and b be free type variables here, so if I think about how I’d actually write them for any concrete types, including as a UQ type, I don’t know how these are both correct. With Hindly-Milner, we see some discussion around generics also being able to inhabit specific typed variants (look at the Wikipedia discussion under “Type order”). But I don’t get this, since the term is a generic, at least as I’m used to, which makes it a distinct thing even if it can appear to be a particular typed version.


  1. This is a function from terms of type AA to “terms” of type U\mathcal{U}, where U\mathcal{U} is actually a type universe whose “terms” are themselves types. So BB maps from terms a:Aa:A to types. BB characterizes a family of types if we think about it extensionally (not actually needed, but helps evoke how I’m thinking about it at the moment), i.e., the type space that is the function’s image.
    ↩︎
  2. As far as I’ve seen, we just fundamentally take function types as given and never really need to throw some extra means of canonically specifying regular functions, which I guess is why I’ve not seen this prior to dependent type systems. Nevertheless, I like looking at this as an alternative way to color how I think about functions.
    ↩︎
  3. Here I’m referred to notion BAB^A to define function spaces, where BB is a codomain and AA is a domain. We can expand this to look just like our dependent product, even when BB does not dependent on any terms x:Ax:A. Intuitively, we’re just saying we have a “copy” of BB for every input x:Ax:A, and a point in the resulting product space is a choice of output for every input (i.e., a function).
    ↩︎
  4. For example, we could call (1,2,3)(1, 2, 3) a function in the context of an indexed family ff that is the identity function. ff maps a 0-indexed tuple position to the same value, and we associate with that input the output value in that tuple position. So the function is explicitly captured by

    {(f(0),1),(f(1),2),(f(2),3)}={(0,1),(1,2),(2,3)}\{(f(0), 1), (f(1), 2), (f(2), 3)\} = \{(0, 1), (1, 2), (2, 3)\}
    ↩︎
  5. This has been an annoying sticking point for me. I keep wanting to assign the whole space to each point aAa\in A, i.e., producing a collection of pairs (a,B(a))(a, B(a)), since that feels like it might be just as effective. But that really is just another way to write the function BB itself (again, just its graph), and doesn’t “unpack” the items of each B(a)B(a) into a new, “flattened” space.
    ↩︎
  6. This is a particular topic I’ve struggled with, one that has left me without a clear foundation for some time. When we say something like λx.x:αα\lambda x.x : \alpha\rightarrow\alpha, α\alpha is serving as a “type variable,” allowing for any choice of α\alpha. But even though α\alpha isn’t anything particular in the definition, it’s a placeholder for something that will be. In the polymorphic case, α\alpha also serves as a placeholder in the same way, but we include a term that explicitly “loops” over every type. We bind that type variable with the universal quantifier α\forall\alpha and make sure the term “handles them all,” explicitly. That is to say, it’s tied to none of them, and α\alpha isn’t free the way it is in the monomorphic case. Bottom line: the monomorphic type is defined around another fixed but unspecified type α\alpha and we’ll end up with a term that only works as a map from/to one fixed type (we’ll have to supply a choice for α\alpha when we instantiate). In the polymorphic case, that function is defined to work with all types; there’s not even a choice to be made for α\alpha during instantiation, since the function should be able to operate under a term of any type that gets passed in.
    ↩︎
  7. This got me (however counter-intuitively) questioning the notion of typing at all. What is the rule that is typing itself? We seem to take the notion of typing as a given, like an action that’s baked in. But why, and fundamentally what are we left with when it’s not present? After flailing for a minute, I realized we just get back to untyped lambda calculus. I’ve been so completely focused on types for so long that I kind of forgot what it’d even mean not to have them. But turns it out it’s not all that confusing, really, and untyped settings can generally just be seen as special cases of typed ones (where there’s implicitly just one type).
    ↩︎
  8. Maybe worth the explicit reminder that fact must know about GG for composition to have no effect. GG injects a function ff into its template structure, and the fixed point fact is completely built around that scaffolding. It’s not something strictly “pure” or independent of GG, and GG is therefore powerless to change it: it’s the construct that embodies infinite application of GG, and that’s why it has no effect.

    The +1\infty+1 analogy here is perfectly correct, but somehow even that often feels finite, like you’re still stacking an extra item to what’s ultimately just a very tall pile. Therefore it doesn’t do a great job of offloading the burden of understanding infinite composition; you have to break that annoyingly sticky finite thinking if you want to the fixed point to feel familiar. It’s the exact same analogy, but an infinitely extending spiral may feel a little nicer, or at least just a bit easier to visualize:

    Infinite spiraling
    When we wind up an additional cycle (compose once more), we get the same structure.
    ↩︎