Typed lambda calculus

type	wiki
created	2025-03-16 22:42
modified	2025-09-10 02:34

TODO:

Dependent sum types

Typed lambda calculi are type systems with (anonymous) function abstraction from Lambda calculus.

Perhaps worth noting that both (untyped) lambda calculi and type systems are fundamentally isolated formal systems. The latter generally wrap up the notion of typing, and the former is canonically defined without the inclusion of any such mechanism (lambda terms are abstractions that apply to anything). Typed lambda calculi make up a class of type systems that “bring in” lambda terms axiomatically, along with the rules of introduction/application/reduction that we formalize in the general, untyped lambda calculus setting.

Lambda cube

The $\lambda$ -cube is a framework that captures different binding behaviors in type systems as “movements” along dimensions over a 3D cube. Each vertex of the cube yields a particular type system with the respective binding behavior present as a typing rule, i.e., each system is associated with a point in $\{0,1\}^3$ , and a $1$ in a given dimension implies the presence of that typing rule.

The dimensions correspond to different kinds of binding mechanisms between terms and types:

$x$ -axis: introduces dependent types, types that can depend on terms
$y$ -axis: introduces polymorphism, terms that can depend on types
$z$ -axis: introduces type constructors, types that can depend on (other) types

Starting from the simply typed lambda calculus $\lambda_\rightarrow$ at $(0,0,0)$ ,

We move along the $x$ -axis to $(1,0,0)$ to get $\lambda\Pi$ (also referred to as $\lambda P$ ), a first-order dependent type system.
We move along the $y$ -axis to $(0,1,0)$ to get $\lambda 2$ (also referred to as System F), a polymorphic type system.
We move along the $z$ -axis to $(0,0,1)$ to get $\lambda\underline{\omega}$ , a type system including type constructors.

These represent type systems that introduce each of the typing mechanisms in isolation. Systems at other vertices are those with combinations of these mechanisms. At $(1,1,1)$ , we reach the calculus of constructions $\lambda C$ , where types and terms can depend on types and terms. We dive into each of these systems and their respective typing mechanisms below.
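As a quick way to keep the eight vertices straight, here is a small Python sketch that enumerates $\{0,1\}^3$ using the axis convention above. The names of the mixed vertices ($\lambda P2$, $\lambda\omega$, $\lambda P\underline{\omega}$) follow the usual lambda cube naming rather than anything defined on this page so far, and `FEATURES`, `NAMES`, and `features` are just illustrative names of my own.

```python
from itertools import product

# Axis order follows the text: (x, y, z) = (dependent types, polymorphism, type constructors).
FEATURES = (
    "types depending on terms",
    "terms depending on types",
    "types depending on types",
)

# Conventional names for the eight vertices of the cube ("_" marks the underlined omega).
NAMES = {
    (0, 0, 0): "λ→",
    (1, 0, 0): "λP",
    (0, 1, 0): "λ2",
    (0, 0, 1): "λω_",
    (1, 1, 0): "λP2",
    (0, 1, 1): "λω",
    (1, 0, 1): "λPω_",
    (1, 1, 1): "λC",
}

def features(vertex):
    """Return the binding mechanisms switched on at a vertex of {0,1}^3."""
    return [name for bit, name in zip(vertex, FEATURES) if bit]

for vertex in product((0, 1), repeat=3):
    print(vertex, NAMES[vertex], features(vertex))
```

Every system includes the baseline "terms depending on terms" of $\lambda_\rightarrow$, which is why the origin lists no extra features.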

Typing mechanisms

The lambda cube captures a few common typing mechanisms. Below we discuss these in greater detail. Simply put,

Dependent typing: types that depend on terms
Polymorphic typing: terms that depend on types
Type constructors: types that depend on types

Dependent typing ( $\lambda\Pi$ )

$\Pi$ -type

The type system $\lambda\Pi$ introduces dependent typing, facilitated primarily by the $\Pi$ type (the dependent product type). This type captures the notion of a function whose return value’s type (as well as the value itself, of course) can vary with the argument value. That is to say, there’s no fixed codomain; both the output term and its type are effectively dynamically selected by the value of the input.

In particular, if $B: A\rightarrow \mathcal{U}$ ¹, the type of a dependent function can be written $\prod_{x:A}B(x)$ , capturing the fact such a term will map to a type determined by $B(x)$ . For example, if we write $\text{Vec}(\mathbb{R}, n)$ to represent a real-valued $n$ -tuple, then $\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n)$ is how we “parameterize” it as a dependent type.

It’s worth noting that when $B:A\rightarrow\mathcal{U}$ is a constant function (all $a:A$ map to the same type), the dependent product type acts exactly the same as the usual function type. While we don’t fundamentally use this notation when defining function types², it clarifies (for me) the use of notation seen elsewhere in mathematics³. We’re saying here that the type $\prod_{x:A}B$ (no dependence on $x$ in $B$ ) is equivalent to $A\rightarrow B$ ; they’re two different ways to write the same thing. Further, this product type can be re-written explicitly in the usual sense as

$B\times B\times\cdots\times B,$

but how is this the “type” of a function (since we’re claiming $\prod_{x:A}B$ and $A\rightarrow B$ are the same)? We can think of a concrete function as a long tuple where each tuple position corresponds to a particular input value in the domain (captured as an indexed family, say) and “chooses” an output value corresponding to that input⁴. That is, we’re thinking of functions in the more explicit, extensional, binary relation sense. In any case, such a tuple can be seen as an element of the product space written above; the concrete tuple represents a concrete function term, and its type can be seen as a product of the types associated with each output. All this to say: this isn’t a product space serving as a domain or codomain for some class of functions (which is how I originally got confused here), but is itself the space/type of all functions, when we think of particular functions as tuples of output values.

This lends itself back to the original context within which the notation was introduced: to express dependent types. When $B$ does vary with $x:A$ , we can explicitly write our dependent function type as

$B_{a_1}\times B_{a_2}\times B_{a_3}\times\cdots,$

where $a_i : A$ ranges over all terms of type $A$. This is a dependent product, and captures the space of possible dependent functions: terms that themselves map from all inputs $a_i:A$ to terms $b_i:B(a_i)$. Here we simply get the extra flexibility to specify (loosely) a particular codomain (type) for each input value (abstraction-bound variable). We formalize the construction of such a type with the formation rule

$\frac{\Gamma,x:A\vdash B:*}{\Gamma\vdash\left(\prod_{x:A}B\right):*}$

which is really a simple statement that formally recognizes $\prod_{x:A}B$ as a type (with “$:*$” read as “is of type type”). To use the example from before, $\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n)$ could be the type of a function that maps a natural number $n$ to a particular vector with $n$ entries (the all-ones vector, say). Note how $\text{Vec}(\mathbb{R}, m)$ is the type of a vector for a specific value of $m$, but not any $m$. So a function that produces a vector with a variable length (dependent on the function’s input) can be typed via the notion of a dependent type.

Note how the introduction of this typing mechanism doesn’t formally increase the computational power of the type system (untyped lambda calculus is Turing complete already), but instead facilitates more logical expressiveness, e.g., how precisely we can encode logical constraints over terms. Without dependent types, for instance, we could still have a function that produces variable-length vectors, but from the perspective of statically verifying the logical correctness of further statements involving such functions, it becomes much more difficult (if not impossible) to make certain guarantees about the value or type of involved terms.
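Python can’t check dependent types statically, but a runtime sketch may help make the shape of $\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n)$ concrete. Here the family $B$ is modeled as a function from a value $n$ to a membership test for $\text{Vec}(\mathbb{R}, n)$. The names `vec`, `ones`, and `inhabits_pi` are hypothetical, and checking finitely many sample inputs is only a stand-in for what a type checker would establish for all $n$.

```python
def vec(n):
    """B(n): a membership test standing in for the type Vec(R, n)."""
    def is_member(v):
        return (
            isinstance(v, tuple)
            and len(v) == n
            and all(isinstance(x, float) for x in v)
        )
    return is_member

def ones(n):
    """A dependent function: the type of the output varies with the input n."""
    return (1.0,) * n

def inhabits_pi(f, family, samples):
    """Check f(a) : B(a) on some sample inputs (not a proof for all a)."""
    return all(family(a)(f(a)) for a in samples)

print(inhabits_pi(ones, vec, range(5)))                  # True
print(inhabits_pi(lambda n: (1.0, 1.0), vec, range(5)))  # False
```

The second call fails because a function with a fixed output length only lands in $\text{Vec}(\mathbb{R}, 2)$, not in $\text{Vec}(\mathbb{R}, n)$ for each $n$.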

The type of dependent functions like this, in the general sense, is referred to as a $\Pi$ type, often called the dependent function type (with expansion below on the use of the term “dependent product type” from before). Dependently typed systems usually also include a type that’s dual to $\Pi$ types: the $\Sigma$ type, also called the dependent pair type (strictly an extension of the minimal $\lambda\Pi$ on the cube, whose only dependent type former is $\Pi$).

$\Sigma$ -type

Using similar notation from before, we write $\Sigma$ types as $\Sigma_{x:A}B(x)$ . This captures the notion of ordered pairs where the second term’s type can depend on the value of the first. That is, if $(a, b):\sum_{x:A}B(x)$ , then $a:A$ and $b:B(a)$ : the type of $b$ gets to depend on $a$ ’s value. As before, we can expand this and try to make sense of its full syntactical implications:

$\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,$

where $a_i : A$ ranges over all terms of type $A$. Right away I find this a little confusing: I see arbitrarily many types being summed over (as many as there are terms $x:A$), and yet this is the type of a pair of terms. With our dependent function type, our terms themselves were functions that have to account for all possible inputs $x:A$, so it seemed justified that we’d need to have a representation for each $x:A$ in the associated type definition. But here, it’s a little surprising to see we compose more than just two types in order to capture the pair type.

Intuitively, all that’s happening here is the $+$ is behaving like an “or” operator. Since $b$ ’s type can depend on any term $a:A$ , it does make sense that we’d need every possible choice in our type definition; it’s just that any term must “commit” to only one of them. So we need them all like before, but while $\times$ indicates joint involvement (our term will use them all), $+$ captures the notion of a choice among them (our term will only pick one, but it could be any of them).

In some sense (and I do mean that vaguely), this is actually more general than the $\Pi$ type. In that case, our type captures the idea of assigning an output type to every possible input term, giving us terms (functions) that must actually make such a decision for all inputs, i.e., picking a term of $B(a)$ for every $a:A$. But here, we’re doing that exact same thing…just without the requirement to do it for all inputs at once. In a sense we “free up” the input as well, and our terms get to pick whichever input (along with the associated, dependent output type) they want, rather than needing to do it for all of them (doing it for all inputs leaves the only freedom in the output side of things; the function doesn’t get to pick specific inputs it should be defined over). So our pair is like any individual input-output slice of a full dependent function.

Interpreting $\Sigma$ types as spaces

We can sometimes think of types extensionally, representing spaces of all their possible constituent terms. Given the extra “degree of freedom” we described above, it actually feels harder to shape that corresponding space for $\Sigma$ types, or at least to capture how we should think about it. In the $\Pi$ type setting, I know my function terms pick a type $B(a)$ for every input $a$ (for some choice of $B$), and all my concrete function terms under that $B$ will map any given input $a$ to a term $b : B(a)$. I basically “stretch out” all those $B(a)$ and I’ve got my space’s structure: all terms in that space are maps with the same output types when queried at the same inputs. The presence of the “term conditioning” from $a$ sort of disappears in the implicit order of my $B(a)$; the type $A$ is expressed implicitly through the family of $B(a)$’s. We actually have to deal with terms of the type $A$ in order to “see” what the types $B(a)$ will be, so it makes sense that we can’t exactly have a term $a : A$ show up in the type’s definition.

For $\Sigma$ types, we do a similar thing by packing in $B$ types in such a way that the dependence on terms $a : A$ is made implicit. But our space is effectively more locally variable, and all of our terms don’t share the same array of output types like we saw with $\Pi$ types (which acts as kind of a common thread, like a “stitch” in the space that captures the common structure of terms). Instead, our terms are even more “tightly dependent” and sort of stubbornly dissimilar, making it hard to capture the structure of the common space all terms share. So I resign myself to taking just that very structure: the (implicit) coupling of a term $a:A$ with its paired type $B(a)$, and we bag all the possible concrete couplings up into a set like $\{(a, b)|a:A,b:B(a)\}$.

In general, I think I let this be more a sticking point than it needed to be, but my inability to let go of little confusions or frustrations sometimes leads to greater insight. I may end up reusing this outside this scope, but the point is that writing the “sum”

$\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,$

is not really all that accurate. It should be thought of as a disjoint union, which explicitly lets us tag those types while bagging them all up like we want. That is, it just preserves the identity of the types $B(a)$ using the $a$ that produced them, and those are the pairs that can make up our type space. So more like

$\sum_{x:A}B(x) = (\{a_1\}\times B(a_1)) + (\{a_2\}\times B(a_2))+ \cdots,$

This still leaves $B(a_i)$ as a type, but we can interpret that as a set in the generic way, i.e., $\{b | b:B(a_i)\}$ . That gives you a formal set product $\{(a_i, b) | b:B(a_i)\}$ for each $a_i:A$ , and when you let $+$ mean $\cup$ , you get your big, fully expanded disjoint union. That is your formal set-based interpretation; it’s clear how we can dynamically expand only the properly dependent pairs of values.
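The role of the tags is easy to see on a small finite example. The sketch below is a set-based illustration only, with a made-up index set `A` and family `B` (neither comes from the discussion above). It builds the tagged pairs and compares them to a plain, untagged union.

```python
def sigma(A, B):
    """Disjoint union: every element of B(a) is tagged with the a that produced it."""
    return {(a, b) for a in A for b in B(a)}

A = {0, 1, 2}
B = lambda a: set(range(a + 1))   # B(0) = {0}, B(1) = {0, 1}, B(2) = {0, 1, 2}

pairs = sigma(A, B)
plain_union = set().union(*(B(a) for a in A))

print(len(pairs))        # 6 = 1 + 2 + 3
print(len(plain_union))  # 3: without tags, overlapping elements collapse
```

The element `0` appears in every $B(a)$, but `(0, 0)`, `(1, 0)`, and `(2, 0)` remain distinct members of the sum, which is exactly what the name tag buys us.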

So when I said this space feels like it has less structure, I actually still feel that’s an accurate depiction. Even if the little set-based jumps we make to expand the space are easy to follow, it’s harder to hook into common structure between these objects. You have the paired part of them, yes, but beyond that we literally just tag dependent types explicitly. It’s not some tightly wound, reduced structure, or even as rigid as the functions with $\Pi$ types where we at least have all objects stretching across the input space. Nope: we just slap a name tag on the types and put them in the bag, sort of the bare minimum to ensure they can be distinguished. And that very process of tagging is really the only structure we can even grab onto afterward, to group up “stubbornly dissimilar” terms (to reuse the phrase from earlier, which I again feel is accurate). That, I suppose, is really just a consequence of great flexibility: we’re literally defining a compositional type that is more or less an unimposing bag of other types. That’s as unstructured as you get when it comes to wrangling collections of objects: you’re adding no new structure or transforming things into a common shape. Perhaps that’s why it’s so slimy to me, so hard to just let be. It’s like I’m expecting more to be there, more structure to understand, when in actual fact there is none, on purpose.

Note the syntactic analogies to addition/multiplication/exponentiation, when the dependent function $B$ is constant:

The dependent pair $\Sigma_{x:A}B$ can also be written $A \times B$, which can be interpreted in the usual product space sense, where to each point in $A$ we attach an instance of $B$. To be precise:
- When $B(x)$ varies with $x:A$, we can treat it as an indexed family of types/sets, indexed by $A$
- The dependent pair is then analogous to a disjoint union:
  
  $\Sigma_{a:A}B(a) = \bigsqcup_{a:A}B(a) = \bigcup_{a:A}\{(a,b)|b:B(a)\}$
- When $B(x)=B$ (i.e., is constant, no dependence on $A$), the disjoint union is simply the usual Cartesian product:
  
  $\Sigma_{a:A}B = \bigsqcup_{a:A}B = A\times B$
- Note how $\Sigma$ or $\bigsqcup$ actually do some work in “breaking up” and “rearranging” $B$; a set like $\{(a, B(a)) | a:A\}$ that might attempt to attach to each $a\in A$ the full type/space $B(a)$ is only half way to something useful (and is just the graph of $B$).⁵
The dependent product $\Pi_{x:A}B$ can also be written $B^A$, which can be interpreted as “multiplying” instances of $B$ for each element/term of $A$. As mentioned above, when $B$ is static, this simply recovers the notion of a (possibly infinite) product space housing functions $f: A \rightarrow B$. Otherwise, the product is composed of conditioned spaces produced by invoking $B(x)$ for each $x: A$, and we usually don’t employ the compact representation $B^A$ (it doesn’t really capture the $B$’s dependence on $A$, but could be implicitly understood).

When we write out the product explicitly, we have a sequence of concrete products like we saw with the dependent pair, where each $B(a)$ is a fixed set making up one part of a nested product:

$\begin{aligned} \Pi_{x:A}B(x) &= B(a_1) \times B(a_2) \times \cdots \\ &= \Sigma_{b_1:B(a_1)}\left[\Sigma_{b_2:B(a_2)}\cdots\left[\Sigma_{b_{n-1}:B(a_{n-1})}B(a_{n})\right]\right] \end{aligned}$

Here we see increasing levels of abstraction taking place with repeated application, again analogous to multiplication representing repeated addition, exponentiation representing repeated multiplication, etc.
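The arithmetic analogy can be checked by brute force on small finite sets. This is only a set-based sketch with hypothetical names (`pi`, `sigma`, and the toy families `B` and `const`), representing a dependent function as a dictionary from inputs to outputs.

```python
from itertools import product
from math import prod

def sigma(A, B):
    """All dependent pairs (a, b) with b in B(a)."""
    return [(a, b) for a in sorted(A) for b in sorted(B(a))]

def pi(A, B):
    """All dependent functions on A, each stored as a dict a -> b with b in B(a)."""
    domain = sorted(A)
    choices = product(*(sorted(B(a)) for a in domain))
    return [dict(zip(domain, outputs)) for outputs in choices]

A = {0, 1, 2}
B = lambda a: set(range(a + 2))   # a varying family with sizes 2, 3, 4
const = lambda a: {"x", "y"}      # a constant family of size 2

print(len(sigma(A, B)), sum(len(B(a)) for a in A))   # 9 9   (2 + 3 + 4)
print(len(pi(A, B)), prod(len(B(a)) for a in A))     # 24 24 (2 * 3 * 4)
print(len(sigma(A, const)), len(A) * 2)              # 6 6   (|A x B|)
print(len(pi(A, const)), 2 ** len(A))                # 8 8   (|B^A|)
```

In the constant case the counts reduce to ordinary multiplication and exponentiation, matching the $A\times B$ and $B^A$ notation.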

Polymorphic typing ( $\lambda 2$ )

Polymorphic types allow term definitions to depend on a specific type. For example, instead of saying something like

$\lambda x.x:\alpha\rightarrow\alpha,$

which can be interpreted as an identity function mapping from/to terms of type $\alpha$ (for a particular choice of $\alpha$), we instead have

$\lambda x.x:\forall\alpha.\alpha\rightarrow\alpha,$

which defines an identity function polymorphic in $\alpha$ , meaning any (every) choice of $\alpha$ is allowed. For the former, the type $\alpha\rightarrow\alpha$ describes a monomorphic “family” of types since $\alpha$ can be anything. But the term will always involve a particular choice, e.g., be the identity function from/to terms of int or bool types. In the polymorphic case, we never require such specificity: the identity function must work for all types (as in, we could pick any one of them). This is kind of like saying we get one term that is “type aware,” whereas in the monomorphic case we’d actually have to define multiple (separate) terms if we wanted to work with different choices of $\alpha$ .⁶

System F is a second-order typed lambda calculus, formalizing parametric polymorphism. In simply typed lambda calculus, this isn’t exactly a native element of the type theory, so universally quantified type variables are formalized. For example, in the case of our identity function, we have

$\vdash \Lambda\alpha. \lambda x^\alpha. x : \forall\alpha.\alpha\rightarrow\alpha,$

where $\alpha$ is a type variable, and $\Lambda$ refers to a type-level function. This is a judgment that says, as a function of type $\alpha$, the identity function of a bound variable $x$ of type $\alpha$ has type $\alpha\rightarrow\alpha$. The entire left-hand expression, including the outer $\Lambda$ term, is of the single “most general type,” involving a universally quantified type variable, $\forall\alpha.\alpha\rightarrow\alpha$. It is the most general in that every other type we might feasibly assign to the term can be arrived at by substituting into this one, i.e., is less general.

In System F, we actually formalize the universally quantified type variable as a single type by a typing rule:

$\frac{\Gamma,\alpha\;\text{type}\vdash M:\sigma}{\Gamma\vdash\Lambda\alpha.M:\forall\alpha.\;\sigma}$

Here we’re saying, when $M$ is of type $\sigma$ and we take $\alpha$ to be a type variable, then $\Lambda\alpha.M$ is of type $\forall\alpha.\;\sigma$. Notice how it’s not fundamentally a function type or anything; this is us actually defining a new type and its syntax, axiomatically. The quantifier $\forall$ is not purely symbolic, however; it is actually signifying quantification over all types.

We also have the rule describing the application of $\Lambda$ terms:

$\frac{\Gamma\vdash M:\forall\alpha.\;\sigma}{\Gamma\vdash M\tau : \sigma[\alpha:=\tau]}$

which just says that when a polymorphic term $M$ is applied to a type $\tau$, it’s as if we’re instantiating the whole template under that type, replacing any dependence on $\alpha$ in $\sigma$ with $\tau$. Bringing back the identity example from above, we have

$\vdash (\Lambda\alpha. \lambda x^\alpha. x)(\text{Bool}) : \text{Bool}\rightarrow\text{Bool}$

$\text{Bool}$ is applied to the $\Lambda$ term, effectively:

Stripping off the outer $\Lambda\alpha$
Binding $x$ in $\lambda$ to type $\text{Bool}$ (although the RHS type signature really does this as well)
By the application rule, for the type side we strip off $\forall\alpha$ and replace the references to $\alpha$ in $\sigma$ (which was $\alpha\rightarrow\alpha$) with $\text{Bool}$

So we effectively end up with the term $\lambda x.x:\text{Bool}\rightarrow\text{Bool}$. We can think of the polymorphic term as a producer of a family of such concrete terms; the polymorphic term itself is second-order, producing a first-order, monomorphic term after application.
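The “producer of concrete terms” reading can be mimicked at runtime. Python has no explicit type abstraction, so the sketch below fakes $\Lambda\alpha$ with an ordinary function that receives a class, and uses an `isinstance` check as a stand-in for the static annotation $x^\alpha$. The name `poly_identity` is made up, and this is only an analogy for System F type application, not an implementation of it.

```python
def poly_identity(alpha):
    """Plays the role of Λα. λx^α. x: takes a type, returns a monomorphic identity."""
    def identity(x):
        # Runtime stand-in for the static annotation x : α.
        if not isinstance(x, alpha):
            raise TypeError(f"expected {alpha.__name__}, got {type(x).__name__}")
        return x
    return identity

id_bool = poly_identity(bool)   # "type application" at Bool
id_str = poly_identity(str)     # the same polymorphic term at another type

print(id_bool(True))   # True
print(id_str("hi"))    # hi
```

Calling `id_bool("hi")` raises a `TypeError`, which loosely mirrors the fact that $\text{Bool}\rightarrow\text{Bool}$ no longer accepts arbitrary arguments once $\alpha$ has been instantiated.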

(See also: Type variable)

Type constructors ( $\lambda\underline{\omega}$ )

In System $F \underline{\omega}$ , we introduce type construction. This is a mechanism that effectively brings types into the term space, loosely speaking: types are allowed to be part of terms, e.g., taken as inputs to a function. Such terms themselves have types, which we call kinds.

Type constructors facilitate the creation of new types from existing ones. They can be formally seen as $n$ -ary type operators, returning a type from $n$ types passed as arguments. Note that we typically write this in a curried manner as needed, like other functions of many variables. As such, type operators can be seen as functions in a higher-order type theory (namely a simply typed lambda calculus) with one basic type $*$ , representing the type of all types in the underlying language.

For instance, if $s: \sigma$ , then $\sigma: *$ . Both $\sigma$ and $*$ are types, albeit at different “levels;” we generally distinguish them as “proper types” and “kinds,” respectively.

Kinds are effectively declarations of type constructor arity (rather than the “type of a type”). Given that the “kind system” has just a single base kind, first-order type operators are simply curried functions of proper types and look like $* \rightarrow \cdots \rightarrow *$:

$*$ is the kind of all proper types (also called “nullary types,” i.e., the constant result of a type constructor with no inputs). This is pronounced “type,” e.g., “ $\sigma:*$ indicates $\sigma$ is of type type.”

To be clear, even though “ $*$ ” means “type,” it is part of a kind system. We say “ $\text{nat}$ is a type” (judgment “ $\vdash \text{nat Type}$ ” in the base-level type theory) and “ $*$ is a kind” (judgment “ $\vdash *\;\text{Type}$ ” in the higher-level type/kind theory).
$*\rightarrow *$ is the kind of a unary type constructor, such as that of a list type, e.g., $\text{list}: *\rightarrow *$, so that $\text{list}\;a : *$ where $a$ is the type of the list elements.
$*\rightarrow *\rightarrow *$ is the kind of a binary type constructor, e.g., the function type constructor. That is, something like $(\lambda\sigma\tau.\sigma\rightarrow\tau) : *\rightarrow *\rightarrow *$ as a judgment in the kind system (although I don’t know if we really ever embrace such an explicit formation given functions are usually primitives, but I think it demonstrates the point).

A higher-order type constructor is one that itself maps from a type constructor to a proper type. This might have the kind $(*\rightarrow *)\rightarrow *$ , for instance, and helps describe the kind of more complex constructs like monads.
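Since kinds form a simply typed system over the single base kind $*$, kind checking a type application is just function application one level up. A minimal sketch, assuming a toy representation of my own (the string `"*"` for the base kind and a pair for an arrow kind):

```python
STAR = "*"

def arrow(domain, codomain):
    """The kind domain -> codomain, stored as a pair."""
    return (domain, codomain)

def apply_kind(constructor_kind, argument_kind):
    """Kind of applying a type constructor to an argument; TypeError if ill-kinded."""
    if constructor_kind == STAR:
        raise TypeError("a proper type cannot be applied to anything")
    domain, codomain = constructor_kind
    if domain != argument_kind:
        raise TypeError(f"expected an argument of kind {domain}, got {argument_kind}")
    return codomain

LIST = arrow(STAR, STAR)                     # * -> *
FUNCTION = arrow(STAR, arrow(STAR, STAR))    # * -> * -> *
HIGHER = arrow(arrow(STAR, STAR), STAR)      # (* -> *) -> *

print(apply_kind(LIST, STAR))                        # *
print(apply_kind(apply_kind(FUNCTION, STAR), STAR))  # *
print(apply_kind(HIGHER, LIST))                      # *
```

Applying `LIST` to `LIST` is rejected, since a unary constructor expects a proper type rather than another constructor.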

Note that we use $\lambda\omega$ to denote the type system including both type constructors and polymorphism.

Pure type systems

Pure type systems are typed lambda calculi that include an arbitrary number of sorts and dependencies between them. This generalizes the lambda cube, from which perspective we can view its “corners” as instances of pure type systems with just two sorts.

In particular, a pure type system is a triple $(\mathcal{S}, \mathcal{A}, \mathcal{R})$ :

$\mathcal{S}$ is the set of sorts
$\mathcal{A}\subseteq\mathcal{S}^2$ is the set of axioms; an axiom $(s_1, s_2)$ is a foundational statement that $s_1:s_2$
$\mathcal{R}\subseteq\mathcal{S}^3$ is the set of rules

Pure type systems “break down” a wall that previously wasn’t really visible; simply typed lambda calculus, for instance, is fundamentally constructed around terms and types, and we don’t really question those two “classes” of items. Pure type systems open that up, extending the notion of typing (allowing lower-order terms to be classified by higher-order terms) to more expansive hierarchies⁷.
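To make the “corners as pure type systems” view concrete, here is a sketch that builds the triple $(\mathcal{S}, \mathcal{A}, \mathcal{R})$ for any vertex of the cube. It assumes the standard presentation with two sorts $*$ and $\square$, the single axiom $*:\square$, and rules of the shape $(s_1, s_2, s_2)$. The mapping of axes to rules follows the axis convention used earlier on this page, and the function name `cube_pts` is hypothetical.

```python
STAR, BOX = "*", "□"

SORTS = {STAR, BOX}
AXIOMS = {(STAR, BOX)}       # the single axiom * : □

# Each cube axis switches on one rule; (s1, s2) abbreviates the triple (s1, s2, s2).
BASE_RULE = (STAR, STAR)     # terms depending on terms (always present)
AXIS_RULES = (
    (STAR, BOX),             # x: types depending on terms
    (BOX, STAR),             # y: terms depending on types
    (BOX, BOX),              # z: types depending on types
)

def cube_pts(vertex):
    """Return (S, A, R) for a vertex of {0,1}^3, with R as a set of triples."""
    pairs = {BASE_RULE} | {rule for bit, rule in zip(vertex, AXIS_RULES) if bit}
    rules = {(s1, s2, s2) for s1, s2 in pairs}
    return SORTS, AXIOMS, rules

print(cube_pts((0, 0, 0))[2])       # only the base rule
print(len(cube_pts((1, 1, 1))[2]))  # 4
```

Under this reading, moving along an axis of the cube is nothing more than adding one element to $\mathcal{R}$.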

(See also: Pure type system (nLab))

System U

Subtyping

Subtyping captures a notion of substitutability between types. If $S$ is a subtype of $T$, we typically write this $S <: T$, implying that a term of type $S$ can be used in any context where a term of type $T$ is expected. Type systems like System $F_{<:}$ formalize this properly (e.g., with subtype judgments and associated rules), but from what I can tell, most resources stay away from the weeds and discuss record types and the Liskov substitution principle (LSP).
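For the record-type flavor of this, a deliberately simplified sketch: records are dictionaries from field names to type names, and only width subtyping is modeled (a record with more fields is a subtype of one with fewer). Depth subtyping and everything else in $F_{<:}$ are left out, and the names `is_record_subtype`, `Point2D`, and `Point3D` are made up for illustration.

```python
def is_record_subtype(S, T):
    """Width subtyping: S <: T when S has every field of T, at the same field type."""
    return all(field in S and S[field] == ty for field, ty in T.items())

Point2D = {"x": "Real", "y": "Real"}
Point3D = {"x": "Real", "y": "Real", "z": "Real"}

print(is_record_subtype(Point3D, Point2D))  # True
print(is_record_subtype(Point2D, Point3D))  # False
```

Anything that only reads `x` and `y` works equally well on a `Point3D`, which is the substitutability that $S <: T$ is meant to express.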

More formally, subtype applicability is determined by subsumption. For a type $\mathrm{T}$ , we generally adopt an extensional view $\mathbf{T}$ (the set of all and an intensional view $\mathit{T}$ …

…leaving off here. Taking some issue with the int/ext views of types, since it’s more set-theoretic. Want to get this right, mix in LSP, contrast with inheritance, and maybe get a clear link out to a relevant page on func prog.

-> Then do universally/exist. quant types (below); I feel like I still have yet to nail this down. Then do the semantics bit, equational theory. This last piece should wrap us around to algebraic DTs. Then maybe hit combinators, finally monoids and some Haskell examples.

-> Now back to types as sets with the paper

Quantification

Quantified types facilitate greater expressive power, appealing to higher level typing mechanisms. Universal quantification enables polymorphism through generic types, and existential quantification yields abstract data types (through information hiding). Together, they facilitate parametric data abstraction.

Generally speaking, quantification is a means of abstraction, and is facilitated by binders. Binding operators act on variables of a certain kind, and place (bind) them inside a particular context. That context is made abstract as a result: an expression is produced that isn’t tied to any particular concrete value. The resulting abstraction can therefore be seen as a generic template, and is accompanied by a notion of application, wherein concrete values can be supplied to “fill in” the template and make the entire expression concrete. Fundamentally we’re introducing a higher level means of operation, a construction above terms, a producer of terms. Lambda calculus is so powerful because we let this higher level construct be a term itself; it doesn’t exist strictly in a higher plane but is “flattened back out” to the same level as concrete terms. This enables abstractions to be arbitrarily nested (terms can be abstractions, and abstractions are defined over terms), and ultimately represent all computable functions (lambda calculus is Turing complete).

In the case of function abstraction, $\lambda$ binds term variables: $\lambda x. c(x) : A\rightarrow B$ binds a term $x:A$ in a context $c(x):B$ . As we’ve already seen with polymorphic typing in $\lambda 2$ , type-level functions are introduced with the $\Lambda$ operator, which binds type variables in term space. Resulting terms are generic functions $\Lambda\alpha. \lambda x^\alpha. c(x)$ , and have a universally quantified type $\forall\alpha.\alpha\rightarrow\alpha$ . This is fundamentally a new kind of term, i.e., functions that can effectively be parameterized by a type, and we’ve got a new type definition to go along with it.

Existentially quantified types are a bit more nuanced. For starters, existential quantification isn’t accompanied by a fundamentally new kind of binding operator in term space. Instead, it’s purely a mechanism for abstraction in type space, typically used to hide certain type-level specifics. The statement $x : \exists\alpha. t(\alpha)$ allows us to specify $x$ only in terms of an exposed interface $t$ , i.e., some higher level type structure that, when parameterized by some type $\alpha$ , will represent $x$ ’s real type. The whole point is that we can get away with not knowing or specifying $\alpha$ , and the binding operator $\exists$ encapsulates that detail behind the visible structure seen in $t$ . Such a type is clearly amenable to high-level specifications of abstract data types: we can declare terms that verifiably observe certain type signatures and are invariant to any involved concrete types. Note that terms with existentially quantified types are introduced and eliminated with pack and unpack operators, respectively. These are fundamentally different than an operator like $\Lambda$ ; the latter introduces an irreducible new primitive (type abstraction), whereas pack/unpack are more like rules for working with existentially quantified terms (and can be fundamentally represented via universal quantification).

While the notions of universal and existential quantification are similar in that types are being abstracted over, they say very different things. Informally, $x : \exists\alpha. t(\alpha)$ lets us say that $x$ has type $t(\alpha)$ for some particular $\alpha$ . This is quite different from $x : \forall\alpha. t(\alpha)$ , which suggests that $x$ is an abstraction that can handle all types $\alpha$ . In the former, $x$ has no such “awareness” of choosing a type for $\alpha$ ; a specific choice of $\alpha$ is involved but we’re blind to its actual value, suggesting it could be any one choice. In the latter, $x$ is explicitly a generic construct that can handle all choices of $\alpha$ . There’s no notion of “plugging in” some particular choice of $\alpha$ to get $x$ ’s “true” type; instead, $x$ is explicitly the thing (a generic function) that can handle any choice for $\alpha$ .

More formally, it’s important to recognize that both universal and existential quantification do ultimately represent new, singular types. They’re not just loose characterizations of families of types or possible type choices (as we might be inclined to interpret them), but formal types with typing rules, just like everything else. I often find myself forgetting this since $\exists$ and $\forall$ look to be loose wrappers on other types; it’s tempting to just think of them extensionally, as possible values for the types they “wrap.” This is perhaps a perfectly reasonable and intuitive thing to do (encouraged, even), but they are nevertheless still themselves considered formal types.
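The pack/unpack discipline can be imitated in Python, although nothing enforces the hiding statically. In the sketch below, `pack` closes over a hidden representation together with an interface, and the returned package is consumed by handing it a client. This continuation-style shape loosely mirrors the encoding of existentials via universal quantification. The counter interface and all names here are hypothetical.

```python
def pack(representation, operations):
    """Introduce a package: hide `representation` behind `operations`."""
    def unpack(client):
        # The client receives the hidden value and the interface, but is
        # expected to touch the value only through the interface.
        return client(representation, operations)
    return unpack

# Two implementations of a counter interface t(α) = (α, α -> α, α -> Int).
int_counter = pack(0, {"inc": lambda n: n + 1, "get": lambda n: n})
list_counter = pack([], {"inc": lambda xs: xs + [None], "get": len})

def count_to_three(state, ops):
    """A client written once, for whatever α the package happens to hide."""
    for _ in range(3):
        state = ops["inc"](state)
    return ops["get"](state)

print(int_counter(count_to_three))   # 3
print(list_counter(count_to_three))  # 3
```

The client never learns whether $\alpha$ is an integer or a list; it is written against $t$ alone, which is the information hiding that makes existential types a natural fit for abstract data types.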

Bounded quantification

Bounded quantification is quantification (universal or existential) with subtyping.

Encoding datatypes

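The power of lambda calculus becomes quickly apparent when we discuss repeated application of higher-order abstractions. One way to begin formalizing this is via Church encoding, which represents data and operators as lambda terms; since the lambda-definable functions coincide with the Turing-computable ones (the equivalence underlying the Church-Turing thesis), any computable operator can be represented under this scheme.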
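As a small taste of Church encoding, natural numbers can be represented as “apply $f$ this many times.” The sketch below uses Python lambdas as untyped lambda terms. The helper `to_int` is not part of the encoding; it is only there to read the results back out.

```python
# Church numerals: the numeral n is the term λf. λx. f(f(...f(x))) with n copies of f.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Decode a Church numeral by counting applications of f."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)

print(to_int(add(two)(three)))  # 5
print(to_int(mul(two)(three)))  # 6
```

Everything here is built from abstraction and application alone, which is the point: the numerals and their arithmetic need no primitive data.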

Recursion

When defining recursive functions, we might reach for a familiar self-referential form. For example, with the factorial function:

fact = fun (n: Int) if n=0 then 1 else n * fact(n-1)

As a term, this is improperly defined: fact refers to itself before it is fully defined. We can imagine needing to parse a definition left to right before the LHS identifier is considered valid, but we encounter fact before we get all the way through, at which point fact is not a valid alias. In any case, we do this all the time in most programming languages, it’s just that it’s a convenient expression for something more formal. That something “lifts” the function into a functional:

fact = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)

This is not intrinsically recursive: it is a function that takes another function f as input, embeds it in another function, and returns that. That is, something like the general form λf. λx. t(f(s(x)), x), which says f is a function that operates on a term s(x), and t is some transformation of the result of f (which may also use x) after it’s applied to s(x). For a quick example,

-- let s step the input toward the base case
s = λx. x-1

-- let t(r, x) combine the result r of the inner call with x, mirroring the factorial setup
t = λr. λx. x*r

-- so our general term:
λf. λx. t(f(s(x)), x)

-- turns into
λf. λx. x*f(x-1)

-- and then if we define some function g
g = λy. y*y

-- we can apply our form to it like
( λf. λx. x*f(x-1) )( g )
-> λx. x*g(x-1)
-> λx. x*((x-1)*(x-1))

-- this final term is a function which we can reapply as a "new g"
h = λx. x*((x-1)*(x-1))

-- on the left is our general term, apply to h
( λf. λx. x*f(x-1) )( h )
-> λx. x*( (x-1)*( (x-2)*(x-2) ) )

What we’re seeing here is that each repeated application of the functional – using the last output function as input to the functional, producing the next function – approximates part of the full recursive behavior. The last term above just accounts for two recursive steps. That is, it’s like we just iteratively pack a snapshot of the function into itself one step at a time, and once we “call” the outer thing for some input x, we recursively apply the function as usual (by virtue of the fact we’ve already packed in all the nested calls). The point is that such a term explicitly packs the function unwrapping into the term itself so we don’t run into the definition issues, i.e., it spells out what the usual implicit syntactic sugar lets us get away with in most programming languages. There, all of that function composition happens at runtime, dynamically, rather than being one huge compositional thing we’ve pre-expanded and can evaluate in one go. The last term in the above code block represents such an explicit term after two manual composition steps, which is clearly only the beginning of the lengths we’d need to go to get the full recursive thing (spoiler: we need to do this arbitrarily many times).
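The two manual steps above can be replayed mechanically. This is a small sketch in Python, where `step` is a hypothetical name for the functional λf. λx. x*f(x-1), and `h_expanded` and `k_expanded` are the hand-expanded terms, written out only so the two can be compared.

```python
# The functional from the walkthrough: takes f, returns x -> x * f(x - 1).
step = lambda f: lambda x: x * f(x - 1)

g = lambda y: y * y   # the arbitrary starting function from the text
h = step(g)           # one application: behaves like λx. x*((x-1)*(x-1))
k = step(h)           # two applications: λx. x*((x-1)*((x-2)*(x-2)))

# The expanded terms written out by hand, for comparison.
h_expanded = lambda x: x * ((x - 1) * (x - 1))
k_expanded = lambda x: x * ((x - 1) * ((x - 2) * (x - 2)))

# Check that feeding each output back into the functional matches the expansion.
agree = all(
    h(x) == h_expanded(x) and k(x) == k_expanded(x)
    for x in range(-5, 6)
)
```

Each call to `step` wraps exactly one more layer around the previous function, which is the "pack a snapshot one step at a time" picture described above.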

Note that this isn’t even the factorial function (I chose g(x)=x*x), but if we map this onto that setting, the equivalent would be a final two-step function λx. x*(x-1)*(x-2). You can see how x is allowed to be any value (assuming x: Int) which doesn’t guarantee we actually “recurse to a base case.” If we plug in x=10, for instance, we just get 10*9*8, which is not 10! (actually we do not get a convergent output whatsoever if we don’t reach the base case…more on that below). Typically, full recursive behavior is allowed to recurse arbitrarily many times until a base case is reached. But to ensure we can even do that for arbitrary inputs x, we need to have packed in arbitrarily many recursive steps in our expanded term. How can we possibly do that, given x itself may be arbitrarily large?

Our full recursive function is therefore equivalent to the limit of this process, i.e., the thing we approach as we repeat composition an arbitrary number of times. With more and more nested composition, we get closer to the thing, and in the limit, we embody the notion of infinite nested composition. Any further composition therefore changes nothing: that thing is already a fixed point. More on this below.

Because I’ve lost myself several times as I work this out, I think it may be helpful to spell things out slowly while I’m here in the weeds. This is almost entirely to help me now (the concept really is that slippery), but will hopefully be a good starting point if I find myself back in this place in the future. Here’s how I’m seeing things now, step by step, rehashing many of the points above:

You’re trying to define the factorial function, say in a usual, “intuitive,” implicitly recursive way. You start with
```
λ(n: Int). if n=0 then 1 else n * f(n-1)
```
which establishes your base case and the recursive dependence on the subproblem for n-1, captured by calling yourself as f. But here we’ve merely used f to stand in for our function…how are we meant to actually get ourselves squeezed into f? This sort of already feels like the whole recursive function as we’d want to define it (in a common programming language). The problem is I can’t formally refer to this term in itself in the usual way. I explained this pretty straightforwardly above: there’s no way for f, the name we want to use to refer to the whole outer term, to be made available inside its own scope before that scope is even defined. It simply doesn’t make sense; there is no f at the time we want to use it.
So what do we do? Again, this signature already feels kind of right: we want our recursive function to take an integer as input. So we want to preserve that element of the structure without bulking up the term per se, but we need to add something in order to facilitate a meaningful self-reference as f inside the term. We can do this by parametrizing f like this:
```
λf. λ(n: Int). if n=0 then 1 else n * f(n-1)
```
This gives us a functional term where we now need to first supply a function f in order to “get back” to our expected signature. So this thing is not our desired function, but it’s a mechanism to build it. To be clear, the question we now are trying to answer is how we can even take a “snapshot” of our function and use it as f. This is where I really encountered that first feeling of limbo: I didn’t have a great sense for what we could even grab onto to put there.
Where to go from here? How to set f? We take the above form and some first, “lowest” function $\bot$ (a canonical undefined that effectively “nukes” the output if reached rather than the base case) and apply it, giving us
```
f0 = λ(n: Int). if n=0 then 1 else n * ⊥(n-1)
```
So f0 is back to the direct function signature we’re after, having chosen a particular f to “embed.” We can then repeat this:
```
f1 = λ(n: Int). if n=0 then 1 else n * f0(n-1)
```
and so on, where f<n> represents a partial factorial function built from n+1 applications of the functional to ⊥ (that is, n further compositions on top of f0). Note that f<n> is only well-defined for integers 0 to n given our function; for inputs greater than n, we never reach the base case and diverge (becoming undefined; we’ll basically end up making a call like ⊥(x) for some x ≥ 0, blowing everything up). Additionally note that the n in f<n> only controls the depth of composition rather than directly restricting the input to the underlying function (I’ve had the tendency to interpret it as improving the “coverage” of our input space, and while in this case it sort of does that indirectly, in general it does not determine the kind of inputs that produce concrete outputs).
With the above mechanism, we can let n tend toward infinity, which will step us ever closer to recovering an arbitrarily deep capacity for recursion. With some finite n, however, we can always supply inputs greater than n that cause our output to diverge in the function f<n> (i.e., never hit the base case), which differs from a true, fully dynamic recursive definition. We therefore look toward the limiting term, the term that can recurse dynamically. Such a term therefore cannot change under any further applications of composition (it already embodies a notion of “infinite composition;” to do it again would be like trying to take $\infty + 1$ . Put another way, the two mirrors are fully snapped parallel, and you have a term that simply never deals in finite levels of composition: you can’t “unravel” it with a finite number of applications), and is thus a fixed point.
- We explain a bit more explicitly below, but it’s as if we’ve produced f∞. The internal reference to itself can’t be something finite; if you tried, similar to our above point, you’d have something like f<∞-1>, which is just f∞ again. So it literally has its full self inside itself: the internal thing isn’t smaller, and the outer composition isn’t larger. As confusing as it is (certainly given our construction from increasingly large finite composition steps, where f<n> is indeed larger than its internal use of f<n-1>), they’re the same.
To be clear about what we mean by fixed point: we’re saying that our true, fully recursive factorial function is the fixed point of the functional term
```
G = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)
```
That is, this “builder term” G has our target recursive function as a fixed point. If our target function is called fact, we are therefore saying that G(fact) = fact: fact is the thing that already fully embeds/references itself. You cannot “squeeze” another level of composition into that f reference by calling G again (like we did above, in sequence, for finite levels).
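The chain of finite approximations f0, f1, ... can be sketched directly. In the Python below, `Bottom`, `bottom`, `approximant`, and `defined_at` are hypothetical helpers introduced for illustration. ⊥ is modeled, as an assumption, by a function that raises as soon as it is called.

```python
class Bottom(Exception):
    """Raised when evaluation reaches ⊥ instead of the base case."""


def bottom(n):
    # Stand-in for ⊥: any call means we ran out of unrolled steps.
    raise Bottom(f"reached bottom at input {n}")


# The builder term G from the text.
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)


def approximant(k):
    """Return f<k>, i.e. G applied k+1 times starting from bottom."""
    f = bottom
    for _ in range(k + 1):
        f = G(f)
    return f


def defined_at(f, n):
    """True if f(n) returns a value rather than hitting bottom."""
    try:
        f(n)
        return True
    except Bottom:
        return False


f3 = approximant(3)
values = [f3(n) for n in range(4)]               # inputs 0..3 reach the base case
covered = [defined_at(f3, n) for n in range(6)]  # inputs 4 and 5 hit bottom
```

Raising `k` widens the range of inputs that succeed, but no finite `k` covers every input, which is why the text goes looking for the limit.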

If this still feels slimy (and it certainly is for me; I literally find myself able to get it one second and lose it the next), go ahead and actually perform the application:
```
G(fact) = λ(n: Int). if n=0 then 1 else n * fact(n-1)
```
What we’re saying is: when we embed our fully recursive function fact into G, the function we get out is just fact again. You can’t “outsmart” it or “outwrap” it, as counterintuitive as it may be, since it feels you could always take another composition that the term you’re using can’t be “aware of,” that it can’t anticipate that you’re going to do it again. But no, it can be aware of further composition and it’ll have no effect: it literally includes its full self. That final point is really the best characterization in my opinion, and if you just can’t be satisfied with the formal argument, sit with that phrase until it sinks in.⁸
So how can we systematically get to this fixed point “right away,” as if we just defined things implicitly in the first place? The fixed-point, or Y, combinator. This is a higher-order functional, i.e., a function taking a functional, and it returns the fixed point function of that functional. We refer to this as $\text{fix}$ or $Y$ . In our above example, we’d say
```
fix G = fact

-- or equivalently
Y G = fact

-- and we have
fix G = G(fix G) = G(G(fix G)) = ... = fact
```
We actually define $Y$ as

$Y = \lambda f. (\lambda x. f(x\;x)) (\lambda x. f(x\;x))$

When we apply $Y$ to some functional $g$ , we can expand to find $Y g = g (Y g)$ . $Y$ is a construct that literally builds a fully recursive function out of the wrapper $g$ . Intuitively, it basically unpacks the internal logic from $g$ and puts it back into itself (although I don’t feel that confident about it becoming anything too concrete; reducing the thing still looks like $Y g$ ).
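Spelling out that expansion: $Y g$ reduces to $(\lambda x. g(x\;x))(\lambda x. g(x\;x))$ , and one more step gives $g((\lambda x. g(x\;x))(\lambda x. g(x\;x)))$ , whose argument is exactly what $Y g$ reduced to, hence $Y g = g (Y g)$ . As a hedged, runnable sketch: Python evaluates arguments eagerly, so $Y$ written literally would unfold forever. The version below wraps the self-application in an extra lambda, a strict-evaluation variant often called the Z combinator, which is a swapped-in technique rather than $Y$ itself. The name `fix` follows the text. `G` is the builder term from above.

```python
# Strict variant of Y: `lambda v: x(x)(v)` delays the self-application
# until the recursive call is actually made.
fix = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

# The builder term G from the text.
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# No name refers to itself anywhere in these definitions.
fact = fix(G)

# fact behaves like a fixed point of G: one more application of G
# changes nothing we can observe.
same = all(G(fact)(n) == fact(n) for n in range(10))
```

The check only compares outputs on a handful of inputs, so it illustrates G(fact) = fact rather than proving it.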

Combinators

Combinators are simply closed $\lambda$ -expressions. In combinatory logic, we capture the full power of lambda calculus, but without the notion of abstraction (or at least the ability to construct new abstractions). Instead, we can take a few closed, axiomatic abstractions (called combinators), and application can be used to construct certain kinds of new functions. Composition of these combinators can then replicate any function from lambda calculus. This simplification removes much of the complexity (although also the convenience) of abstraction, and was originally introduced to eliminate quantified variables from other logic systems, effectively presenting an alternative means of capturing the same functionality from even more primitive operations. This makes it more of an interesting demonstration rather than a practical choice for a language due to its verbosity; here the SKI system is a prime example.
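For a feel of how far application alone can go, here is a small sketch of the SKI combinators as curried Python functions. The names `I_from_SK` and `B`, and the test functions `inc` and `double`, are assumptions added for illustration. Only `S`, `K`, and `I` come from the SKI system itself.

```python
# The three SKI combinators.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

# I is redundant: S K K z = K z (K z) = z.
I_from_SK = S(K)(K)

# Composition from S and K only: S (K S) K f g x = f (g x).
B = S(K(S))(K)

# Ordinary functions, used only to observe behavior.
inc = lambda n: n + 1
double = lambda n: n * 2

composed = B(inc)(double)  # behaves like lambda x: inc(double(x))
```

Note that `I_from_SK` and `B` are built without writing any new lambda, only by applying existing combinators to one another. The verbosity mentioned above shows up quickly for anything larger.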

Semantics

Curry-Howard correspondence

The Curry-Howard correspondence relates computer programs to mathematical proofs. It is a formal link between typed lambda expressions and statements in mathematical logic, an isomorphism between the proof systems. The logic-to-type theory analogs are as follows:

Formula $\iff$ Type
Proof $\iff$ Term
Implication ( $\implies$ ) $\iff$ Function ( $\rightarrow$ )
Conjunction ( $\land$ ) $\iff$ Product type ( $\times$ )
Disjunction ( $\lor$ ) $\iff$ Sum type ( $+$ )
Universal quantification ( $\forall$ ) $\iff$ Dependent product type ( $\Pi$ )
Existential quantification ( $\exists$ ) $\iff$ Dependent sum type ( $\Sigma$ )

So we can craft types that are analogous to logical formulas/statements, and type inhabitation (i.e., constructing a term of the type) corresponds to a notion of proof for that logical statement. Expanding this for the specific analogs above:

$t:T$ ; the proposition/type $T$ holds due to the proof/term $t$
$f:S\rightarrow T$ ; the function $f$ inhabits the function type $S\rightarrow T$ , which can be thought of as a map from any proof of proposition $S$ to a proof of proposition $T$ . This embodies the notion that $T$ can be shown to hold if $S$ holds, precisely what we mean by implication.
$(a,b):\Sigma_{x:A}B(x)$ ; the pair term $(a,b)$ provides some “witness” $a:A$ such that $B(a)$ holds. To be clear, there are two elements here: a proof in the usual sense above, with $b:B(a)$ (where $B(a)$ is a concrete proposition for a fixed $a$ that is inhabited by a term $b$ ), as well as a provided $a:A$ such that we actually inhabit $B(a)$ . The pair $(a,b)$ is therefore evidence that there exists some $x$ such that $B(x)$ holds, precisely what we mean by $\exists a\in A, B(a)$ .
$f:\Pi_{x:A}B(x)$ ; the dependent function term $f$ provides, for every term $x:A$ , a proof that $B(x)$ holds. We naturally think of dependent functions as maps whose output terms have types that can depend on the input value. Therefore, the existence of some inhabiting function $f$ implies we’ve mapped from all input terms $x:A$ to output terms $y:B(x)$ . That is, we effectively have a collection of pairs, each of which is similar to what we saw with $\Sigma$ types, with witnesses and proofs for all parameterizations of $B(x)$ . This is precisely what we mean by $\forall a\in A, B(a)$ .
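The analogs above can be seen in a proof assistant, where writing a term of a type literally is proving the proposition. The following is a minimal sketch in Lean 4, using only core definitions. The propositions `S`, `T`, `A`, `B` and the arithmetic statements are arbitrary choices for illustration. One caveat: Lean's `∃` lives in `Prop` and is not identical to its general $\Sigma$ type, though a proof is still built as a witness-and-proof pair.

```lean
-- Implication as a function type: modus ponens is just application.
example (S T : Prop) (f : S → T) (s : S) : T := f s

-- Conjunction as a product: a proof is a pair, and swapping it proves B ∧ A.
example (A B : Prop) (h : A ∧ B) : B ∧ A := ⟨h.2, h.1⟩

-- Disjunction as a sum: a proof commits to one side.
example (A B : Prop) (a : A) : A ∨ B := Or.inl a

-- Existential as a dependent pair: the witness 3 together with a proof of B(3).
example : ∃ n : Nat, n + 2 = 5 := ⟨3, rfl⟩

-- Universal as a dependent function: a proof of B(n) for every n.
example : ∀ n : Nat, n + 0 = n := fun _ => rfl
```

In each case the term to the right of `:=` is the proof, and the type to the left is the proposition it proves.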

Order

Lattice of types: from Top to Bot
Order theory and hierarchies of types
Notion of least general type:
- Note how, for a given term, we can say things like c : a -> a and c : b. The former is more specific, characterizing it as a function from/to a particular type, while the latter…actually I don’t get this, not really. For starters, we’re letting a and b be free type variables here, so if I think about how I’d actually write them for any concrete types, including as a UQ type, I don’t know how these are both correct. With Hindley-Milner, we see some discussion around generics also being able to inhabit specific typed variants (look at the Wikipedia discussion under “Type order”). But I don’t get this, since the term is a generic, at least as I’m used to, which makes it a distinct thing even if it can appear to be a particular typed version.

This is a function from terms of type $A$ to “terms” of type $\mathcal{U}$ , where $\mathcal{U}$ is actually a type universe whose “terms” are themselves types. So $B$ maps from terms $a:A$ to types. $B$ characterizes a family of types if we think about it extensionally (not actually needed, but helps evoke how I’m thinking about it at the moment), i.e., the type space that is the function’s image.
↩︎
As far as I’ve seen, we just fundamentally take function types as given and never really need to throw some extra means of canonically specifying regular functions, which I guess is why I’ve not seen this prior to dependent type systems. Nevertheless, I like looking at this as an alternative way to color how I think about functions.
↩︎
Here I’m referring to the notation $B^A$ used to define function spaces, where $B$ is a codomain and $A$ is a domain. We can expand this to look just like our dependent product, even when $B$ does not depend on any terms $x:A$ . Intuitively, we’re just saying we have a “copy” of $B$ for every input $x:A$ , and a point in the resulting product space is a choice of output for every input (i.e., a function).
↩︎
For example, we could call $(1, 2, 3)$ a function in the context of an indexed family $f$ that is the identity function. $f$ maps a 0-indexed tuple position to the same value, and we associate with that input the output value in that tuple position. So the function is explicitly captured by
$\{(f(0), 1), (f(1), 2), (f(2), 3)\} = \{(0, 1), (1, 2), (2, 3)\}$
↩︎
This has been an annoying sticking point for me. I keep wanting to assign the whole space to each point $a\in A$ , i.e., producing a collection of pairs $(a, B(a))$ , since that feels like it might be just as effective. But that really is just another way to write the function $B$ itself (again, just its graph), and doesn’t “unpack” the items of each $B(a)$ into a new, “flattened” space.
↩︎
This is a particular topic I’ve struggled with, one that has left me without a clear foundation for some time. When we say something like $\lambda x.x : \alpha\rightarrow\alpha$ , $\alpha$ is serving as a “type variable,” allowing for any choice of $\alpha$ . But even though $\alpha$ isn’t anything particular in the definition, it’s a placeholder for something that will be. In the polymorphic case, $\alpha$ also serves as a placeholder in the same way, but we include a term that explicitly “loops” over every type. We bind that type variable with the universal quantifier $\forall\alpha$ and make sure the term “handles them all,” explicitly. That is to say, it’s tied to none of them, and $\alpha$ isn’t free the way it is in the monomorphic case. Bottom line: the monomorphic type is defined around another fixed but unspecified type $\alpha$ and we’ll end up with a term that only works as a map from/to one fixed type (we’ll have to supply a choice for $\alpha$ when we instantiate). In the polymorphic case, that function is defined to work with all types; there’s not even a choice to be made for $\alpha$ during instantiation, since the function should be able to operate under a term of any type that gets passed in.
↩︎
This got me (however counter-intuitively) questioning the notion of typing at all. What is the rule that is typing itself? We seem to take the notion of typing as a given, like an action that’s baked in. But why, and fundamentally what are we left with when it’s not present? After flailing for a minute, I realized we just get back to untyped lambda calculus. I’ve been so completely focused on types for so long that I kind of forgot what it’d even mean not to have them. But it turns out it’s not all that confusing, really, and untyped settings can generally just be seen as special cases of typed ones (where there’s implicitly just one type).
↩︎
Maybe worth the explicit reminder that fact must know about $G$ for composition to have no effect. $G$ injects a function $f$ into its template structure, and the fixed point fact is completely built around that scaffolding. It’s not something strictly “pure” or independent of $G$ , and $G$ is therefore powerless to change it: it’s the construct that embodies infinite application of $G$ , and that’s why it has no effect.

The $\infty+1$ analogy here is perfectly correct, but somehow even that often feels finite, like you’re still stacking an extra item onto what’s ultimately just a very tall pile. Therefore it doesn’t do a great job of offloading the burden of understanding infinite composition; you have to break that annoyingly sticky finite thinking if you want the fixed point to feel familiar. It’s the exact same analogy, but an infinitely extending spiral may feel a little nicer, or at least just a bit easier to visualize:

When we wind up an additional cycle (compose once more), we get the same structure.
↩︎

debug mode

key	value
id	39230
path	/home/smgr/Documents/notes/Typed_lambda_calculus.md
rpath	Typed_lambda_calculus.md
name	Typed_lambda_calculus.md
title	Typed lambda calculus
link	Typed_lambda_calculus
ftype	md
ctime	1757496869.29
mtime	1757496869.29
atime	1757496869.29
id_1	4651
name_1	Typed_lambda_calculus.md
type	wiki
yaml_text	title: Typed lambda calculus created: 2025-03-16 22:42 modified: 2025-09-10 02:34 datelink: [[2025-03-16]] type: wiki summary:
id_2	10799
name_fmt	Typed_lambda_calculus.md+html5
name_2	Typed_lambda_calculus.md
format	html5
content	TODO: Dependent sum types Typed lambda calculi are type systems with (anonymous) function abstraction from Lambda calculus. Perhaps worth noting that both (untyped) lambda calculi and type systems are fundamentally isolated formal systems. The latter generally wrap up the notion of typing, and the former is canonically defined without the inclusion of any such mechanism (lambda terms are abstractions that apply to anything). Typed lambda calculi make up a class of type systems that “bring in” lambda terms axiomatically, along with the rules of introduction/application/reduction that we formalize in the general, untyped lambda calculus setting. Lambda cube The $\lambda$ -cube is a framework that captures different binding behaviors in type systems as “movements” along dimensions over a 3D cube. Each vertex of the cube yields a particular type system with the respective binding behavior present as a typing rule, i.e., each system is associated with a point in $\{0,1\}^3$ , and a $1$ in a given dimension implies the presence of that typing rule. The dimensions correspond to different kinds of binding mechanisms between terms and types: $x$ -axis: introduces dependent types, types that can depend on terms $y$ -axis: introduces polymorphism, terms that can depend on types $z$ -axis: introduces type constructors, types that can depend on (other) types Starting from the simply typed lambda calculus $\lambda_\rightarrow$ at $(0,0,0)$ , We move along the $x$ -axis to $(1,0,0)$ to get $\lambda\Pi$ (also referred to as $\lambda P$ ), a first-order dependent type system. We move along the $y$ -axis to $(0,1,0)$ to get $\lambda 2$ (also referred to as System F), a polymorphic type system. We move along the $z$ -axis to $(0,0,1)$ to get $\lambda\underline{\omega}$ , a type system including type constructors. These represent type systems that introduce each of the typing mechanisms in isolation. Systems at other vertices are those with combinations of these mechanisms. 
At $(1,1,1)$ , we reach the calculus of constructions $\lambda C$ , where types and terms can depend on types and terms. We dive into each of these systems and their respective typing mechanisms below. Typing mechanisms The lambda cube captures a few common typing mechanisms. Below we discuss these in greater detail. Simply put, Dependent typing: types that depend on terms Polymorphic typing: terms that depend on types Type constructors: types that depend on types Dependent typing ( $\lambda\Pi$ ) $\Pi$ -type The type system $\lambda\Pi$ introduces dependent typing, facilitated primarily by the $\Pi$ type (the dependent product type). This type captures the notion of a function whose return value’s type (as well as the value itself, of course) can vary with the argument value. That is to say, there’s no fixed codomain; both the output term and its type are effectively dynamically selected by the value of the input. In particular, if $B: A\rightarrow \mathcal{U}$ ¹, the type of a dependent function can be written $\prod_{x:A}B(x)$ , capturing the fact such a term will map to a type determined by $B(x)$ . For example, if we write $\text{Vec}(\mathbb{R}, n)$ to represent a real-valued $n$ -tuple, then $\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n)$ is how we “parameterize” it as a dependent type. It’s worth noting that when $B:A\rightarrow\mathcal{U}$ is a constant function (all $a:A$ map to the same type), the dependent product type acts exactly the same as the usual function type. While we don’t fundamentally use this notation when defining function types², it clarifies (for me) the use of notation seen elsewhere in mathematics³. We’re saying here that the type $\prod_{x:A}B$ (no dependence on $x$ in $B$ ) is equivalent to $A\rightarrow B$ ; they’re two different ways to write the same thing. 
Further, this product type can be re-written explicitly in the usual sense as $B\times B\times\cdots\times B,$ but how is this the “type” of a function (since we’re claiming $\prod_{x:A}B$ and $A\rightarrow B$ are the same)? We can think of a concrete function as a long tuple where each tuple position corresponds to a particular input value in the domain (captured as an indexed family, say) and “chooses” an output value corresponding to that input⁴. That is, we’re thinking of functions in the more explicit, extensional, binary relation sense. In any case, such a tuple can be seen as an element of the product space written above; the concrete tuple represents a concrete function term, and its type can be seen as a product of the types associated with each output. All this to say: this isn’t a product space serving as a domain or codomain for some class of functions (which is how I originally got confused here), but is itself the space/type of all functions, when we think of particular functions as tuples of output values. This lends itself back to the original context within which the notation was introduced: to express dependent types. When $B$ does vary with $x:A$ , we can explicitly write our dependent function type as $B_{a_1}\times B_{a_2}\times B_{a_3}\times\cdots,$ where $a_i : A$ ranges over all terms of type $A$ . This is a dependent product, and captures the space of possible dependent functions: terms that themselves map from all inputs $a_i:A$ to terms $b_i:B(a_i)$ . Here we simply get the extra flexibility to specify (loosely) a particular domain (type) for each input value (abstraction-bound variable). We formalize the construction of such a term with the introduction rule $\frac{\Gamma,x:A\vdash B:}{\Gamma\vdash\left(\prod_{x:A}B\right):}$ which is really a simple statement that formally recognizes $\prod_{x:A}B$ as a type (with “ $:$ ” read as “is of type type”). 
To use the example from before, $\prod_{n:\mathbb{N}}\text{Vec}(\mathbb{R}, n)$ could be the type of a function that maps from an integer $n$ to the unit vector of length $n$ . Note how $\text{Vec}(\mathbb{R}, m)$ is the type of a vector for a specific value of $m$ , but not any* $m$ . So creating a function that produces a vector with a variable length (dependent on the function’s input) can be formed via the notion of a dependent type. Note how the introduction of this typing mechanism doesn’t formally increase the computational power of the type system (untyped lambda calculus is Turing complete already), but instead facilitates more logical expressiveness, e.g., how precisely we can encode logical constraints over terms. Without dependent types, for instance, we could still have a function that produces variable-length vectors, but from the perspective of statically verifying the logical correctness of further statements involving such functions, it becomes much more difficult (if not impossible) to make certain guarantees about the value or type of involved terms. The type of dependent functions like this, in the general sense, is referred to as a $\Pi$ type, often called the dependent function type (with expansion below on the use of the term “dependent product type” from before). But System $\lambda\Pi$ also includes a type that’s dual to $\Pi$ types: the $\Sigma$ type, also called the dependent pair type. $\Sigma$ -type Using similar notation from before, we write $\Sigma$ types as $\Sigma_{x:A}B(x)$ . This captures the notion of ordered pairs where the second term’s type can depend on the value of the first. That is, if $(a, b):\sum_{x:A}B(x)$ , then $a:A$ and $b:B(a)$ : the type of $b$ gets to depend on $a$ ’s value. As before, we can expand this and try to make sense of its full syntactical implications: $\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,$ where $a_i : A$ ranges over all terms of type $A$ . 
Right away I find this a little confusing: I see arbitrarily many types being summed over (or for at least as many terms $x:A$ ), and yet this is the type of a pair of terms. With our dependent function type, our terms themselves were functions that have to account for all possible inputs $x:A$ , so it seemed justified that we’d need to have a representation for each $x:A$ in the associated type definition. But here, it’s a little surprising to see we compose more than just two types in order to capture the pair type. Intuitively, all that’s happening here is the $+$ is behaving like an “or” operator. Since $b$ ’s type can depend on any term $a:A$ , it does make sense that we’d need every possible choice in our type definition; it’s just that any term must “commit” to only one of them. So we need them all like before, but while $\times$ indicates joint involvement (our term will use them all), $+$ captures the notion of a choice among them (our term will only pick one, but it could be any of them). In some sense (and I do mean that vaguely), this is actually more general than the $\Pi$ type. In that case, our type captures the idea of assigning an output type to every possible input term, giving us terms (functions) that must actually make such a decision for all inputs, i.e., assigning a $B(a)$ for every $x:A$ . But here, we’re doing that exact same thing…just without the requirement to do it for all inputs at once. In a sense we “free up” the input as well, and our terms get to pick whichever input (along with the associated, dependent output type) they want, rather than needing to do it for all of them (doing it for all inputs leaves the only freedom in the output side of things; the function doesn’t get to pick specific inputs it should be defined over). So our pair is like any individual input-output slice of a full dependent function. 
Interpreting $\Sigma$ types as spaces We can sometimes think of types extensionally, representing spaces of all their possible constituent terms. Given the extra “degree of freedom” we described above, it actually feels harder to shape that corresponding space for $\Sigma$ types, or at least to capture how we should think about it. In the $\Pi$ type setting, I know my function terms pick a type $B(a)$ for every input $a$ (for some choice of $B$ ), and all my concrete function terms under that $B$ will map any given input $a$ to a term $b : B(a)$ . I basically “stretch out” all those $B(a)$ and I’ve got my space’s structure: all terms in that space are maps with the same output types when queried at the same inputs. The presence of the “term conditioning” from $a$ sort of disappears in the implicit order of my $B(a)$ ; the type $A$ is expressed implicitly through the family of $B(a)$ ’s. We actually have to deal with terms of the type $A$ in order to “see” what types of $B$ will be, so it makes sense that we can’t exactly have a term $a : A$ show up in the type’s definition. For $\Sigma$ types, we do a similar thing by packing in $B$ types in such a way that the dependence on terms $a : A$ is made implicit. But our space is effectively more locally variable, and all of our terms don’t share the same array of output types like we saw with $\Pi$ types (which acts as kind of a common thread, like a “stitch” in the space that captures the common structure of terms). Instead, our terms are even more “tightly dependent” and sort of stubbornly dissimilar, making it hard to capture the structure of the common space all terms share. So I resign myself to taking just that very structure: the (implicit) coupling of a term $a:A$ with its paired type $B(a)$ , and we bag all the possible concrete couplings up into a set like $\{(a, b)\|a:A,b:B(a)\}$ . 
In general, I think I let this be more a sticking point than it needed to be, but my inability to let go of little confusions or frustrations sometimes leads to greater insight. I may end up reusing this outside this scope, but the point is that writing the “sum” $\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \cdots,$ is not really all that accurate. It should be thought of as a disjoint union, which explicitly lets us tag those types while bagging them all up like we want. That is, it just preserves the identity of the types $B(a)$ using the $a$ that produced them, and those are the pairs that can make up our type space. So more like $\sum_{x:A}B(x) = (\{a_1\}\times B(a_1)) + (\{a_2\}\times B(a_2))+ \cdots,$ This still leaves $B(a_i)$ as a type, but we can interpret that as a set in the generic way, i.e., $\{b \| b:B(a_i)\}$ . That gives you a formal set product $\{(a_i, b) \| b:B(a_i)\}$ for each $a_i:A$ , and when you let $+$ mean $\cup$ , you get your big, fully expanded disjoint union. That is your formal set-based interpretation; it’s clear how we can dynamically expand only the properly dependent pairs of values. So when I said this space feels like has less structure, I actually still feel that’s an accurate depiction. Even if the little set-based jumps we make to expand the space are easy to follow, it’s harder to hook into common structure between these objects. You have the paired part of them, yes, but beyond that we literally just tag dependent types explicitly. It’s not some tightly wound, reduced structure, or even as like rigid as the functions with $\Pi$ types where we at least have all objects stretching across the input space. Nope: we just slap a name tag on the types and put them in the bag, sort of the bare minimum to ensure they can be distinguished. 
And that very process of tagging is really the only structure we can even grab onto afterward, to group up “stubbornly dissimilar” terms (to reuse the phrase from earlier, which I again feel is accurate). That, I suppose, is really just a consequence of great flexibility: we’re literally defining a compositional type that is more or less an unimposing bag of other types. That’s as unstructured as you get when it comes to wrangling collections of objects: you’re not adding new structure or transforming things into a common shape. Perhaps that’s why it’s so slimy to me, so hard to just let be. It’s like I’m expecting more to be there, more structure to understand, when in actual fact there is none, on purpose.

Note the syntactic analogies to addition/multiplication/exponentiation, when the dependent function $B$ is constant:

The dependent pair $\Sigma_{x:A}B$ can also be written $A \times B$, which can be interpreted in the usual product space sense, where to each point in $A$ we attach an instance of $B$. To be precise:

When $B(x)$ varies with $x:A$, we can treat it as an indexed family of types/sets, indexed by $A$. The dependent pair is then analogous to a disjoint union:

$\Sigma_{a:A}B(a) = \bigsqcup_{a:A}B(a) = \bigcup_{a:A}\{(a,b) \mid b:B(a)\}$

When $B(x)=B$ (i.e., is constant, no dependence on $A$), the disjoint union is simply the usual Cartesian product:

$\Sigma_{a:A}B = \bigsqcup_{a:A}B = A\times B$

Note how $\Sigma$ or $\bigsqcup$ actually do some work in “breaking up” and “rearranging” $B$; a set like $\{(a, B(a)) \mid a:A\}$ that might attempt to attach to each $a\in A$ the full type/space $B(a)$ is only halfway to something useful (and is just the graph of $B$).⁵

The dependent product $\Pi_{x:A}B$ can also be written $B^A$, which can be interpreted as “multiplying” instances of $B$ for each element/term of $A$.
As mentioned above, when $B$ is static, this simply recovers the notion of an infinite product space housing functions $f: A \rightarrow B$. Otherwise, the product is composed of conditioned spaces produced by invoking $B(x)$ for each $x: A$, and we usually don’t employ the compact representation $B^A$ (it doesn’t really capture the dependence of $B$ on $A$, but could be implicitly understood). When we write out the product explicitly, we have a sequence of concrete products like we saw with the dependent pair, where each $B(a)$ is a fixed set making up one part of a nested product:

$\begin{aligned} \Pi_{x:A}B(x) &= B(a_1) \times B(a_2) \times \cdots \\ &= \Sigma_{b_1:B(a_1)}\left[\Sigma_{b_2:B(a_2)}\cdots\left[\Sigma_{b_{n-1}:B(a_{n-1})}B(a_{n})\right]\right] \end{aligned}$

Here we see increasing levels of abstraction taking place with repeated application, again analogous to multiplication representing repeated addition, exponentiation representing repeated multiplication, etc.

(See also: Building blocks of dep. type theory)

Polymorphic typing ($\lambda 2$)

Polymorphic types allow term definitions to depend on a type. For example, instead of saying something like

$\lambda x.x:\alpha\rightarrow\alpha,$

which can be interpreted as an identity function mapping from/to terms of type $\alpha$ (for a particular choice of $\alpha$), we instead have

$\lambda x.x:\forall\alpha.\alpha\rightarrow\alpha,$

which defines an identity function polymorphic in $\alpha$, meaning any (every) choice of $\alpha$ is allowed.

For the former, the type $\alpha\rightarrow\alpha$ describes a monomorphic “family” of types since $\alpha$ can be anything. But the term will always involve a particular choice, e.g., be the identity function from/to terms of `int` or `bool` types. In the polymorphic case, we never require such specificity: the identity function must work for all types (as in, we could pick any one of them).
This is kind of like saying we get one term that is “type aware,” whereas in the monomorphic case we’d actually have to define multiple (separate) terms if we wanted to work with different choices of $\alpha$.⁶

System F is a second-order typed lambda calculus, formalizing parametric polymorphism. In simply typed lambda calculus, this isn’t exactly a native element of the type theory, so universally quantified type variables are formalized. For example, in the case of our identity function, we have

$\vdash \Lambda\alpha. \lambda x^\alpha. x : \forall\alpha.\alpha\rightarrow\alpha,$

where $\alpha$ is a type variable, and $\Lambda$ refers to a type-level function. This is a judgment that says, as a function of type $\alpha$, the identity function of a bound variable $x$ of type $\alpha$ has type $\alpha\rightarrow\alpha$.

The entire left-hand expression, including the outer $\Lambda$ term, is of the single “most general type,” involving a universally quantified type variable, $\forall\alpha.\alpha\rightarrow\alpha$. It is the most general in that every other type we might feasibly assign to the term can be arrived at via substitution into this one, i.e., is less general.

In System F, we actually formalize the universally quantified type variable as a single type by a typing rule:

$\frac{\Gamma,\alpha\;\text{type}\vdash M:\sigma}{\Gamma\vdash\Lambda\alpha.M:\forall\alpha.\;\sigma}$

Here we’re saying, when $M$ is of type $\sigma$ and we take $\alpha$ to be a type variable, then $\Lambda\alpha.M$ is of type $\forall\alpha.\;\sigma$. Notice how it’s not fundamentally a function type or anything; this is us actually defining a new type and its syntax, axiomatically.
The quantifier $\forall$ is not purely symbolic, however; it is actually signifying quantification over all types.

We also have the rule describing the application of $\Lambda$ terms:

$\frac{\Gamma\vdash M:\forall\alpha.\;\sigma}{\Gamma\vdash M\tau : \sigma[\alpha:=\tau]}$

which just says that when a type $\tau$ is applied to a polymorphic term $M$, it’s as if we’re instantiating the whole template under that type, replacing any dependence on $\alpha$ in $\sigma$ with $\tau$.

Bringing back the identity example from above, we have

$\vdash (\Lambda\alpha. \lambda x^\alpha. x)(\text{Bool}) : \text{Bool}\rightarrow\text{Bool}$

$\text{Bool}$ is applied to the $\Lambda$ term, effectively:

Stripping off the outer $\Lambda\alpha$
Binding $x$ in $\lambda$ to type $\text{Bool}$ (although the RHS type signature really does this as well)
By the application rule, for the type side we strip off $\forall\alpha$ and replace the references in the type (which was $\alpha\rightarrow\alpha$) with $\text{Bool}$

So we effectively end up with the term $\lambda x.x:\text{Bool}\rightarrow\text{Bool}$. We can think of the polymorphic term as a producer of a family of such concrete terms; the polymorphic term itself is second-order, producing a first-order, monomorphic term after application.

(See also: Type variable)

Type constructors ($\lambda\underline{\omega}$)

In System $F\underline{\omega}$, we introduce type construction. This is a mechanism that effectively brings types into the term space, loosely speaking: types are allowed to be part of terms, e.g., taken as inputs to a function. Such terms themselves have types, which we call kinds.

Type constructors facilitate the creation of new types from existing ones. They can be formally seen as $n$-ary type operators, returning a type from $n$ types passed as arguments. Note that we typically write this in a curried manner as needed, like other functions of many variables.
As such, type operators can be seen as functions in a higher-order type theory (namely a simply typed lambda calculus) with one basic type $*$, representing the type of all types in the underlying language. For instance, if $s: \sigma$, then $\sigma: *$. Both $\sigma$ and $*$ are types, albeit at different “levels;” we generally distinguish them as “proper types” and “kinds,” respectively.

Kinds are effectively declarations of type constructor arity (rather than the “type of a type”). Given that just a single base type is allowed in the “kind system,” first-order type operators are simply curried functions of proper types and look like $* \rightarrow \cdots \rightarrow *$:

$*$ is the kind of all proper types (also called “nullary types,” i.e., the constant result of a type constructor with no inputs). This is pronounced “type,” e.g., “$\sigma:*$ indicates $\sigma$ is of type type.” To be clear, even though “$*$” means “type,” it is part of a kind system. We say “$\text{nat}$ is a type” (judgment “$\vdash \text{nat Type}$” in the base-level type theory) and “$*$ is a kind” (judgment “$\vdash *\;\text{Type}$” in the higher-level type/kind theory).

$* \rightarrow *$ is the kind of a unary type constructor, such as that of a list type, e.g., $\text{list}: * \rightarrow *$, so that $\text{list}\;a : *$ where $a$ is the type of the list elements.

$* \rightarrow * \rightarrow *$ is the kind of a binary type constructor, e.g., the function type constructor. That is, something like $(\lambda\sigma\tau.\sigma\rightarrow\tau) : * \rightarrow * \rightarrow *$ as a judgment in the kind system (although I don’t know if we really ever embrace such an explicit formation given functions are usually primitives, but I think it demonstrates the point).

A higher-order type constructor is one that itself maps from a type constructor to a proper type. This might have the kind $(* \rightarrow *)\rightarrow *$, for instance, and helps describe the kind of more complex constructs like monads.
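The “kinds as arity declarations” reading can be sketched as a tiny kind checker. This is only an illustration under stated assumptions: the classes `Star` and `Arrow`, the environment `KINDS`, and the function `kind_of` are hypothetical names invented here, and type application is encoded as a plain `(operator, argument)` tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Star:
    """The kind of proper types, written *."""

    def __str__(self):
        return "*"


@dataclass(frozen=True)
class Arrow:
    """The kind dom -> cod of a type operator."""

    dom: object
    cod: object

    def __str__(self):
        # Parenthesize an arrow on the left so (* -> *) -> * prints correctly.
        left = f"({self.dom})" if isinstance(self.dom, Arrow) else str(self.dom)
        return f"{left} -> {self.cod}"


STAR = Star()

# Hypothetical kind environment for a few familiar constructors.
KINDS = {
    "nat": STAR,
    "bool": STAR,
    "list": Arrow(STAR, STAR),
    "fn": Arrow(STAR, Arrow(STAR, STAR)),
}


def kind_of(ty):
    """Return the kind of a type expression.

    A type expression is either a constructor name (str) or a tuple
    (operator, argument) standing for type application.
    """
    if isinstance(ty, str):
        return KINDS[ty]
    op, arg = ty
    op_kind = kind_of(op)
    if not isinstance(op_kind, Arrow):
        raise TypeError(f"{op} has kind {op_kind} and cannot be applied")
    if kind_of(arg) != op_kind.dom:
        raise TypeError(f"{arg} does not have the expected kind {op_kind.dom}")
    return op_kind.cod


print(kind_of("list"))                    # kind of the bare constructor
print(kind_of(("list", "nat")))           # list nat is a proper type
print(kind_of((("fn", "nat"), "bool")))   # nat -> bool is a proper type
```

Partially applying the binary constructor, as in `kind_of(("fn", "nat"))`, leaves a unary operator, which mirrors the curried presentation above.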
Note that we used $\lambda\omega$ to denote the type system including both type constructors and polymorphism.

Pure type systems

Pure type systems are typed lambda calculi that include an arbitrary number of sorts and dependencies between them. This generalizes the lambda cube, from which perspective we can view its “corners” as instances of pure type systems with just two sorts.

In particular, a pure type system is a triple $(\mathcal{S}, \mathcal{A}, \mathcal{R})$:

$\mathcal{S}$ is the set of sorts
$\mathcal{A}\subseteq\mathcal{S}^2$ is the set of axioms; an axiom $(s_1, s_2)$ is a foundational statement that $s_1:s_2$
$\mathcal{R}\subseteq\mathcal{S}^3$ is the set of rules

Pure type systems “break down” a wall that previously wasn’t really visible; simply typed lambda calculus, for instance, is fundamentally constructed around terms and types, and we don’t really question those two “classes” of items. Pure type systems open that up, extending the notion of typing (allowing lower-order terms to be classified by higher-order terms) to more expansive hierarchies⁷.

(See also: Pure type system (nLab))

System U

Subtyping

Subtyping captures a notion of substitutability between types. If $S$ is a subtype of $T$, we typically write this $S <: T$, implying that $S$ can be used in place of $T$ in any context where $T$ is expected.

Type systems like System $F_{<:}$ formalize this properly (e.g., with subtype judgments and associated rules), but from what I can tell, most resources stay away from the weeds and discuss record types and LSP.

More formally, subtype applicability is determined by subsumption. For a type $\mathrm{T}$, we generally adopt an extensional view $\mathbf{T}$ (the set of all and an intensional view $\mathit{T}$ …

…leaving off here. Taking some issue with the int/ext views of types, since it’s more set-theoretic. Want to get this right, mix in LSP, contrast with inheritance, and maybe get a clear link out to a relevant page on func prog.
-> Then do universally/exist. quant types (below); I feel like I still have yet to nail this down. Then do the semantics bit, equational theory. This last piece should wrap us around to algebraic DTs. Then maybe hit combinators, finally monoids and some Haskell examples.

-> Now back to types as sets with the paper

Quantification

Quantified types facilitate greater expressive power, appealing to higher level typing mechanisms. Universal quantification enables polymorphism through generic types, and existential quantification yields abstract data types (through information hiding). Together, they facilitate parametric data abstraction.

Generally speaking, quantification is a means of abstraction, and is facilitated by binders. Binding operators act on variables of a certain kind, and place (bind) them inside a particular context. That context is made abstract as a result: an expression is produced that isn’t tied to any particular concrete value. The resulting abstraction can therefore be seen as a generic template, and is accompanied by a notion of application, wherein concrete values can be supplied to “fill in” the template and make the entire expression concrete.

Fundamentally we’re introducing a higher level means of operation, a construction above terms, a producer of terms. Lambda calculus is so powerful because we let this higher level construct be a term itself; it doesn’t exist strictly in a higher plane but is “flattened back out” to the same level as concrete terms. This enables abstractions to be arbitrarily nested (terms can be abstractions, and abstractions are defined over terms), and ultimately represent all computable functions (lambda calculus is Turing complete).

In the case of function abstraction, $\lambda$ binds term variables: $\lambda x. c(x) : A\rightarrow B$ binds a term $x:A$ in a context $c(x):B$.
As we’ve already seen with polymorphic typing in $\lambda 2$, type-level functions are introduced with the $\Lambda$ operator, which binds type variables in term space. Resulting terms are generic functions $\Lambda\alpha. \lambda x^\alpha. c(x)$, and have a universally quantified type such as $\forall\alpha.\alpha\rightarrow\alpha$. This is fundamentally a new kind of term, i.e., functions that can effectively be parameterized by a type, and we’ve got a new type definition to go along with it.

Existentially quantified types are a bit more nuanced. For starters, existential quantification isn’t accompanied by a fundamentally new kind of binding operator in term space. Instead, it’s purely a mechanism for abstraction in type space, typically used to hide certain type-level specifics.

The statement $x : \exists\alpha. t(\alpha)$ allows us to specify $x$ only in terms of an exposed interface $t$, i.e., some higher level type structure that, when parameterized by some type $\alpha$, will represent the real type of $x$. The whole point is that we can get away with not knowing or specifying $\alpha$, and the binding operator $\exists$ encapsulates that detail behind the visible structure seen in $t$. Such a type is clearly amenable to high-level specifications of abstract data types: we can declare terms that verifiably observe certain type signatures and are invariant to any involved concrete types.

Note that terms with existentially quantified types are introduced and eliminated with `pack` and `unpack` operators, respectively. These are fundamentally different from an operator like $\Lambda$; the latter introduces an irreducible new primitive (type abstraction), whereas `pack`/`unpack` are more like rules for working with existentially quantified terms (and can be fundamentally represented via universal quantification).

While the notions of universal and existential quantification are similar in that types are being abstracted over, they say very different things.
Informally, $x : \exists\alpha. t(\alpha)$ lets us say that $x$ has type $t(\alpha)$ for some particular $\alpha$. This is quite different from $x : \forall\alpha. t(\alpha)$, which suggests that $x$ is an abstraction that can handle all types $\alpha$. In the former, $x$ has no such “awareness” of choosing a type for $\alpha$; a specific choice of $\alpha$ is involved but we’re blind to its actual value, suggesting it could be any one choice. In the latter, $x$ is explicitly a generic construct that can handle all choices of $\alpha$. There’s no notion of “plugging in” some particular choice of $\alpha$ to get the “true” type of $x$; instead, $x$ is explicitly the thing (a generic function) that can handle any choice for $\alpha$.

More formally, it’s important to recognize that both universal and existential quantification do ultimately represent new, singular types. They’re not just loose characterizations of families of types or possible type choices (as we might be inclined to interpret them), but formal types with typing rules, just like everything else. I often find myself forgetting this since $\exists$ and $\forall$ look to be loose wrappers on other types; it’s tempting to just think of them extensionally, as possible values for the types they “wrap.” This is perhaps a perfectly reasonable and intuitive thing to do (encouraged, even), but they are nevertheless still themselves considered formal types.

Bounded quantification

Bounded quantification is quantification (universal or existential) with subtyping.

Encoding datatypes

The power of lambda calculus becomes quickly apparent when we discuss repeated application of higher-order abstractions. One way to begin formalizing this is via Church encoding, and, per the Church-Turing thesis, any computable function can be represented under this scheme.

Recursion

When defining recursive functions, we might reach for a familiar self-referential form.
For example, with the factorial function:

`fact = fun (n: Int) if n=0 then 1 else n * fact(n-1)`

As a term, this is improperly defined: `fact` refers to itself before it is fully defined. We can imagine needing to parse a definition left to right before the LHS identifier is considered valid, but we encounter `fact` before we get all the way through, at which point `fact` is not a valid alias.

In any case, we do this all the time in most programming languages; it’s just a convenient expression for something more formal. That something “lifts” the function into a functional:

`fact = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)`

This is not intrinsically recursive: it is a function that takes another function `f` as input, embeds it in another function, and returns that. That is, something like the general form `λf. λx. t(f(s(x)), x)`, which says `f` is a function that operates on a term `s(x)`, and `t` is some transformation of the result of applying `f` to `s(x)` (which may also use `x`). For a quick example,

    -- let s step down to the subproblem
    s = λx. x-1
    -- let t(v, x) mirror the factorial setup
    t = λv. λx. x*v
    -- so our general term:
    λf. λx. t(f(s(x)), x)
    -- turns into
    λf. λx. x*f(x-1)
    -- and then if we define some function g
    g = λy. y*y
    -- we can apply our form to it like
    ( λf. λx. x*f(x-1) )( g )
    -> λx. x*g(x-1)
    -> λx. x*((x-1)*(x-1))
    -- this final term is a function which we can reapply as a "new g"
    h = λx. x*((x-1)*(x-1))
    -- on the left is our general term, apply to h
    ( λf. λx. x*f(x-1) )( h )
    -> λx. x*( (x-1)*( (x-2)*(x-2) ) )

What we’re seeing here is that each repeated application of the functional (using the last output function as input to the functional, producing the next function) approximates part of the full recursive behavior. The last term above just accounts for two recursive steps.
That is, it’s like we just iteratively pack a snapshot of the function into itself one step at a time, and once we “call” the outer thing for some input `x`, we recursively apply the function as usual (by virtue of the fact we’ve already packed in all the nested calls).

The point is that such a term actually explicitly packs the function unwrapping into a term so we don’t run into the definition issues, i.e., the usual implicit syntactic sugar we get away with in most programming languages. All of that function composition happens at runtime, dynamically, rather than being one huge compositional thing we’ve pre-expanded and can evaluate in one go. The last term in the above code block represents such an explicit term after two manual composition steps, which is clearly only the beginning of the lengths we’d need to go to get the full recursive thing (spoiler: we need to do this arbitrarily many times).

Note that this isn’t even the factorial function (I chose `g(x)=x*x`), but if we map this onto that setting, the equivalent would be a final two-step function `λx. x*(x-1)*(x-2)`. You can see how `x` is allowed to be any value (assuming `x: Int`), which doesn’t guarantee we actually “recurse to a base case.” If we plug in `x=10`, for instance, we just get `10*9*8`, which is not `10!` (actually we do not get a convergent output whatsoever if we don’t reach the base case…more on that below).

Typically, full recursive behavior is allowed to recurse arbitrarily many times until a base case is reached. But to ensure we can even do that for arbitrary inputs `x`, we need to have packed in arbitrarily many recursive steps in our expanded term. How can we possibly do that, given `x` itself may be arbitrarily large? Our full recursive function is therefore equivalent to the limit of this process, i.e., the thing we approach as we repeat composition an arbitrary number of times.
With more and more nested composition, we get closer to the thing, and in the limit, we embody the notion of infinite nested composition. Any further composition therefore changes nothing: that thing is already a fixed point. More on this below.

Because I’ve lost myself several times as I work this out, I think it may be helpful to spell things out slowly while I’m here in the weeds. This is almost entirely to help me now (the concept really is that slippery), but will hopefully be a good starting point if I find myself back in this place in the future.

Here’s how I’m seeing things now, step by step, rehashing many of the points above:

You’re trying to define the factorial function, say in a usual, “intuitive,” implicitly recursive way. You start with

`λ(n: Int). if n=0 then 1 else n * f(n-1)`

which establishes your base case and the recursive dependence on the subproblem for `n-1`, captured by calling yourself as `f`. But here we’ve merely used `f` to stand in for our function…how are we meant to actually get ourselves squeezed into `f`?

This sort of already feels like the whole recursive function as we’d want to define it (in a common programming language). The problem is I can’t formally refer to this term in itself in the usual way. I explained this pretty straightforwardly above: there’s no way for `f`, the name we want to use to refer to the whole outer term, to be made available inside its own scope before that scope is even defined. It simply doesn’t make sense; there is no `f` at the time we want to use it. So what do we do?

Again, this signature already feels kind of right: we want our recursive function to take an integer as input. So we want to preserve that element of the structure without bulking up the term per se, but we need to add something in order to facilitate a meaningful self-reference as `f` inside the term. We can do this by parametrizing `f` like this:

`λf. λ(n: Int). if n=0 then 1 else n * f(n-1)`

This gives us a functional term where we now need to first supply a function `f` in order to “get back” to our expected signature. So this thing is not our desired function, but it’s a mechanism to build it.

To be clear, the question we now are trying to answer is how we can even take a “snapshot” of our function and use it as `f`. This is where I really encountered that first feeling of limbo: I didn’t have a great sense for what we could even grab onto to put there.

Where to go from here? How to set `f`? We take the above form and some first, “lowest” function $\bot$ (a canonical `undefined` that effectively “nukes” the output if reached rather than the base case) and apply it, giving us

`f0 = λ(n: Int). if n=0 then 1 else n * ⊥(n-1)`

So `f0` is back to the direct function signature we’re after, having chosen a particular `f` to “embed.” We can then repeat this:

`f1 = λ(n: Int). if n=0 then 1 else n * f0(n-1)`

and so on, where `f<n>` represents a partial factorial function that includes `n` nested compositions.

Note that `f<n>` is only well-defined for integers `0` to `n` given our function; for inputs greater than `n`, we never reach the base case and diverge (becoming `undefined`; we’ll basically end up making a call like `⊥(x)` for some `x >= 0`, blowing everything up). Additionally note that the `n` in `f<n>` only controls the depth of composition rather than directly restricting the input to the underlying function (I’ve had the tendency to interpret it as improving the “coverage” of our input space, and while in this case it sort of does that indirectly, in general it does not determine the kind of inputs that produce concrete outputs).

With the above mechanism, we can let `n` tend toward infinity, which will step us ever closer to recovering an arbitrarily deep capacity for recursion.
With some finite `n`, however, we can always supply inputs `>n` that cause our output to diverge in the function `f<n>` (i.e., never hit the base case), which differs from a true, fully dynamic recursive definition.

We therefore look toward the limiting term, the term that can recurse dynamically. Such a term cannot change under any further applications of composition, and is thus a fixed point. It already embodies a notion of “infinite composition;” to do it again would be like trying to take $\infty+1$. Put another way, the two mirrors are fully snapped parallel, and you have a term that simply never deals in finite levels of composition: you can’t “unravel” it with a finite number of applications.

We explain a bit more explicitly below, but it’s as if we’ve produced `f∞`. The internal reference to itself can’t be something finite; if you tried, similar to our above point, you’d have something like `f<∞-1>`, which is just `f∞` again. So it literally has its full self inside itself: the internal thing isn’t smaller, and the outer composition isn’t larger. As confusing as it is (certainly given our construction from increasingly large finite composition steps, where `f<n>` is indeed larger than its internal use of `f<n-1>`), they’re the same.

To be clear about what we mean by fixed point: we’re saying that our true, fully recursive factorial function is the fixed point of the functional term

`G = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)`

That is, this “builder term” `G` has our target recursive function as a fixed point. If our target function is called `fact`, we are therefore saying that `G(fact) = fact`: `fact` is the thing that already fully embeds/references itself. You cannot “squeeze” another level of composition into that `f` reference by calling `G` again (like we did above, in sequence, for finite levels).
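The finite approximations and the fixed-point property can be checked directly in a small Python sketch. The names `Undefined`, `bottom`, and `approx` are illustrative choices made here, an exception stands in for divergence at $\bot$, and Python’s built-in `math.factorial` plays the role of the fully recursive `fact`.

```python
import math


class Undefined(Exception):
    """Signals that evaluation reached bottom instead of the base case."""


def bottom(n):
    # Stand-in for ⊥: any call "nukes" the computation.
    raise Undefined(f"reached bottom at input {n}")


def G(f):
    """The non-recursive builder: embed f into one layer of factorial."""
    return lambda n: 1 if n == 0 else n * f(n - 1)


def approx(k):
    """Return f<k>, i.e. G applied to bottom k + 1 times."""
    f = bottom
    for _ in range(k + 1):
        f = G(f)
    return f


f3 = approx(3)
print([f3(n) for n in range(4)])  # defined on inputs 0..3

try:
    f3(4)  # one level too deep: bottoms out before the base case
except Undefined as err:
    print("diverges:", err)

# Wrapping the true factorial in G changes nothing on the inputs we test.
print(all(G(math.factorial)(n) == math.factorial(n) for n in range(10)))
```

The last line is only an extensional spot check over a finite range, not a proof, but it matches the claim that `G(fact) = fact`.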
If this still feels slimy (and it certainly is for me; I literally find myself able to get it one second and lose it the next), go ahead and actually perform the application:

`G(fact) = λ(n: Int). if n=0 then 1 else n * fact(n-1)`

What we’re saying is: when we embed our fully recursive function `fact` into `G`, the function we get out is just `fact` again. You can’t “outsmart” it or “outwrap” it, as counterintuitive as it may be, since it feels like you could always take another composition that the term you’re using can’t be “aware of,” that it can’t anticipate that you’re going to do it again. But no, it can be aware of further composition and it’ll have no effect: it literally includes its full self. That final point is really the best characterization in my opinion, and if you just can’t be satisfied with the formal argument, sit with that phrase until it sinks in.⁸

So how can we systematically get to this fixed point “right away,” as if we just defined things implicitly in the first place? The fixed-point, or Y, combinator. This is a higher-order functional, i.e., a function taking a functional, and it returns the fixed point of that functional. We refer to this as $\text{fix}$ or $Y$. In our above example, we’d say

    fix G = fact
    -- or equivalently
    Y G = fact
    -- and we have
    fix G = G(fix G) = G(G(fix G)) = ... = fact

We actually define $Y$ as

$Y = \lambda f. (\lambda x. f(x\;x)) (\lambda x. f(x\;x))$

When we apply $Y$ to some functional $g$, we can expand to find $Y g = g (Y g)$. $Y$ is a construct that literally builds a fully recursive function out of the wrapper $g$. Intuitively, it basically unpacks the internal logic from $g$ and puts it back into itself (although I don’t feel that confident about it becoming anything too concrete; reducing the thing still looks like $Y g$).

Combinators

Combinators are simply closed $\lambda$-expressions.
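The fixed-point combinator from the Recursion section is one such closed term, so it can be written down and run as is. Below is a minimal Python sketch; because Python evaluates arguments eagerly, a literal transcription of $Y$ would recurse forever, so this uses the eta-expanded variant usually called the Z combinator instead. The names `Z`, `G`, and `fact` are illustrative.

```python
# Z = λf. (λx. f(λv. x x v)) (λx. f(λv. x x v))
# The inner λv delays the self-application x(x) until an argument arrives,
# which is what keeps this from looping under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# The non-recursive factorial builder: note there is no self-reference here.
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# Tie the knot: fact is obtained purely by application of closed terms.
fact = Z(G)

print(fact(5))
print(G(fact)(5) == fact(5))  # one more layer of G changes nothing
```

Neither `Z` nor `G` mentions any free variable or its own name, which is the sense in which recursion is recovered from closed terms alone.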
In combinatory logic, we capture the full power of lambda calculus, but without the notion of abstraction (or at least the ability to construct new abstractions). Instead, we can take a few closed, axiomatic abstractions (called combinators), and application can be used to construct certain kinds of new functions. Composition of these combinators can then replicate any function from lambda calculus.

This simplification removes much of the complexity (although also the convenience) of abstraction, and was originally introduced to eliminate quantified variables from other logic systems, effectively presenting an alternative means of capturing the same functionality from even more primitive operations. This makes it more of an interesting demonstration than a practical choice for a language due to its verbosity; here the SKI system is a prime example.

Semantics

Curry-Howard correspondence

The Curry-Howard correspondence relates computer programs to mathematical proofs. It is a formal link between typed lambda expressions and statements in mathematical logic, an isomorphism between the proof systems.

The logic-to-type theory analogs are as follows:

Formula $\iff$ Type
Proof $\iff$ Term
Implication ($\implies$) $\iff$ Function ($\rightarrow$)
Conjunction ($\land$) $\iff$ Product type ($\times$)
Disjunction ($\lor$) $\iff$ Sum type ($+$)
Universal quantification ($\forall$) $\iff$ Dependent product type ($\Pi$)
Existential quantification ($\exists$) $\iff$ Dependent sum type ($\Sigma$)

So we can craft types that are analogous to logical formulas/statements, and type inhabitation (i.e., creating a term of the type) corresponds to a notion of proof for that logical statement.
Expanding this for the specific analogs above:

$t:T$; the proposition/type $T$ holds due to the proof/term $t$.

$f:S\rightarrow T$; the function $f$ inhabits the function type $S\rightarrow T$, which can be thought of as a map from any proof of proposition $S$ to a proof of proposition $T$. This embodies the notion that $T$ can be shown to hold if $S$ holds, precisely what we mean by implication.

$(a,b):\Sigma_{x:A}B(x)$; the pair term $(a,b)$ provides some “witness” $a:A$ such that $B(a)$ holds. To be clear, there are two elements here: a proof in the usual sense above, with $b:B(a)$ (where $B(a)$ is a concrete proposition for a fixed $a$ that is inhabited by a term $b$), as well as a provided $a:A$ such that we actually inhabit $B(a)$. The pair $(a,b)$ is therefore evidence that there exists some $x$ such that $B(x)$ holds, precisely what we mean by $\exists a\in A, B(a)$.

$f:\Pi_{x:A}B(x)$; the dependent function term $f$ provides, for every $x:A$, a proof that $B(x)$ holds. We naturally think of dependent functions as maps whose output terms have types that can depend on the input value. Therefore, the existence of some inhabiting function $f$ implies we’ve mapped from all input terms $x:A$ to output terms $y:B(x)$. That is, we effectively have a collection of pairs, each of which is similar to what we saw with $\Sigma$ types, with witnesses and proofs for all parameterizations of $B(x)$. This is precisely what we mean by $\forall a\in A, B(a)$.

Order

Lattice of types: from `Top` to `Bot`
Order theory and hierarchies of types

Notion of least general type:

Note how, for a given term, we can say things like `c : a -> a` and `c : b`. The former is more specific, characterizing it as a function from/to a particular type, while the latter…actually I don’t get this, not really.
For starters, we’re letting `a` and `b` be free type variables here, so if I think about how I’d actually write them for any concrete types, including as a UQ type, I don’t know how these are both correct. With Hindley-Milner, we see some discussion around generics also being able to inhabit specific typed variants (look at the Wikipedia discussion under “Type order”). But I don’t get this, since the term is a generic, at least as I’m used to, which makes it a distinct thing even if it can appear to be a particular typed version.

en.wikipedia.org/wiki/Simply_typed_lambda_calculus
en.wikipedia.org/wiki/Lambda_calculus
en.wikipedia.org/wiki/Lambda_calculus#Capture-avoiding_substitutions
Free variables, binding
Free/bound variables, generally
en.wikipedia.org/wiki/Lambda_cube
commons.wikimedia.org/wiki/File:Lambda_Cube_img.svg
en.wikipedia.org/wiki/Dependent_type#%CE%A0_type
en.wikipedia.org/wiki/Parametric_polymorphism
en.wikipedia.org/wiki/System_F
en.wikipedia.org/wiki/Type_variable
en.wikipedia.org/wiki/Kind_(type_theory)
en.wikipedia.org/wiki/Type_constructor
www.lesswrong.com/posts/ccbsYSpTcTqXwukH8/basic-building-blocks-of-dependent-type-theory
ncatlab.org/nlab/show/pure+type+system
en.wikipedia.org/wiki/System_F#System_F%3C:
www3.cs.stonybrook.edu/~cram/cse526/Spring20/Lectures/untyped-lambda.pdf
en.wikipedia.org/wiki/SKI_combinator_calculus
en.wikipedia.org/wiki/Combinatory_logic
en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system

This is a function from terms of type $A$ to “terms” of type $\mathcal{U}$, where $\mathcal{U}$ is actually a type universe whose “terms” are themselves types. So $B$ maps from terms $a:A$ to types. $B$ characterizes a family of types if we think about it extensionally (not actually needed, but helps evoke how I’m thinking about it at the moment), i.e., the type space that is the function’s image.
↩︎ As far as I’ve seen, we just fundamentally take function types as given and never really need to throw some extra means of canonically specifying regular functions, which I guess is why I’ve not seen this prior to dependent type systems. Nevertheless, I like looking at this as an alternative way to color how I think about functions. ↩︎ Here I’m referred to notion $B^A$ to define function spaces, where $B$ is a codomain and $A$ is a domain. We can expand this to look just like our dependent product, even when $B$ does not dependent on any terms $x:A$ . Intuitively, we’re just saying we have a “copy” of $B$ for every input $x:A$ , and a point in the resulting product space is a choice of output for every input (i.e., a function). ↩︎ For example, we could call $(1, 2, 3)$ a function in the context of an indexed family $f$ that is the identity function. $f$ maps a 0-indexed tuple position to the same value, and we associate with that input the output value in that tuple position. So the function is explicitly captured by $\{(f(0), 1), (f(1), 2), (f(2), 3)\} = \{(0, 1), (1, 2), (2, 3)\}$ ↩︎ This has been an annoying sticking point for me. I keep wanting to assign the whole space to each point $a\in A$ , i.e., producing a collection of pairs $(a, B(a))$ , since that feels like it might be just as effective. But that really is just another way to write the function $B$ itself (again, just its graph), and doesn’t “unpack” the items of each $B(a)$ into a new, “flattened” space. ↩︎ This is a particular topic I’ve struggled with, one that has left me without a clear foundation for some time. When we say something like $\lambda x.x : \alpha\rightarrow\alpha$ , $\alpha$ is serving as a “type variable,” allowing for any choice of $\alpha$ . But even though $\alpha$ isn’t anything particular in the definition, it’s a placeholder for something that will be. 
In the polymorphic case, $\alpha$ also serves as a placeholder in the same way, but we include a term that explicitly “loops” over every type. We bind that type variable with the universal quantifier $\forall\alpha$ and make sure the term “handles them all,” explicitly. That is to say, it’s tied to none of them, and $\alpha$ isn’t free the way it is in the monomorphic case. Bottom line: the monomorphic type is defined around another fixed but unspecified type $\alpha$ and we’ll end up with a term that only works as a map from/to one fixed type (we’ll have to supply a choice for $\alpha$ when we instantiate). In the polymorphic case, that function is defined to work with all types; there’s not even a choice to be made for $\alpha$ during instantiation, since the function should be able to operate on a term of any type that gets passed in. ↩︎

This got me (however counter-intuitively) questioning the notion of typing at all. What is the rule that is typing itself? We seem to take the notion of typing as a given, like an action that’s baked in. But why, and fundamentally what are we left with when it’s not present? After flailing for a minute, I realized we just get back to untyped lambda calculus. I’ve been so completely focused on types for so long that I kind of forgot what it’d even mean not to have them. But it turns out it’s not all that confusing, really, and untyped settings can generally just be seen as special cases of typed ones (where there’s implicitly just one type). ↩︎

Maybe worth the explicit reminder that `fact` must know about $G$ for composition to have no effect. $G$ injects a function $f$ into its template structure, and the fixed point `fact` is completely built around that scaffolding. It’s not something strictly “pure” or independent of $G$, and $G$ is therefore powerless to change it: it’s the construct that embodies infinite application of $G$, and that’s why it has no effect.

The $\infty+1$ analogy here is perfectly correct, but somehow even that often feels finite, like you’re still stacking an extra item onto what’s ultimately just a very tall pile. Therefore it doesn’t do a great job of offloading the burden of understanding infinite composition; you have to break that annoyingly sticky finite thinking if you want the fixed point to feel familiar. It’s the exact same analogy, but an infinitely extending spiral may feel a little nicer, or at least just a bit easier to visualize: when we wind up an additional cycle (compose once more), we get the same structure. ↩︎
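The fixed-point remark in the last note can be sketched concretely. This is a rough illustration only: the definition of `G` below is an assumed factorial “template” (its exact form is not given in these notes), and `fix`, `bottom`, and `approx` are hypothetical helper names introduced for the example. It shows that applying `G` to `fact` once more changes nothing, while any finite stack of `G` applications only handles finitely many inputs.

```python
def G(f):
    """Assumed factorial template: one unrolling of the recursion around f."""
    return lambda n: 1 if n == 0 else n * f(n - 1)


def fix(g):
    """Fixed point of g; the self-application is delayed until a call is made."""
    return lambda n: g(fix(g))(n)


def bottom(n):
    """Stands in for 'no unrollings left'."""
    raise RuntimeError("ran out of applications of G")


def approx(k):
    """G applied k times to bottom: a finite pile of unrollings."""
    f = bottom
    for _ in range(k):
        f = G(f)
    return f


fact = fix(G)

# Composing with G once more gives a function that agrees with fact.
print([fact(n) for n in range(6)])
print([G(fact)(n) for n in range(6)])

# A finite pile of k applications only covers inputs below k.
print(approx(3)(2))
```

Calling `approx(3)(3)` raises the `RuntimeError` from `bottom`, since three unrollings are not enough; `fact` never hits that wall because each call to it supplies another application of `G` on demand.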
Note that\nterms with existentially quantified types are introduced and eliminated\nwith `pack` and `unpack` operators, respectively.\nThese are fundamentally different than an operator like\n $\\Lambda$ ;\nthe latter introduces an irreducible new primitive (type abstraction),\nwhereas `pack`/`unpack` are more like rules for\nworking with existentially quantified terms (and can be fundamentally\nrepresented via universal quantification). \n \n \n While the notions of universal and existential quantification are\nsimilar in that types are being abstracted over, they say very different\nthings. Informally,\n $x : \\exists\\alpha. t(\\alpha)$ \nlets us say that\n $x$ \nhas type\n $t(\\alpha)$ \nfor some particular\n $\\alpha$ .\nThis is quite different from\n $x :\n\\forall\\alpha. t(\\alpha)$ ,\nwhich suggests that\n $x$ \nis an abstraction that can handle all types\n $\\alpha$ .\nIn the former,\n $x$ \nhas no such “awareness” of choosing a type for\n $\\alpha$ ;\na specific choice of\n $\\alpha$ \nis involved but we’re blind to its actual value, suggesting it could be\nany one choice. In the latter,\n $x$ \nis explicitly a generic construct that can handle all choices\nof\n $\\alpha$ .\nThere’s no notion of “plugging in” some particular choice of\n $\\alpha$ \nto get\n $x$ ’s\n“true” type; instead,\n $x$ \nis explicitly the thing (a generic function) that can handle any choice\nfor\n $\\alpha$ . \n \n \n More formally, it’s important to recognize that both universal and\nexistential quantification do ultimately represent new, singular types.\nThey’re not just loose characterizations of families of types or\npossible type choices (as we might be inclined to interpret them), but\nformal types with typing rules, just like everything else. 
I often find\nmyself forgetting this since\n $\\exists$ \nand\n $\\forall$ \nlook to be loose wrappers on other types; it’s tempting to just think of\nthem extensionally, as possible values for the types they “wrap.” This\nis perhaps a perfectly reasonable and intuitive thing to do (encouraged,\neven), but they are nevertheless still themselves considered formal\ntypes. \n \n \n Bounded quantification \n \n Bounded quantification is quantification (universal\nor existential) with subtyping. \n \n \n \n \n Encoding datatypes \n \n The power of lambda calculus becomes quickly apparent when we discuss\nrepeated application of higher-order abstractions. One way to begin\nformalizing this is via Church encoding, and, by the Church-Turing\nthesis, any effectively computable function can be represented under this\nscheme. \n \n \n Recursion \n \n When defining recursive functions, we might reach for a familiar\nself-referential form. For example, with the factorial function: \n \n `fact = fun (n: Int) if n=0 then 1 else n * fact(n-1)` \n \n As a term, this is improperly defined: `fact` refers to\nitself before it is fully defined. We can imagine needing to parse a\ndefinition left to right before the LHS identifier is considered valid,\nbut we encounter `fact` before we get all the way through, at\nwhich point `fact` is not a valid alias. In any case, we do\nthis all the time in most programming languages; it’s just that it’s a\nconvenient expression for something more formal. That something “lifts”\nthe function into a functional: \n \n `fact = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n \n This is not intrinsically recursive: it is a function that takes\nanother function `f` as input, embeds it in another function,\nand returns that. That is, something like the general form\n`λf. λx. t(f(s(x)))`, which says `f` is a function\nthat operates on a term `s(x)`, and `t` is some\ntransformation of `f` after it’s applied to\n`s(x)`. 
For a quick example, \n \n -- let s just be the identity function\ns = λx. x\n\n-- let t(f, x) mirror the factorial setup\nt = λf. λx. x*f(x-1)\n\n-- so our general term:\nλf. λx. t(f(s(x)), x)\n\n-- turns into\nλf. λx. x*f(x-1)\n\n-- and then if we define some function g\ng = λy. y*y\n\n-- we can apply our form to it like\n( λf. λx. x*f(x-1) )( g )\n-> λx. x*g(x-1)\n-> λx. x*((x-1)*(x-1))\n\n-- this final term is a function which we can reapply as a "new g"\nh = λx. x*((x-1)*(x-1))\n\n-- on the left is our general term, apply to h\n( λf. λx. x*f(x-1) )( h )\n-> λx. x*( (x-1)*( (x-2)*(x-2) ) ) \n \n What we’re seeing here is that each repeated application of the\nfunctional – using the last output function as input to the functional,\nproducing the next function – approximates part of the full\nrecursive behavior. The last term above just accounts for two recursive\nsteps. That is, it’s like we just iteratively pack a snapshot of the\nfunction into itself one step at a time, and once we “call” the outer\nthing for some input `x`, we recursively apply the function\nas usual (by virtue of the fact we’ve already packed in all the nested\ncalls). The point is that such a term actually explicitly packs\nthe function unwrapping into a term so we don’t run into the definition\nissues, i.e., the usual implicit syntactic sugar we get away\nwith in most programming languages. All of that function composition\nhappens at runtime, dynamically, rather than being one huge\ncompositional thing we’ve pre-expanded and can evaluate in one go. The\nlast term in the above code block represents such an explicit term after\ntwo manual composition steps, which is clearly only the beginning of the\nlengths we’d need to go to get the full recursive thing (spoiler: we\nneed to do this arbitrarily many times). \n \n \n Note that this isn’t even the factorial function (I chose\n`g(x)=x*x`), but if we map this onto that setting, the\nequivalent would be a final two-step function\n`λx. x*(x-1)*(x-2)`. 
You can see how `x` is\nallowed to be any value (assuming `x: Int`) which doesn’t\nguarantee we actually “recurse to a base case.” If we plug in\n`x=10`, for instance, we just get `10*9*8`, which\nis not `10!` (actually we do not get a\nconvergent output whatsoever if we don’t reach the base case…more on\nthat below). Typically, full recursive behavior is allowed to recurse\narbitrarily many times until a base case is reached. But to ensure we\ncan even do that for arbitrary inputs `x`, we need to have\npacked in arbitrarily many recursive steps in our expanded term. How can\nwe possibly do that, given `x` itself may be arbitrarily\nlarge? \n \n \n Our full recursive function is therefore equivalent to the\nlimit of this process, i.e., the thing we approach as we repeat\ncomposition an arbitrary number of times. With more and more nested\ncomposition, we get closer to the thing, and in the limit, we\nembody the notion of infinite nested composition. Any\nfurther composition therefore changes nothing: that thing is\nalready a fixed point. More on this below. \n \n \n Because I’ve lost myself several times as I work this out, I think it\nmay be helpful to spell things out slowly while I’m here in the weeds.\nThis is almost entirely to help me now (the concept really is that\nslippery), but will hopefully be a good starting point if I find myself\nback in this place in the future. Here’s how I’m seeing things now, step\nby step, rehashing many of the points above: \n \n \n \n You’re trying to define the factorial function, say in a usual,\n“intuitive,” implicitly recursive way. You start with \n `λ(n: Int). if n=0 then 1 else n * f(n-1)` \n which establishes your base case and the recursive dependence on the\nsubproblem for `n-1`, captured by calling yourself as\n`f`. But here we’ve merely used `f` to stand\nin for our function…how are we meant to actually get ourselves\nsqueezed into `f`? 
This sort of already feels like\nthe whole recursive function as we’d want to define it (in a common\nprogramming language). The problem is I can’t formally refer to this\nterm in itself in the usual way. I explained this pretty\nstraightforwardly above: there’s no way for `f`, the name we\nwant to use to refer to the whole outer term, to be made available\ninside its own scope before that scope is even defined. It\nsimply doesn’t make sense; there is no `f` at the time we\nwant to use it. \n \n \n So what do we do? Again, this signature already feels kind of right:\nwe want our recursive function to take an integer as input. So we want\nto preserve that element of the structure without bulking up the term\nper se, but we need to add something in order to facilitate a\nmeaningful self-reference as `f` inside the term. We can do\nthis by parametrizing `f` like this: \n `λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n This gives us a functional term where we now need to first supply a\nfunction `f` in order to “get back” to our expected\nsignature. So this thing is not our desired function, but it’s a\nmechanism to build it. To be clear, the question we now are\ntrying to answer is how we can even take a “snapshot” of our function\nand use it as `f`. This is where I really encountered that\nfirst feeling of limbo: I didn’t have a great sense for what we could\neven grab onto to put there. \n \n \n Where to go from here? How to set `f`? We take the above\nform and some first, “lowest” function\n $\\bot$ \n(a canonical `undefined` that effectively “nukes” the output\nif reached rather than the base case) and apply it, giving us \n `f0 = λ(n: Int). if n=0 then 1 else n * ⊥(n-1)` \n So `f0` is back to the direct function signature we’re\nafter, having chosen a particular `f` to “embed.” We can then\nrepeat this: \n `f1 = λ(n: Int). if n=0 then 1 else n * f0(n-1)` \n and so on, where `f<n>` represents a\npartial factorial function that includes `n` nested\ncompositions. 
Note that `f<n>` is only well-defined for\nintegers `0` to `n` given our function; for inputs\ngreater than `n`, we never reach the base case and diverge\n(becoming `undefined`; we’ll basically end up making a call\nlike `⊥(x)` for some `x > 0`, blowing\neverything up). Additionally note that the `n` in\n`f<n>` only controls the depth of composition rather\nthan directly restricting the input to the underlying function (I’ve had\nthe tendency to interpret it as like improving the “coverage” of our\ninput space, and while in this case it sort of does that indirectly, in\ngeneral it does not determine the kind of inputs that produce concrete\noutputs). \n \n \nWith the above mechanism, we can let `n` tend toward\ninfinity, which will step us ever closer to recovering an arbitrarily\ndeep capacity for recursion. With some finite `n`,\nhowever, we can always supply inputs `>n` that cause our\noutput to diverge in the function `f<n>` (i.e., never\nhit the base case), which differs from a true, fully dynamic recursive\ndefinition. We therefore look toward the limiting term, the\nterm that can recurse dynamically. Such a term therefore\ncannot change under any further applications of composition (it\nalready embodies a notion of “infinite composition;” to do it again\nwould be like trying to take\n $\\infty+1$ .\nPut another way, the two mirrors are fully snapped parallel, and you\nhave a term that simply never deals in finite levels of composition: you\ncan’t “unravel” it with a finite number of applications), and is thus a\nfixed point.\n \n \n We explain a bit more explicitly below, but it’s as if we’ve produced\n`f∞`. The internal reference to itself can’t be\nsomething finite; if you tried, similar to our above point, you’d have\nsomething like `f<∞-1>`, which is just `f∞`\nagain. So it literally has its full self inside itself: the\ninternal thing isn’t smaller, and the outer composition isn’t larger. 
As\nconfusing as it is (certainly given our construction from increasingly\nlarge finite composition steps, where `f<n>` is indeed\nlarger than its internal use of `f<n-1>`), they’re\nthe same. \n \n \n \n \n To be clear about what we mean by fixed point: we’re saying that our\ntrue, fully recursive factorial function is the fixed point of the\nfunctional term \n `G = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n That is, this “builder term” `G` has our target recursive\nfunction as a fixed point. If our target function is called\n`fact`, we are therefore saying that\n`G(fact) = fact`: `fact` is the thing that\nalready fully embeds/references itself. You cannot “squeeze”\nanother level of composition into that `f` reference by\ncalling `G` again (like we did above, in sequence, for finite\nlevels). \n If this still feels slimy (and it certainly is for me; I literally\nfind myself able to get it one second and lose it the next), go ahead\nand actually perform the application: \n `G(fact) = λ(n: Int). if n=0 then 1 else n * fact(n-1)` \n What we’re saying is: when we embed our fully recursive function\n`fact` into `G`, the function we get out is just\n`fact` again. You can’t “outsmart” it or “outwrap” it, as\ncounterintuitive as it may be, since it feels you could always\ntake another composition that the term you’re using can’t be “aware of,”\nthat it can’t anticipate that you’re going to do it again. But no, it\ncan be aware of further composition and it’ll have no effect:\nit literally includes its full self. That final point\nis really the best characterization in my opinion, and if you just can’t\nbe satisfied with the formal argument, sit with that phrase until it\nsinks in.⁸ \n \n \n So how can we systematically get to this fixed point “right away,” as\nif we just defined things implicitly in the first place? The\nfixed-point, or Y,\ncombinator. 
This is a higher-order functional,\ni.e., a function taking a functional, and it returns the fixed\npoint function of that functional. We refer to this as\n $\\text{fix}$ \nor\n $Y$ .\nIn our above example, we’d say \n `fix G = fact\n\n-- or equivalently\nY G = fact\n\n-- and we have\nfix G = G(fix G) = G(G(fix G)) = ... = fact` \n We actually define\n $Y$ \nas \n $Y = \\lambda f. (\\lambda x. f(x\\;x)) (\\lambda x. f(x\\;x))$ \n When we apply\n $Y$ \nto some functional\n $g$ ,\nwe can expand to find\n $Y g = g (Y\ng)$ .\n $Y$ \nis a construct that literally builds a fully recursive function out of\nthe wrapper\n $g$ .\nIntuitively, it basically unpacks the internal logic from\n $g$ \nand puts it back into itself (although I don’t feel that confident about\nit becoming anything too concrete; reducing the thing still looks like\n $Y\ng$ ). \n \n \n \n \n Combinators \n \n Combinators are simply closed\n $\\lambda$ -expressions.\nIn combinatory\nlogic, we capture the full power of lambda calculus, but\nwithout the notion of abstraction (or at least the ability to\nconstruct new abstractions). Instead, we can take a few closed,\naxiomatic abstractions (called combinators), and application can be used\nto construct certain kinds of new functions. Composition of these\ncombinators can then replicate any function from lambda calculus. This\nsimplification removes much of the complexity (although also the\nconvenience) of abstraction, and was originally introduced to eliminate\nquantified variables from other logic systems, effectively presenting an\nalternative means of capturing the same functionality from even more\nprimitive operations. This makes it more of an interesting demonstration\nrather than a practical choice for a language due to its verbosity; here\nthe SKI\nsystem is a prime example. \n \n \n \n \n Semantics \n \n Curry-Howard correspondence \n \n The Curry-Howard correspondence relates computer\nprograms to mathematical proofs. 
It is a formal link\nbetween typed lambda expressions and statements in mathematical logic,\nan isomorphism between the proof systems. The logic-to-type\ntheory analogs are as follows: \n \n \n \n Formula\n $\\iff$ \nType \n \n \n Proof\n $\\iff$ \nTerm \n \n \n Implication\n( $\\implies$ )\n $\\iff$ \nFunction\n( $\\rightarrow$ ) \n \n \n Conjunction\n( $\\land$ )\n $\\iff$ \nProduct type\n( $\\times$ ) \n \n \n Disjunction\n( $\\lor$ )\n $\\iff$ \nSum type\n( $+$ ) \n \n \n Universal quantification\n( $\\forall$ )\n $\\iff$ \nDependent product type\n( $\\Pi$ ) \n \n \n Existential quantification\n( $\\exists$ )\n $\\iff$ \nDependent sum type\n( $\\Sigma$ ) \n \n \n \n So we can craft types that are analogous to logical\nformulas/statements, and type inhabitance (i.e., creating a\nterm of the type) corresponds to a notion of proof for that\nlogical statement. Expanding this for the specific analogs above: \n \n \n \n $t:T$ ;\nthe proposition/type\n $T$ \nholds due to the proof/term\n $t$ \n \n \n $f:S\\rightarrow T$ ;\nthe function\n $f$ \ninhabits the function type\n $S\\rightarrow\nT$ ,\nwhich can be thought of as a map from any proof of proposition\n $S$ \nto a proof of proposition\n $T$ .\nThis embodies the notion that\n $T$ \ncan be shown to hold if\n $S$ \nholds, precisely what we mean by implication. \n \n \n $(a,b):\\Sigma_{x:A}B(x)$ ;\nthe pair term\n $(a,b)$ \nprovides some “witness”\n $a:A$ \nsuch that\n $B(a)$ \nholds. To be clear, there are two elements here: a proof in the usual\nsense above, with\n $b:B(a)$ \n(where\n $B(a)$ \nis a concrete proposition for a fixed\n $a$ \nthat is inhabited by a term\n $b$ ),\nas well as a provided\n $a:A$ \nsuch that we actually inhabit\n $B(a)$ .\nThe pair\n $(a,b)$ \nis therefore evidence that there exists some\n $x$ \nsuch that\n $B(x)$ \nholds, precisely what we mean by\n $\\exists a\\in A, B(a)$ . 
\n \n \n $f:\\Pi_{x:A}B(x)$ ;\nthe dependent function term\n $f$ \nprovides, for every\n $x:A$ ,\na proof that\n $B(x)$ \nholds. We naturally think of dependent functions as maps with output\nterms with types that can depend on the input value. Therefore, the\nexistence of some inhabiting function\n $f$ \nimplies we’ve mapped from all input terms\n $x:A$ \nto output terms\n $y:B(x)$ .\nThat is, we effectively have a collection of pairs, each of\nwhich is similar to what we saw with\n $\\Sigma$ \ntypes, with witnesses and proofs for all parameterizations of\n $B(x)$ .\nThis is precisely what we mean by\n $\\forall a\\in A, B(a)$ . \n \n \n \n \n \n Order \n \n \n Lattice of types: from `Top` to `Bot` \n \n \n Order theory and hierarchies of types \n \n \nNotion of least general type:\n \n \n Note how, for a given term, we can say things like\n`c : a -> a` and `c : b`. The former is more\nspecific, characterizing it as a function from/to a particular type,\nwhile the latter…actually I don’t get this, not really. For starters,\nwe’re letting `a` and `b` be free type variables\nhere, so if I think about how I’d actually write them for any concrete\ntypes, including as a UQ type, I don’t know how these are both correct.\nWith Hindley-Milner,\nwe see some discussion around generics also being able to inhabit\nspecific typed variants (look at the Wikipedia discussion under “Type\norder”). But I don’t get this, since the term is a generic, at\nleast as I’m used to, which makes it a distinct thing even if it can\nappear to be a particular typed version. 
\n \n \n \n \n \n en.wikipedia.org/wiki/Simply_typed_lambda_calculus \n en.wikipedia.org/wiki/Lambda_calculus \n en.wikipedia.org/wiki/Lambda_calculus#Capture-avoiding_substitutions \n Free variables, binding \n Free/bound variables,\ngenerally \n en.wikipedia.org/wiki/Lambda_cube \n commons.wikimedia.org/wiki/File:Lambda_Cube_img.svg \n en.wikipedia.org/wiki/Dependent_type#%CE%A0_type \n en.wikipedia.org/wiki/Parametric_polymorphism \n en.wikipedia.org/wiki/System_F \n en.wikipedia.org/wiki/Type_variable \n en.wikipedia.org/wiki/Kind_(type_theory) \n en.wikipedia.org/wiki/Type_constructor \n www.lesswrong.com/posts/ccbsYSpTcTqXwukH8/basic-building-blocks-of-dependent-type-theory \n ncatlab.org/nlab/show/pure+type+system \n en.wikipedia.org/wiki/System_F#System_F%3C: \n www3.cs.stonybrook.edu/~cram/cse526/Spring20/Lectures/untyped-lambda.pdf \n en.wikipedia.org/wiki/SKI_combinator_calculus \n en.wikipedia.org/wiki/Combinatory_logic \n en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system \n \n \n \n \n \n \nThis is a function from terms of type\n $A$ \nto “terms” of type\n $\\mathcal{U}$ ,\nwhere\n $\\mathcal{U}$ \nis actually a type universe whose “terms” are themselves types. So\n $B$ \nmaps from terms\n $a:A$ \nto types.\n $B$ \ncharacterizes a family of types if we think about it\nextensionally (not actually needed, but helps evoke how I’m thinking\nabout it at the moment), i.e., the type space that is the function’s\nimage.\n \n↩︎ \n \nAs far as I’ve seen, we just fundamentally take function types as given\nand never really need to throw in some extra means of canonically\nspecifying regular functions, which I guess is why I’ve not seen this\nprior to dependent type systems. Nevertheless, I like looking at this as\nan alternative way to color how I think about functions.\n \n↩︎ \n \nHere I’m referring to the notation\n $B^A$ \nused to define function spaces, where\n $B$ \nis a codomain and\n $A$ \nis a domain. 
We can expand this to look just like our dependent product,\neven when\n $B$ \ndoes not depend on any terms\n $x:A$ .\nIntuitively, we’re just saying we have a “copy” of\n $B$ \nfor every input\n $x:A$ ,\nand a point in the resulting product space is a choice of output for\nevery input (i.e., a function).\n \n↩︎ \n \n For example, we could call\n $(1, 2, 3)$ \na function in the context of an indexed family\n $f$ \nthat is the identity function.\n $f$ \nmaps a 0-indexed tuple position to the same value, and we associate with\nthat input the output value in that tuple position. So the function is\nexplicitly captured by \n $\\{(f(0), 1), (f(1), 2), (f(2), 3)\\} = \\{(0, 1), (1, 2), (2, 3)\\}$ \n \n↩︎ \n \nThis has been an annoying sticking point for me. I keep wanting to\nassign the whole space to each point\n $a\\in A$ ,\ni.e., producing a collection of pairs\n $(a, B(a))$ ,\nsince that feels like it might be just as effective. But that really is\njust another way to write the function\n $B$ \nitself (again, just its graph), and doesn’t “unpack” the items of each\n $B(a)$ \ninto a new, “flattened” space.\n \n↩︎ \n \nThis is a particular topic I’ve struggled with, one that has left me\nwithout a clear foundation for some time. When we say something like\n $\\lambda x.x : \\alpha\\rightarrow\\alpha$ ,\n $\\alpha$ \nis serving as a “type variable,” allowing for any choice of\n $\\alpha$ .\nBut even though\n $\\alpha$ \nisn’t anything particular in the definition, it’s a placeholder for\nsomething that will be. In the polymorphic case,\n $\\alpha$ \nalso serves as a placeholder in the same way, but we include a term that\nexplicitly “loops” over every type. We bind that type variable\nwith the universal quantifier\n $\\forall\\alpha$ \nand make sure the term “handles them all,” explicitly. That is to say,\nit’s tied to none of them, and\n $\\alpha$ \nisn’t free the way it is in the monomorphic case. 
Bottom line: the\nmonomorphic type is defined around another fixed but unspecified\ntype\n $\\alpha$ \nand we’ll end up with a term that only works as a map from/to one fixed\ntype (we’ll have to supply a choice for\n $\\alpha$ \nwhen we instantiate). In the polymorphic case, that function is defined\nto work with all types; there’s not even a choice to be made for\n $\\alpha$ \nduring instantiation, since the function should be able to operate under\na term of any type that gets passed in.\n \n↩︎ \n \nThis got me (however counter-intuitively) questioning the notion of\ntyping at all. What is the rule that is typing itself? We seem\nto take the notion of typing as a given, like an action that’s baked in.\nBut why, and fundamentally what are we left with when it’s not present?\nAfter flailing for a minute, I realized we just get back to untyped\nlambda calculus. I’ve been so completely focused on types for so long\nthat I kind of forgot what it’d even mean not to have them. But it turns\nout it’s not all that confusing, really, and untyped settings can\ngenerally just be seen as special cases of typed ones (where there’s\nimplicitly just one type).\n \n↩︎ \n \n Maybe worth the explicit reminder that `fact` must know\nabout\n $G$ \nfor composition to have no effect.\n $G$ \ninjects a function\n $f$ \ninto its template structure, and the fixed point `fact` is\ncompletely built around that scaffolding. It’s not something strictly\n“pure” or independent of\n $G$ ,\nand\n $G$ \nis therefore powerless to change it: it’s the construct that\nembodies infinite application of\n $G$ ,\nand that’s why it has no effect. \n The\n $\\infty+1$ \nanalogy here is perfectly correct, but somehow even that often feels\nfinite, like you’re still stacking an extra item to what’s ultimately\njust a very tall pile. 
Therefore it doesn’t do a great job of offloading\nthe burden of understanding infinite composition; you have to break that\nannoyingly sticky finite thinking if you want the fixed point to feel\nfamiliar. It’s the exact same analogy, but an infinitely extending\nspiral may feel a little nicer, or at least just a bit easier to\nvisualize: \n \n\n\n \nWhen we wind up an additional cycle (compose once more), we get the same\nstructure.\n \n↩︎ \n \n \n', 'toc': ' \n Lambda cube \n Typing\nmechanisms\n \n Dependent typing\n( $\\lambda\\Pi$ )\n \n $\\Pi$ -type \n $\\Sigma$ -type \n \n Polymorphic typing\n( $\\lambda 2$ ) \n Type constructors\n( $\\lambda\\underline{\\omega}$ ) \n \n Pure type\nsystems\n \n System U \n \n Subtyping \n Quantification\n \n Bounded quantification \n \n Encoding\ndatatypes\n \n Recursion \n Combinators \n \n Semantics\n \n Curry-Howard\ncorrespondence \n \n Order \n ', 'created': '2025-03-16 22:42', 'modified': '2025-09-10 02:34', 'summary': '', 'abstract': '', 'series': ''}, {'id': 10800, 'path': '/home/smgr/Documents/notes/Typed_lambda_calculus.md', 'rpath': 'Typed_lambda_calculus.md', 'name': 'Typed_lambda_calculus.md', 'title': 'Typed lambda calculus', 'link': 'Typed_lambda_calculus', 'ftype': 'md', 'ctime': '1757496869.29', 'mtime': '1757496869.29', 'atime': '1757496869.29', 'type': 'wiki', 'yaml_text': 'title: Typed lambda calculus\ncreated: 2025-03-16 22:42\nmodified: 2025-09-10 02:34\ndatelink: [[2025-03-16]]\ntype: wiki\nsummary: ', 'name_fmt': 'Typed_lambda_calculus.md+src', 'format': 'src', 'content': '', 'toc': '', 'created': '2025-03-16 22:42', 'modified': '2025-09-10 02:34', 'summary': '', 'abstract': '', 'series': ''}]
indexes	{'format': {'html5': {'id': 10799, 'path': '/home/smgr/Documents/notes/Typed_lambda_calculus.md', 'rpath': 'Typed_lambda_calculus.md', 'name': 'Typed_lambda_calculus.md', 'title': 'Typed lambda calculus', 'link': 'Typed_lambda_calculus', 'ftype': 'md', 'ctime': '1757496869.29', 'mtime': '1757496869.29', 'atime': '1757496869.29', 'type': 'wiki', 'yaml_text': 'title: Typed lambda calculus\ncreated: 2025-03-16 22:42\nmodified: 2025-09-10 02:34\ndatelink: [[2025-03-16]]\ntype: wiki\nsummary: ', 'name_fmt': 'Typed_lambda_calculus.md+html5', 'format': 'html5', 'content': '\n \n \n TODO: \n \n \n \n Dependent sum types \n \n \n \n Typed lambda calculi are type\nsystems with (anonymous) function abstraction from Lambda calculus. \n \n \n Perhaps worth noting that both (untyped) lambda calculi and type\nsystems are fundamentally isolated formal\nsystems. The latter generally wrap up the notion of typing, and the\nformer is canonically defined without the inclusion of any such\nmechanism (lambda terms are abstractions that apply to anything). Typed\nlambda calculi make up a class of type systems that “bring in” lambda\nterms axiomatically, along with the rules of\nintroduction/application/reduction that we formalize in the general,\nuntyped lambda calculus setting. \n \n \n Lambda cube \n \n The $\\lambda$ -cube\nis a framework that captures different binding behaviors in\ntype systems as “movements” along dimensions over a 3D cube. Each vertex\nof the cube yields a particular type system with the respective binding\nbehavior present as a typing rule, i.e., each system is associated with\na point in\n $\\{0,1\\}^3$ ,\nand a\n $1$ \nin a given dimension implies the presence of that typing rule. 
\n \n \n \n\n\n \n \n \n The dimensions correspond to different kinds of binding mechanisms\nbetween terms and types: \n \n \n \n $x$ -axis:\nintroduces dependent types, types that can depend on terms \n \n \n $y$ -axis:\nintroduces polymorphism, terms that can depend on types \n \n \n $z$ -axis:\nintroduces type constructors, types that can depend on (other)\ntypes \n \n \n \n Starting from the simply typed lambda calculus\n $\\lambda_\\rightarrow$ \nat\n $(0,0,0)$ , \n \n \n \n We move along the\n $x$ -axis\nto\n $(1,0,0)$ \nto get\n $\\lambda\\Pi$ \n(also referred to as\n $\\lambda P$ ),\na first-order dependent type system. \n \n \n We move along the\n $y$ -axis\nto\n $(0,1,0)$ \nto get\n $\\lambda 2$ \n(also referred to as System F), a polymorphic type system. \n \n \n We move along the\n $z$ -axis\nto\n $(0,0,1)$ \nto get\n $\\lambda\\underline{\\omega}$ ,\na type system including type constructors. \n \n \n \n These represent type systems that introduce each of the typing\nmechanisms in isolation. Systems at other vertices are those with\ncombinations of these mechanisms. At\n $(1,1,1)$ ,\nwe reach the calculus of constructions\n $\\lambda\nC$ ,\nwhere types and terms can depend on types and terms. We dive into each\nof these systems and their respective typing mechanisms below. \n \n \n \n Typing mechanisms \n \n The lambda cube captures a few common typing mechanisms. Below we\ndiscuss these in greater detail. Simply put, \n \n \n \n Dependent typing: types that depend on terms \n \n \n Polymorphic typing: terms that depend on types \n \n \n Type constructors: types that depend on types \n \n \n \n Dependent typing\n( $\\lambda\\Pi$ ) \n \n $\\Pi$ -type \n \n The type system\n $\\lambda\\Pi$ \nintroduces dependent typing, facilitated primarily by\nthe\n $\\Pi$ \ntype (the dependent\nproduct type). This type captures the notion of a function whose\nreturn value’s type (as well as the value itself, of course)\ncan vary with the argument value. 
That is to say, there’s no fixed\ncodomain; both the output term and its type are effectively\ndynamically selected by the value of the input. \n \n \n In particular, if\n $B: A\\rightarrow \\mathcal{U}$ ¹, the type of a dependent\nfunction can be written\n $\\prod_{x:A}B(x)$ ,\ncapturing the fact such a term will map to a type determined by\n $B(x)$ .\nFor example, if we write\n $\\text{Vec}(\\mathbb{R}, n)$ \nto represent a real-valued\n $n$ -tuple,\nthen\n $\\prod_{n:\\mathbb{N}}\\text{Vec}(\\mathbb{R}, n)$ \nis how we “parameterize” it as a dependent type. \n \n \n It’s worth noting that when\n $B:A\\rightarrow\\mathcal{U}$ \nis a constant function (all\n $a:A$ \nmap to the same type), the dependent product type acts exactly the same\nas the usual function type. While we don’t fundamentally use this\nnotation when defining function types², it clarifies (for me)\nthe use of notation seen elsewhere in mathematics³.\nWe’re saying here that the type\n $\\prod_{x:A}B$ \n(no dependence on\n $x$ \nin\n $B$ )\nis equivalent to\n $A\\rightarrow B$ ;\nthey’re two different ways to write the same thing. Further, this\nproduct type can be re-written explicitly in the usual sense as \n \n \n $B\\times B\\times\\cdots\\times B,$ \n \n \n but how is this the “type” of a function (since we’re claiming\n $\\prod_{x:A}B$ \nand\n $A\\rightarrow B$ \nare the same)? We can think of a concrete function as a long tuple where\neach tuple position corresponds to a particular input value in the\ndomain (captured as an indexed family, say) and “chooses” an output\nvalue corresponding to that input⁴. That is, we’re thinking\nof functions in the more explicit, extensional, binary relation sense. In any case,\nsuch a tuple can be seen as an element of the product space\nwritten above; the concrete tuple represents a concrete function term,\nand its type can be seen as a product of the types associated with each\noutput. 
All this to say: this isn’t a product space serving as a domain\nor codomain for some class of functions (which is how I originally got\nconfused here), but is itself the space/type of all functions,\nwhen we think of particular functions as tuples of output values. \n \n \n This leads back to the original context within which the\nnotation was introduced: to express dependent types. When\n $B$ \ndoes vary with\n $x:A$ ,\nwe can explicitly write our dependent function type as \n \n \n $B_{a_1}\\times B_{a_2}\\times B_{a_3}\\times\\cdots,$ \n \n \n where\n $a_i : A$ \nranges over all terms of type\n $A$ .\nThis is a dependent product, and captures the space of\npossible dependent functions: terms that themselves map\nfrom all inputs\n $a_i:A$ \nto terms\n $b_i:B(a_i)$ .\nHere we simply get the extra flexibility to specify (loosely) a\nparticular output type for each input value (abstraction-bound\nvariable). We formalize the construction of such a type with the\nformation rule \n \n \n $\\frac{\\Gamma,x:A\\vdash B:*}{\\Gamma\\vdash\\left(\\prod_{x:A}B\\right):*}$ \n \n \n which is really a simple statement that formally recognizes\n $\\prod_{x:A}B$ \nas a type (with\n“ $:*$ ”\nread as “is of type type”). To use the example from before,\n $\\prod_{n:\\mathbb{N}}\\text{Vec}(\\mathbb{R}, n)$ \ncould be the type of a function that maps from a natural number\n $n$ \nto a unit vector with\n $n$ \ncomponents.\nNote how\n $\\text{Vec}(\\mathbb{R}, m)$ \nis the type of a vector for a specific value of\n $m$ ,\nbut not any\n $m$ .\nSo a function that produces a vector with a variable length\n(dependent on the function’s input) can be typed via the notion of a\ndependent type. Note how the introduction of this typing mechanism\ndoesn’t formally increase the computational power of the type\nsystem (untyped lambda calculus is Turing complete already), but instead\nfacilitates more logical expressiveness, e.g., how precisely we\ncan encode logical constraints over terms. 
Without dependent types, for\ninstance, we could still have a function that produces variable-length\nvectors, but from the perspective of statically verifying the logical\ncorrectness of further statements involving such functions, it becomes\nmuch more difficult (if not impossible) to make certain guarantees about\nthe value or type of involved terms. \n \n \n The type of dependent functions like this, in the general sense, is\nreferred to as a\n $\\Pi$ \ntype, often called the dependent function type (with\nexpansion below on the use of the term “dependent product type” from\nbefore). But System\n $\\lambda\\Pi$ \nalso includes a type that’s dual to\n $\\Pi$ \ntypes: the\n $\\Sigma$ \ntype, also called the dependent pair type. \n \n \n \n $\\Sigma$ -type \n \n Using similar notation from before, we write\n $\\Sigma$ \ntypes as\n $\\Sigma_{x:A}B(x)$ .\nThis captures the notion of ordered pairs where the second term’s type\ncan depend on the value of the first. That is, if\n $(a, b):\\sum_{x:A}B(x)$ ,\nthen\n $a:A$ \nand\n $b:B(a)$ :\nthe type of\n $b$ \ngets to depend on\n $a$ ’s\nvalue. As before, we can expand this and try to make sense of its full\nsyntactical implications: \n \n \n $\\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \\cdots,$ \n \n \n where\n $a_i : A$ \nranges over all terms of type\n $A$ .\nRight away I find this a little confusing: I see arbitrarily many types\nbeing summed over (or for at least as many terms\n $x:A$ ),\nand yet this is the type of a pair of terms. With our dependent\nfunction type, our terms themselves were functions that have to\naccount for all possible inputs\n $x:A$ ,\nso it seemed justified that we’d need to have a representation for each\n $x:A$ \nin the associated type definition. But here, it’s a little surprising to\nsee we compose more than just two types in order to capture the pair\ntype. \n \n \n Intuitively, all that’s happening here is the\n $+$ \nis behaving like an “or” operator. 
Since\n $b$ ’s\ntype can depend on any term\n $a:A$ ,\nit does make sense that we’d need every possible choice in our type\ndefinition; it’s just that any term must “commit” to only one of them.\nSo we need them all like before, but while\n $\\times$ \nindicates joint involvement (our term will use them all),\n $+$ \ncaptures the notion of a choice among them (our term will only\npick one, but it could be any of them). \n \n \n In some sense (and I do mean that vaguely), this is actually more\ngeneral than the\n $\\Pi$ \ntype. In that case, our type captures the idea of assigning an output\ntype to every possible input term, giving us terms (functions)\nthat must actually make such a decision for all inputs, i.e., assigning\na\n $B(a)$ \nfor every\n $x:A$ .\nBut here, we’re doing that exact same thing…just without the requirement\nto do it for all inputs at once. In a sense we “free up” the input as\nwell, and our terms get to pick whichever input (along with the\nassociated, dependent output type) they want, rather than needing to do\nit for all of them (doing it for all inputs leaves the only\nfreedom in the output side of things; the function doesn’t get to pick\nspecific inputs it should be defined over). So our pair is like any\nindividual input-output slice of a full dependent function. \n \n \n \nInterpreting\n $\\Sigma$ \ntypes as spaces\n \n \n We can sometimes think of types extensionally, representing spaces of\nall their possible constituent terms. Given the extra “degree of\nfreedom” we described above, it actually feels harder to shape that\ncorresponding space for\n $\\Sigma$ \ntypes, or at least to capture how we should think about it. 
In the\n $\\Pi$ \ntype setting, I know my function terms pick a type\n $B(a)$ \nfor every input\n $a$ \n(for some choice of\n $B$ ),\nand all my concrete function terms under that\n $B$ \nwill map any given input\n $a$ \nto a term\n $b : B(a)$ .\nI basically “stretch out” all those\n $B(a)$ \nand I’ve got my space’s structure: all terms in that space are maps with\nthe same output types when queried at the same inputs. The presence of\nthe “term conditioning” from\n $a$ \nsort of disappears in the implicit order of my\n $B(a)$ ;\nthe type\n $A$ \nis expressed implicitly through the family of\n $B(a)$ ’s.\nWe actually have to deal with terms of the type\n $A$ \nin order to “see” what types of\n $B$ \nwill be, so it makes sense that we can’t exactly have a term\n $a : A$ \nshow up in the type’s definition. For\n $\\Sigma$ \ntypes, we do a similar thing by packing in\n $B$ \ntypes in such a way that the dependence on terms\n $a : A$ \nis made implicit. But our space is effectively more locally\nvariable, and all of our terms don’t share the same array of output\ntypes like we saw with\n $\\Pi$ \ntypes (which acts as kind of a common thread, like a “stitch” in the\nspace that captures the common structure of terms). Instead, our terms\nare even more “tightly dependent” and sort of stubbornly dissimilar,\nmaking it hard to capture the structure of the common space all terms\nshare. So I resign myself to taking just that very structure: the\n(implicit) coupling of a term\n $a:A$ \nwith its paired type\n $B(a)$ ,\nand we bag all the possible concrete couplings up into a set like\n $\\{(a, b)\|a:A,b:B(a)\\}$ . \n \n \n In general, I think I let this be more a sticking point than it\nneeded to be, but my inability to let go of little confusions or\nfrustrations sometimes leads to greater insight. 
I may end up reusing\nthis outside this scope, but the point is that writing the “sum” \n \n \n $\\sum_{x:A}B(x) = B_{a_1} + B_{a_2} + B_{a_3}+ \\cdots,$ \n \n \n is not really all that accurate. It should be thought of as a\ndisjoint union, which explicitly lets us tag those types while\nbagging them all up like we want. That is, it just preserves the\nidentity of the types\n $B(a)$ \nusing the\n $a$ \nthat produced them, and those are the pairs that can make up our type\nspace. So more like \n \n \n $\\sum_{x:A}B(x) = (\\{a_1\\}\\times B(a_1)) + (\\{a_2\\}\\times B(a_2))+ \\cdots,$ \n \n \n This still leaves\n $B(a_i)$ \nas a type, but we can interpret that as a set in the generic way, i.e.,\n $\\{b \| b:B(a_i)\\}$ .\nThat gives you a formal set product\n $\\{(a_i, b) \| b:B(a_i)\\}$ \nfor each\n $a_i:A$ ,\nand when you let\n $+$ \nmean\n $\\cup$ ,\nyou get your big, fully expanded disjoint union. That is your formal\nset-based interpretation; it’s clear how we can dynamically expand only\nthe properly dependent pairs of values. \n \n \n So when I said this space feels like has less structure, I actually\nstill feel that’s an accurate depiction. Even if the little set-based\njumps we make to expand the space are easy to follow, it’s harder to\nhook into common structure between these objects. You have the paired\npart of them, yes, but beyond that we literally just tag dependent types\nexplicitly. It’s not some tightly wound, reduced structure, or even as\nlike rigid as the functions with\n $\\Pi$ \ntypes where we at least have all objects stretching across the input\nspace. Nope: we just slap a name tag on the types and put them in the\nbag, sort of the bare minimum to ensure they can be distinguished. 
And\nthat very process of tagging is really the only structure we\ncan even grab onto afterward, to group up “stubbornly dissimilar” terms\n(to reuse the phrase from earlier, which I again feel is accurate).\nThat, I suppose, is really just a consequence of great flexibility:\nwe’re literally defining a compositional type that is more or less an\nunimposing bag of other types. That’s as unstructured as you get when it\ncomes to wrangling collections of objects: you’re adding no new\nstructure or transforming things into a common shape. Perhaps that’s why\nit’s so slimy to me, so hard to just let be. It’s like I’m\nexpecting more to be there, more structure to understand, when in\nactual fact there is none, on purpose. \n \n \n \n Note the syntactic analogies to\naddition/multiplication/exponentiation, when the dependent function\n $B$ \nis constant: \n \n \n \nThe dependent pair\n $\\Sigma_{x:A}B$ \ncan also be written\n $A \\times B$ ,\nwhich can be interpreted in the usual product space sense, where to each\npoint in\n $A$ \nwe attach an instance of\n $B$ .\nTo be precise:\n \n \n When\n $B(x)$ \nvaries with\n $x:A$ ,\nwe can treat it as an indexed family of types/sets, indexed by\n $A$ \n \n \n The dependent pair is then analogous to a disjoint union: \n $\\Sigma_{a:A}B(a) = \\bigsqcup_{a:A}B(a) = \\bigcup_{a:A}\\{(a,b)\|b:B(a)\\}$ \n \n \n When\n $B(x)=B$ \n(i.e., is constant, no dependence on\n $A$ ),\nthe disjoint union reduces to the usual Cartesian product: \n $\\Sigma_{a:A}B = \\bigsqcup_{a:A}B = A\\times B$ \n \n \n Note how\n $\\Sigma$ \nor\n $\\bigsqcup$ \nactually do some work in “breaking up” and “rearranging”\n $B$ ;\na set like\n $\\{(a, B(a)) \| a:A\\}$ \nthat might attempt to attach to each\n $a\\in A$ \nthe full type/space\n $B(a)$ \nis only halfway to something useful (and is just the graph of\n $B$ ).⁵ \n \n \n \n \n\n\n \n \n \n \n The dependent product\n $\\Pi_{x:A}B$ \ncan also be written\n $B^A$ ,\nwhich can be interpreted as 
“multiplying” instances of\n $B$ \nfor each element/term of\n $A$ .\nAs mentioned above, when\n $B$ \nis constant, this simply recovers the notion of a (possibly infinite) product space\nhousing functions\n $f: A \\rightarrow B$ .\nOtherwise, the product is composed of conditioned spaces produced by\ninvoking\n $B(x)$ \nfor each\n $x: A$ ,\nand we usually don’t employ the compact representation\n $B^A$ \n(it doesn’t really capture\n $B$ ’s\ndependence on\n $A$ ,\nbut could be implicitly understood). \n When we write out the product explicitly, we have a sequence of\nconcrete products like we saw with the dependent pair, where each\n $B(a)$ \nis a fixed set making up one part of a nested product: \n $\\begin{aligned}\n \\Pi_{x:A}B(x) &= B(a_1) \\times B(a_2) \\times \\cdots \\times B(a_n) \\\\\n &= \\Sigma_{b_1:B(a_1)}\\left[\\Sigma_{b_2:B(a_2)}\\cdots\\left[\\Sigma_{b_{n-1}:B(a_{n-1})}B(a_{n})\\right]\\right]\n\\end{aligned}$ \n Here we see increasing levels of abstraction taking place with\nrepeated application, again analogous to multiplication representing\nrepeated addition, exponentiation representing repeated multiplication,\netc. \n \n \n \n (See also: Building\nblocks of dep. type theory) \n \n \n \n \n Polymorphic typing\n( $\\lambda 2$ ) \n \n Polymorphic types allow term definitions to depend on\ntypes. For example, instead of saying something like \n \n \n $\\lambda x.x:\\alpha\\rightarrow\\alpha,$ \n \n \n which can be interpreted as an identity function mapping from/to\nterms of type\n $\\alpha$ \n(for a particular choice of\n $\\alpha$ ),\nwe instead have \n \n \n $\\lambda x.x:\\forall\\alpha.\\alpha\\rightarrow\\alpha,$ \n \n \n which defines an identity function polymorphic in\n $\\alpha$ ,\nmeaning any (every) choice of\n $\\alpha$ \nis allowed. For the former, the type\n $\\alpha\\rightarrow\\alpha$ \ndescribes a monomorphic “family” of types since\n $\\alpha$ \ncan be anything. 
But the term will always involve a particular\nchoice, e.g., be the identity function from/to terms of\n`int` or `bool` types. In the polymorphic case, we\nnever require such specificity: the identity function must work for\nall types (as in, we could pick any one of them). This is kind\nof like saying we get one term that is “type aware,” whereas in the\nmonomorphic case we’d actually have to define multiple (separate) terms\nif we wanted to work with different choices of\n $\\alpha$ .⁶ \n \n \n System\nF is a second-order typed lambda calculus,\nformalizing parametric\npolymorphism. In simply typed lambda calculus, this isn’t exactly a\nnative element of the type theory, so universally quantified type\nvariables are formalized. For example, in the case of our identity\nfunction, we have \n \n \n $\\vdash \\Lambda\\alpha. \\lambda x^\\alpha. x : \\forall\\alpha.\\alpha\\rightarrow\\alpha,$ \n \n \n where\n $\\alpha$ \nis a type variable, and\n $\\Lambda$ \nrefers to a type-level function. This is a judgment that says, as a\nfunction of type\n $\\alpha$ ,\nthe identity function of a bound variable\n $x$ \nof type\n $\\alpha$ \nhas type\n $\\alpha\\rightarrow\\alpha$ .\nThe entire left-hand expression, including the outer\n $\\Lambda$ \nterm, is of the single “most general type,” involving a universally\nquantified type variable,\n $\\forall\\alpha.\\alpha\\rightarrow\\alpha$ .\nIt is the most general in that every other type we might feasibly assign\nto the term can be arrived at via substitution of this one, i.e., are\nless general. 
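As a rough illustration of the “one generic term versus many monomorphic terms” distinction above, here is a minimal sketch in Python type-hint notation. It is an analogy only: the names `A`, `identity`, `id_int`, and `id_bool` are made up for this example, the type variable is checked by an external static checker rather than by Python at runtime, and Python has no explicit counterpart to the type-level binder.

```python
from typing import Callable, TypeVar

# Hypothetical name: A plays the role of the bound type variable alpha.
A = TypeVar("A")


def identity(x: A) -> A:
    """One generic term, read as: for every type A, a function from A to A."""
    return x


# Monomorphic views of the same term: each annotation commits to one
# concrete choice of alpha (int or bool), as in the simply typed setting.
id_int: Callable[[int], int] = identity
id_bool: Callable[[bool], bool] = identity

print(identity(3), identity(True), id_int(7), id_bool(False))
```

The single definition of `identity` serves every choice of `A`, while `id_int` and `id_bool` are the kind of separate, fixed-type terms we would otherwise have to write out one by one.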
\n \n \n In System F, we actually formalize the universally quantified type\nvariable as a single type by a typing rule: \n \n \n $\\frac{\\Gamma,\\alpha\\;\\text{type}\\vdash\nM:\\sigma}{\\Gamma\\vdash\\Lambda\\alpha.M:\\forall\\alpha.\\;\\sigma}$ \n \n \n Here we’re saying, when\n $M$ \nis of type\n $\\sigma$ \nand we take\n $\\alpha$ \nto be a type variable, then\n $\\Lambda\\alpha.M$ \nis of type\n $\\forall\\alpha.\\;\\sigma$ .\nNotice how it’s not fundamentally a function type or anything; this is\nus actually defining a new type and its syntax, axiomatically.\nThe quantifier\n $\\forall$ \nis not purely symbolic, however; it is actually signifying\nquantification over all types. \n \n \n We also have the rule describing the application of\n $\\Lambda$ \nterms: \n \n \n $\\frac{\\Gamma\\vdash M:\\forall\\alpha.\\;\\sigma}{\\Gamma\\vdash M\\tau :\n\\sigma[\\alpha:=\\tau]}$ \n \n \n which just says that when a polymorphic term\n $M$ \nis applied to a type\n $\\tau$ ,\nit’s as if we’re instantiating the whole template under that type,\nreplacing any dependence on\n $\\alpha$ \nin\n $\\sigma$ \nwith\n $\\tau$ .\nBringing back the identity example from above, we have \n \n \n $\\vdash (\\Lambda\\alpha. \\lambda x^\\alpha. 
x)(\\text{Bool}) : \\text{Bool}\\rightarrow\\text{Bool}$ \n \n \n The\n $\\Lambda$ \nterm is applied to\n $\\text{Bool}$ ,\neffectively: \n \n \n \n Stripping off the outer\n $\\Lambda\\alpha$ \n \n \n Binding\n $x$ \nin\n $\\lambda$ \nto type\n $\\text{Bool}$ \n(although the RHS type signature really does this as well) \n \n \n By the application rule, for the type side we strip off\n $\\forall\\alpha$ \nand replace the references to\n $\\alpha$ \nin the body (which was\n $\\alpha\\rightarrow\\alpha$ )\nwith\n $\\text{Bool}$ \n \n \n \n So we effectively end up with the term\n $\\lambda\nx.x:\\text{Bool}\\rightarrow\\text{Bool}$ .\nWe can think of the polymorphic term as a producer of a family\nof such concrete terms; the polymorphic term itself is\nsecond-order, producing a first-order, monomorphic term after\napplication. \n \n \n (See also: Type\nvariable) \n \n \n \n Type constructors\n( $\\lambda\\underline{\\omega}$ ) \n \n In System\n $F \\underline{\\omega}$ ,\nwe introduce type construction. This is a mechanism that\neffectively brings types into the term space, loosely speaking: types\nare allowed to be part of terms, e.g., taken as inputs to a function.\nSuch terms themselves have types, which we call kinds. \n \n \n Type\nconstructors facilitate the creation of new types from existing\nones. They can be formally seen as\n $n$ -ary\ntype operators, returning a type from\n $n$ \ntypes passed as arguments. Note that we typically write this in a\ncurried manner as needed, like other functions of many variables. As\nsuch, type operators can be seen as functions in a higher-order type\ntheory (namely a simply typed lambda calculus) with one basic type\n $*$ ,\nrepresenting the type of all types in the underlying language. \n \n \n For instance, if\n $s: \\sigma$ ,\nthen\n $\\sigma: *$ .\nBoth\n $\\sigma$ \nand\n $*$ \nare types, albeit at different “levels;” we generally distinguish them\nas “proper types” and “kinds,” respectively. 
\n \n \n Kinds are effectively declarations of type constructor arity\n(rather than the “type of a type”). Given that just a single base type is allowed in the\n“kind system,” first-order type operators are simply curried\nfunctions of proper types and look like\n $* \\rightarrow \\cdots \\rightarrow *$ : \n \n \n \n $*$ \nis the kind of all proper types (also called “nullary types,” i.e., the\nconstant result of a type constructor with no inputs). This is\npronounced “type,” e.g.,\n“ $\\sigma:*$ \nindicates\n $\\sigma$ \nis of type type.” \n To be clear, even though\n“ $*$ ”\nmeans “type,” it is part of a kind system. We say\n“ $\\text{nat}$ \nis a type” (judgment\n“ $\\vdash \\text{nat Type}$ ”\nin the base-level type theory) and\n“ $*$ \nis a kind” (judgment\n“ $\\vdash *\n\\;\\text{Type}$ ”\nin the higher-level type/kind theory). \n \n \n $* \\rightarrow *$ \nis the kind of a unary type constructor, such as that of a list\ntype, e.g.,\n $\\text{list}: * \\rightarrow *$ ,\nso that\n $\\text{list}\\ a: *$ \nwhere\n $a$ \nis the type of the list elements. \n \n \n $* \\rightarrow * \\rightarrow *$ \nis the kind of a binary type constructor, e.g., the function type\nconstructor. That is, something like\n $(\\lambda\\sigma\\tau.\\sigma\\rightarrow\\tau) : * \\rightarrow * \\rightarrow *$ \nas a judgment in the kind system (although I don’t know if we really ever\nembrace such an explicit formation given functions are usually\nprimitives, but I think it demonstrates the point). \n \n \n \n A higher-order type constructor is one that itself maps from a type\nconstructor to a proper type. This might have the kind\n $(* \\rightarrow *)\\rightarrow *$ ,\nfor instance, and helps describe the kind of more complex\nconstructs like monads. \n \n \n Note that we use\n $\\lambda\\omega$ \nto denote the type system including both type constructors and\npolymorphism. \n \n \n \n \n Pure type systems \n \n Pure type systems are typed lambda calculi that include an\narbitrary number of sorts and dependencies between them. 
This\ngeneralizes the lambda cube, from which perspective we can view its\n“corners” as instances of pure type systems with just two sorts. \n \n \n In particular, a pure type system is a triple\n $(\\mathcal{S}, \\mathcal{A},\n\\mathcal{R})$ : \n \n \n \n $\\mathcal{S}$ \nis the set of sorts \n \n \n $\\mathcal{A}\\subseteq\\mathcal{S}^2$ \nis the set of axioms; an axiom\n $(s_1,\ns_2)$ \nis a foundational statement that\n $s_1:s_2$ \n \n \n $\\mathcal{R}\\subseteq\\mathcal{S}^3$ \nis the set of rules \n \n \n \n Pure type systems “break down” a wall that previously wasn’t really\nvisible; simply typed lambda calculus, for instance, is fundamentally\nconstructed around terms and types, and we don’t really question those\ntwo “classes” of items. Pure type systems open that up, extending the\nnotion of typing (allowing lower-order terms to be classified by\nhigher-order terms) to more expansive hierarchies⁷. \n \n \n (See also: Pure type system\n(nLab)) \n \n \n System U \n \n \n \n Subtyping \n \n Subtyping captures a notion of substitutability\nbetween types. If\n $S$ \nis a subtype of\n $T$ ,\nwe typically write this\n $S <: T$ ,\nimplying that\n $S$ \ncan be used in place of\n $T$ \nin any context where\n $T$ \nis expected. Type systems like System\n $F_{<:}$ \nformalize this properly (e.g., with subtype judgments and associated\nrules), but from what I can tell, most resources stay away from the\nweeds and discuss record types and LSP. \n \n \n More formally, subtype applicability is determined by\nsubsumption. For a type\n $\\mathrm{T}$ ,\nwe generally adopt an extensional view\n $\\mathbf{T}$ \n(the set of all and an intensional view\n $\\mathit{T}$ … \n \n \n …leaving off here. Taking some issue with the int/ext views of types,\nsince it’s more set-theoretic. Want to get this right, mix in LSP,\ncontrast with inheritance, and maybe get a clear link out to a relevant\npage on func prog. \n \n \n -> Then do universally/exist. 
quant types (below); I feel like I\nstill have yet to nail this down. Then do the semantics bit, equational\ntheory. This last piece should wrap us around to algebraic DTs. Then\nmaybe hit combinators, finally monoids and some Haskell examples. \n \n \n -> Now back to types as sets with the paper \n \n \n \n Quantification \n \n Quantified types facilitate greater expressive power, appealing to\nhigher level typing mechanisms. Universal\nquantification enables polymorphism through generic types, and\nexistential quantification yields abstract data types\n(through information hiding). Together, they facilitate parametric\ndata abstraction. \n \n \n Generally speaking, quantification is a means of\nabstraction, and is facilitated by binders. Binding\noperators act on variables of a certain kind, and place (bind) them\ninside a particular context. That context is made abstract as a\nresult: an expression is produced that isn’t tied to any particular\nconcrete value. The resulting abstraction can therefore be seen\nas a generic template, and is accompanied by a notion of\napplication, wherein concrete values can be supplied to “fill\nin” the template and make the entire expression concrete. Fundamentally\nwe’re introducing a higher level means of operation, a construction\nabove terms, a producer of terms. Lambda calculus is so\npowerful because we let this higher level construct be a term\nitself; it doesn’t exist strictly in a higher plane but is\n“flattened back out” to the same level as concrete terms. This enables\nabstractions to be arbitrarily nested (terms can be abstractions, and\nabstractions are defined over terms), and ultimately represent all\ncomputable functions (lambda calculus is Turing complete). \n \n \n In the case of function abstraction,\n $\\lambda$ \nbinds term variables:\n $\\lambda\nx. 
c(x) : A\\rightarrow B$ \nbinds a term\n $x:A$ \nin a context\n $c(x):B$ .\nAs we’ve already seen with polymorphic typing in\n $\\lambda 2$ ,\ntype-level functions are introduced with the\n $\\Lambda$ \noperator, which binds type variables in term space. Resulting\nterms are generic functions\n $\\Lambda\\alpha. \\lambda\nx^\\alpha. c(x)$ ,\nand have a universally quantified type\n $\\forall\\alpha.\\alpha\\rightarrow\\alpha$ .\nThis is fundamentally a new kind of term, i.e., functions that can\neffectively be parameterized by a type, and we’ve got a new type\ndefinition to go along with it. \n \n \n Existentially quantified types are a bit more nuanced. For starters,\nexistential quantification isn’t accompanied by a fundamentally new kind\nof binding operator in term space. Instead, it’s purely a mechanism for\nabstraction in type space, typically used to hide\ncertain type-level specifics. The statement\n $x : \\exists\\alpha. t(\\alpha)$ \nallows us to specify\n $x$ \nonly in terms of an exposed interface\n $t$ ,\ni.e., some higher level type structure that, when parameterized by\nsome type\n $\\alpha$ ,\nwill represent\n $x$ ’s\nreal type. The whole point is that we can get away with not knowing or\nspecifying\n $\\alpha$ ,\nand the binding operator\n $\\exists$ \nencapsulates that detail behind the visible structure seen in\n $t$ .\nSuch a type is clearly amenable to high-level specifications of abstract\ndata types: we can declare terms that verifiably observe certain type\nsignatures and are invariant to any involved concrete types. 
Note that\nterms with existentially quantified types are introduced and eliminated\nwith `pack` and `unpack` operators, respectively.\nThese are fundamentally different than an operator like\n $\\Lambda$ ;\nthe latter introduces an irreducible new primitive (type abstraction),\nwhereas `pack`/`unpack` are more like rules for\nworking with existentially quantified terms (and can be fundamentally\nrepresented via universal quantification). \n \n \n While the notions of universal and existential quantification are\nsimilar in that types are being abstracted over, they say very different\nthings. Informally,\n $x : \\exists\\alpha. t(\\alpha)$ \nlets us say that\n $x$ \nhas type\n $t(\\alpha)$ \nfor some particular\n $\\alpha$ .\nThis is quite different from\n $x :\n\\forall\\alpha. t(\\alpha)$ ,\nwhich suggests that\n $x$ \nis an abstraction that can handle all types\n $\\alpha$ .\nIn the former,\n $x$ \nhas no such “awareness” of choosing a type for\n $\\alpha$ ;\na specific choice of\n $\\alpha$ \nis involved but we’re blind to its actual value, suggesting it could be\nany one choice. In the latter,\n $x$ \nis explicitly a generic construct that can handle all choices\nof\n $\\alpha$ .\nThere’s no notion of “plugging in” some particular choice of\n $\\alpha$ \nto get\n $x$ ’s\n“true” type; instead,\n $x$ \nis explicitly the thing (a generic function) that can handle any choice\nfor\n $\\alpha$ . \n \n \n More formally, it’s important to recognize that both universal and\nexistential quantification do ultimately represent new, singular types.\nThey’re not just loose characterizations of families of types or\npossible type choices (as we might be inclined to interpret them), but\nformal types with typing rules, just like everything else. 
I often find\nmyself forgetting this since\n $\\exists$ \nand\n $\\forall$ \nlook to be loose wrappers on other types; it’s tempting to just think of\nthem extensionally, as possible values for the types they “wrap.” This\nis perhaps a perfectly reasonable and intuitive thing to do (encouraged,\neven), but they are nevertheless still themselves considered formal\ntypes. \n \n \n Bounded quantification \n \n Bounded quantification is quantification (universal\nor existential) with subtyping. \n \n \n \n \n Encoding datatypes \n \n The power of lambda calculus becomes quickly apparent when we discuss\nrepeated application of higher-order abstractions. One way to begin\nformalizing this is via Church encoding, and the Church-Turing\nthesis shows that any computable operator can be represented under this\nscheme. \n \n \n Recursion \n \n When defining recursive functions, we might reach for a familiar\nself-referential form. For example, with the factorial function: \n \n `fact = fun (n: Int) if n=0 then 1 else n * fact(n-1)` \n \n As a term, this is improperly defined: `fact` refers to\nitself before it is fully defined. We can imagine needing to parse a\ndefinition left to right before the LHS identifier is considered valid,\nbut we encounter `fact` before we get all the way through, at\nwhich point `fact` is not a valid alias. In any case, we do\nthis all the time in most programming languages, it’s just that it’s a\nconvenient expression for something more formal. That something “lifts”\nthe function into a functional: \n \n `fact = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n \n This is not intrinsically recursive: it is a function that takes\nanother function `f` as input, embeds it in another function,\nand returns that. That is, something like the general form\n`λf. λx. t(f(s(x)))`, which says `f` is a function\nthat operates on a term `s(x)`, and `t` is some\ntransformation of `f` after it’s applied to\n`s(x)`. 
For a quick example, \n \n -- let s just be the identity function\ns = λx. x\n\n-- let t(f, x) mirror the factorial setup\nt = λf. λx. x * f(x-1)\n\n-- so our general term:\nλf. λx. t(f(s(x)), x)\n\n-- turns into\nλf. λx. x * f(x-1)\n\n-- and then if we define some function g\ng = λy. y * y\n\n-- we can apply our form to it like\n( λf. λx. x * f(x-1) )( g )\n-> λx. x * g(x-1)\n-> λx. x * ((x-1) * (x-1))\n\n-- this final term is a function which we can reapply as a "new g"\nh = λx. x * ((x-1) * (x-1))\n\n-- on the left is our general term, apply to h\n( λf. λx. x * f(x-1) )( h )\n-> λx. x * ( (x-1) * ( (x-2) * (x-2) ) ) \n \n What we’re seeing here is that each repeated application of the\nfunctional – using the last output function as input to the functional,\nproducing the next function – approximates part of the full\nrecursive behavior. The last term above just accounts for two recursive\nsteps. That is, it’s like we just iteratively pack a snapshot of the\nfunction into itself one step at a time, and once we “call” the outer\nthing for some input `x`, we recursively apply the function\nas usual (by virtue of the fact we’ve already packed in all the nested\ncalls). The point is that such a term actually explicitly packs\nthe function unwrapping into a term so we don’t run into the definition\nissues, i.e., the usual implicit syntactic sugar we get away\nwith in most programming languages. All of that function composition\nhappens at runtime, dynamically, rather than being one huge\ncompositional thing we’ve pre-expanded and can evaluate in one go. The\nlast term in the above code block represents such an explicit term after\ntwo manual composition steps, which is clearly only the beginning of the\nlengths we’d need to go to get the full recursive thing (spoiler: we\nneed to do this arbitrarily many times). \n \n \n Note that this isn’t even the factorial function (I chose\n`g(x)=x*x`), but if we map this onto that setting, the\nequivalent would be a final two-step function\n`λx. x*(x-1)*(x-2)`. 
You can see how `x` is\nallowed to be any value (assuming `x: Int`), which doesn’t\nguarantee we actually “recurse to a base case.” If we plug in\n`x=10`, for instance, we just get `10*9*8`, which\nis not `10!` (actually we do not get a\nconvergent output whatsoever if we don’t reach the base case…more on\nthat below). Typically, full recursive behavior is allowed to recurse\narbitrarily many times until a base case is reached. But to ensure we\ncan even do that for arbitrary inputs `x`, we need to have\npacked in arbitrarily many recursive steps in our expanded term. How can\nwe possibly do that, given `x` itself may be arbitrarily\nlarge? \n \n \n Our full recursive function is therefore equivalent to the\nlimit of this process, i.e., the thing we approach as we repeat\ncomposition an arbitrary number of times. With more and more nested\ncomposition, we get closer to the thing, and in the limit, we\nembody the notion of infinite nested composition. Any\nfurther composition therefore changes nothing: that thing is\nalready a fixed point. More on this below. \n \n \n Because I’ve lost myself several times as I work this out, I think it\nmay be helpful to spell things out slowly while I’m here in the weeds.\nThis is almost entirely to help me now (the concept really is that\nslippery), but will hopefully be a good starting point if I find myself\nback in this place in the future. Here’s how I’m seeing things now, step\nby step, rehashing many of the points above: \n \n \n \n You’re trying to define the factorial function, say in a usual,\n“intuitive,” implicitly recursive way. You start with \n `λ(n: Int). if n=0 then 1 else n * f(n-1)` \n which establishes your base case and the recursive dependence on the\nsubproblem for `n-1`, captured by calling yourself as\n`f`. But here we’ve merely used `f` to stand\nin for our function…how are we meant to actually get ourselves\nsqueezed into `f`? 
This sort of already feels like\nthe whole recursive function as we’d want to define it (in a common\nprogramming language). The problem is I can’t formally refer to this\nterm in itself in the usual way. I explained this pretty\nstraightforwardly above: there’s no way for `f`, the name we\nwant to use to refer to the whole outer term, to be made available\ninside its own scope before that scope is even defined. It\nsimply doesn’t make sense; there is no `f` at the time we\nwant to use it. \n \n \n So what do we do? Again, this signature already feels kind of right:\nwe want our recursive function to take an integer as input. So we want\nto preserve that element of the structure without bulking up the term\nper se, but we need to add something in order to facilitate a\nmeaningful self-reference as `f` inside the term. We can do\nthis by parametrizing `f` like this: \n `λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n This gives us a functional term where we now need to first supply a\nfunction `f` in order to “get back” to our expected\nsignature. So this thing is not our desired function, but it’s a\nmechanism to build it. To be clear, the question we now are\ntrying to answer is how we can even take a “snapshot” of our function\nand use it as `f`. This is where I really encountered that\nfirst feeling of limbo: I didn’t have a great sense for what we could\neven grab onto to put there. \n \n \n Where to go from here? How to set `f`? We take the above\nform and some first, “lowest” function\n $\\bot$ \n(a canonical `undefined` that effectively “nukes” the output\nif reached rather than the base case) and apply it, giving us \n `f0 = λ(n: Int). if n=0 then 1 else n * ⊥(n-1)` \n So `f0` is back to the direct function signature we’re\nafter, having chosen a particular `f` to “embed.” We can then\nrepeat this: \n `f1 = λ(n: Int). if n=0 then 1 else n * f0(n-1)` \n and so on, where `f<n>` represents a\npartial factorial function that includes `n` nested\ncompositions. 
Note that `f<n>` is only well-defined for\nintegers `0` to `n` given our function; for inputs\ngreater than `n`, we never reach the base case and diverge\n(becoming `undefined`; we’ll basically end up making a call\nlike `⊥(x)` for some `x > 0`, blowing\neverything up). Additionally note that the `n` in\n`f<n>` only controls the depth of composition rather\nthan directly restricting the input to the underlying function (I’ve had\nthe tendency to interpret it as like improving the “coverage” of our\ninput space, and while in this case it sort of does that indirectly, in\ngeneral it does not determine the kind of inputs that produce concrete\noutputs). \n \n \nWith the above mechanism, we can let `n` tend toward\ninfinity, which will step us ever closer to recovering an arbitrarily\ndeep capacity for recursion. With some finite `n`,\nhowever, we can always supply inputs `>n` that cause our\noutput to diverge in the function `f<n>` (i.e., never\nhit the base case), which differs from a true, fully dynamic recursive\ndefinition. We therefore look toward the limiting term, the\nterm that can recurse dynamically. Such a term therefore\ncannot change under any further applications of composition (it\nalready embodies a notion of “infinite composition;” to do it again\nwould be like trying to take\n $\\infty+1$ .\nPut another way, the two mirrors are fully snapped parallel, and you\nhave a term that simply never deals in finite levels of composition: you\ncan’t “unravel” it with a finite number of applications), and is thus a\nfixed point.\n \n \n We explain a bit more explicitly below, but it’s as if we’ve produced\n`f∞`. The internal reference to itself can’t be\nsomething finite; if you tried, similar to our above point, you’d have\nsomething like `f<∞-1>`, which is just `f∞`\nagain. So it literally has its full self inside itself: the\ninternal thing isn’t smaller, and the outer composition isn’t larger. 
As\nconfusing as it is (certainly given our construction from increasingly\nlarge finite composition steps, where `f<n>` is indeed\nlarger than its internal use of `f<n-1>`), they’re\nthe same. \n \n \n \n \n To be clear about what we mean by fixed point: we’re saying that our\ntrue, fully recursive factorial function is the fixed point of the\nfunctional term \n `G = λf. λ(n: Int). if n=0 then 1 else n * f(n-1)` \n That is, this “builder term” `G` has our target recursive\nfunction as a fixed point. If our target function is called\n`fact`, we are therefore saying that\n`G(fact) = fact`: `fact` is the thing that\nalready fully embeds/references itself. You cannot “squeeze”\nanother level of composition into that `f` reference by\ncalling `G` again (like we did above, in sequence, for finite\nlevels). \n If this still feels slimy (and it certainly is for me; I literally\nfind myself able to get it one second and lose it the next), go ahead\nand actually perform the application: \n `G(fact) = λ(n: Int). if n=0 then 1 else n * fact(n-1)` \n What we’re saying is: when we embed our fully recursive function\n`fact` into `G`, the function we get out is just\n`fact` again. You can’t “outsmart” it or “outwrap” it, as\ncounterintuitive as it may be, since it feels you could always\ntake another composition that the term you’re using can’t be “aware of,”\nthat it can’t anticipate that you’re going to do it again. But no, it\ncan be aware of further composition and it’ll have no effect:\nit literally includes its full self. That final point\nis really the best characterization in my opinion, and if you just can’t\nbe satisfied with the formal argument, sit with that phrase until it\nsinks in.⁸ \n \n \n So how can we systematically get to this fixed point “right away,” as\nif we just defined things implicitly in the first place? The\nfixed-point, or Y,\ncombinator. 
This is a higher-order functional,\ni.e., a function taking a functional, and it returns the fixed\npoint function of that functional. We refer to this as\n $\\text{fix}$ \nor\n $Y$ .\nIn our above example, we’d say \n `fix G = fact\n\n-- or equivalently\nY G = fact\n\n-- and we have\nfix G = G(fix G) = G(G(fix G)) = ... = fact` \n We actually define\n $Y$ \nas \n $Y = \\lambda f. (\\lambda x. f(x\\;x)) (\\lambda x. f(x\\;x))$ \n When we apply\n $Y$ \nto some functional\n $g$ ,\nwe can expand to find\n $Y g = g (Y\ng)$ .\n $Y$ \nis a construct that literally builds a fully recursive function out of\nthe wrapper\n $g$ .\nIntuitively, it basically unpacks the internal logic from\n $g$ \nand puts it back into itself (although I don’t feel that confident about\nit becoming anything too concrete; reducing the thing still looks like\n $Y\ng$ ). \n \n \n \n \n Combinators \n \n Combinators are simply closed\n $\\lambda$ -expressions.\nIn combinatory\nlogic, we capture the full power of lambda calculus, but\nwithout the notion of abstraction (or at least the ability to\nconstruct new abstractions). Instead, we can take a few closed,\naxiomatic abstractions (called combinators), and application can be used\nto construct certain kinds of new functions. Composition of these\ncombinators can then replicate any function from lambda calculus. This\nsimplification removes much of the complexity (although also the\nconvenience) of abstraction, and was originally introduced to eliminate\nquantified variables from other logic systems, effectively presenting an\nalternative means of capturing the same functionality from even more\nprimitive operations. This makes it more of an interesting demonstration\nrather than a practical choice for a language due to its verbosity; here\nthe SKI\nsystem is a prime example. \n \n \n \n \n Semantics \n \n Curry-Howard correspondence \n \n The Curry-Howard correspondence relates computer\nprograms to mathematical proofs. 
It is a formal link\nbetween typed lambda expressions and statements in mathematical logic,\nan isomorphism between the proof systems. The logic-to-type\ntheory analogs are as follows: \n \n \n \n Formula\n $\\iff$ \nType \n \n \n Proof\n $\\iff$ \nTerm \n \n \n Implication\n( $\\implies$ )\n $\\iff$ \nFunction\n( $\\rightarrow$ ) \n \n \n Conjunction\n( $\\land$ )\n $\\iff$ \nProduct type\n( $\\times$ ) \n \n \n Disjunction\n( $\\lor$ )\n $\\iff$ \nSum type\n( $+$ ) \n \n \n Universal quantification\n( $\\forall$ )\n $\\iff$ \nDependent product type\n( $\\Pi$ ) \n \n \n Existential quantification\n( $\\exists$ )\n $\\iff$ \nDependent sum type\n( $\\Sigma$ ) \n \n \n \n So we can craft types that are analogous to logical\nformulas/statements, and type inhabitation (i.e., creating a\nterm of the type) corresponds to a notion of proof for that\nlogical statement. Expanding this for the specific analogs above: \n \n \n \n $t:T$ ;\nthe proposition/type\n $T$ \nholds due to the proof/term\n $t$ \n \n \n $f:S\\rightarrow T$ ;\nthe function\n $f$ \ninhabits the function type\n $S\\rightarrow\nT$ ,\nwhich can be thought of as a map from any proof of proposition\n $S$ \nto a proof of proposition\n $T$ .\nThis embodies the notion that\n $T$ \ncan be shown to hold if\n $S$ \nholds, precisely what we mean by implication. \n \n \n $(a,b):\\Sigma_{x:A}B(x)$ ;\nthe pair term\n $(a,b)$ \nprovides some “witness”\n $a:A$ \nsuch that\n $B(a)$ \nholds. To be clear, there are two elements here: a proof in the usual\nsense above, with\n $b:B(a)$ \n(where\n $B(a)$ \nis a concrete proposition for a fixed\n $a$ \nthat is inhabited by a term\n $b$ ),\nas well as a provided\n $a:A$ \nsuch that we actually inhabit\n $B(a)$ .\nThe pair\n $(a,b)$ \nis therefore evidence that there exists some\n $x$ \nsuch that\n $B(x)$ \nholds, precisely what we mean by\n $\\exists a\\in A, B(a)$ . 
\n \n \n $f:\\Pi_{x:A}B(x)$ ;\nthe dependent function term\n $f$ \nprovides, for every\n $x:A$ ,\na proof that\n $B(x)$ \nholds. We naturally think of dependent functions as maps with output\nterms with types that can depend on the input value. Therefore, the\nexistence of some inhabiting function\n $f$ \nimplies we’ve mapped from all input terms\n $x:A$ \nto output terms\n $y:B(x)$ .\nThat is, we effectively have a collection of pairs, each of\nwhich is similar to what we saw with\n $\\Sigma$ \ntypes, with witnesses and proofs for all parameterizations of\n $B(x)$ .\nThis is precisely what we mean by\n $\\forall a\\in A, B(a)$ . \n \n \n \n \n \n Order \n \n \n Lattice of types: from `Top` to `Bot` \n \n \n Order theory and hierarchies of types \n \n \nNotion of least general type:\n \n \n Note how, for a given term, we can say things like\n`c : a -> a` and `c : b`. The former is more\nspecific, characterizing it as a function from/to a particular type,\nwhile the latter…actually I don’t get this, not really. For starters,\nwe’re letting `a` and `b` be free type variables\nhere, so if I think about how I’d actually write them for any concrete\ntypes, including as a UQ type, I don’t know how these are both correct.\nWith Hindley-Milner,\nwe see some discussion around generics also being able to inhabit\nspecific typed variants (look at the Wikipedia discussion under “Type\norder”). But I don’t get this, since the term is a generic, at\nleast as I’m used to, which makes it a distinct thing even if it can\nappear to be a particular typed version. 
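One concrete reading of the “generic term at a specific type” question above, offered as a minimal sketch rather than a resolution: in a system with explicit type abstraction, the polymorphic identity and its instance at a particular type are distinct terms, and the second is obtained from the first by instantiating the bound type variable. The Lean 4 snippet below is illustrative only. The names `idPoly` and `idNat` are made up for this example, and Lean's explicit instantiation stands in (as an assumption) for the implicit instantiation that Hindley-Milner performs during inference.

```lean
-- Polymorphic identity: α is bound by the dependent function type,
-- so this single term covers every type (hypothetical name).
def idPoly : (α : Type) → α → α :=
  fun _ x => x

-- Instantiating α := Nat gives a term of the more specific type Nat → Nat
-- (hypothetical name).
def idNat : Nat → Nat :=
  idPoly Nat

-- The instance behaves like the identity on Nat.
example : idNat 3 = 3 := rfl
```

Read this way, the type of `idPoly` is the more general of the two: the type of `idNat` is obtained from it by substituting `Nat` for `α`, but not the other way around.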
\n \n \n \n \n \n en.wikipedia.org/wiki/Simply_typed_lambda_calculus \n en.wikipedia.org/wiki/Lambda_calculus \n en.wikipedia.org/wiki/Lambda_calculus#Capture-avoiding_substitutions \n Free variables, binding \n Free/bound variables,\ngenerally \n en.wikipedia.org/wiki/Lambda_cube \n commons.wikimedia.org/wiki/File:Lambda_Cube_img.svg \n en.wikipedia.org/wiki/Dependent_type#%CE%A0_type \n en.wikipedia.org/wiki/Parametric_polymorphism \n en.wikipedia.org/wiki/System_F \n en.wikipedia.org/wiki/Type_variable \n en.wikipedia.org/wiki/Kind_(type_theory) \n en.wikipedia.org/wiki/Type_constructor \n www.lesswrong.com/posts/ccbsYSpTcTqXwukH8/basic-building-blocks-of-dependent-type-theory \n ncatlab.org/nlab/show/pure+type+system \n en.wikipedia.org/wiki/System_F#System_F%3C: \n www3.cs.stonybrook.edu/~cram/cse526/Spring20/Lectures/untyped-lambda.pdf \n en.wikipedia.org/wiki/SKI_combinator_calculus \n en.wikipedia.org/wiki/Combinatory_logic \n en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system \n \n \n \n \n \n \nThis is a function from terms of type\n $A$ \nto “terms” of type\n $\\mathcal{U}$ ,\nwhere\n $\\mathcal{U}$ \nis actually a type universe whose “terms” are themselves types. So\n $B$ \nmaps from terms\n $a:A$ \nto types.\n $B$ \ncharacterizes a family of types if we think about it\nextensionally (not actually needed, but helps evoke how I’m thinking\nabout it at the moment), i.e., the type space that is the function’s\nimage.\n \n↩︎ \n \nAs far as I’ve seen, we just fundamentally take function types as given\nand never really need to throw in some extra means of canonically\nspecifying regular functions, which I guess is why I’ve not seen this\nprior to dependent type systems. Nevertheless, I like looking at this as\nan alternative way to color how I think about functions.\n \n↩︎ \n \nHere I’m referring to the notation\n $B^A$ \nto define function spaces, where\n $B$ \nis a codomain and\n $A$ \nis a domain. 
We can expand this to look just like our dependent product,\neven when\n $B$ \ndoes not depend on any terms\n $x:A$ .\nIntuitively, we’re just saying we have a “copy” of\n $B$ \nfor every input\n $x:A$ ,\nand a point in the resulting product space is a choice of output for\nevery input (i.e., a function).\n \n↩︎ \n \n For example, we could call\n $(1, 2, 3)$ \na function in the context of an indexed family\n $f$ \nthat is the identity function.\n $f$ \nmaps a 0-indexed tuple position to the same value, and we associate with\nthat input the output value in that tuple position. So the function is\nexplicitly captured by \n $\\{(f(0), 1), (f(1), 2), (f(2), 3)\\} = \\{(0, 1), (1, 2), (2, 3)\\}$ \n \n↩︎ \n \nThis has been an annoying sticking point for me. I keep wanting to\nassign the whole space to each point\n $a\\in A$ ,\ni.e., producing a collection of pairs\n $(a, B(a))$ ,\nsince that feels like it might be just as effective. But that really is\njust another way to write the function\n $B$ \nitself (again, just its graph), and doesn’t “unpack” the items of each\n $B(a)$ \ninto a new, “flattened” space.\n \n↩︎ \n \nThis is a particular topic I’ve struggled with, one that has left me\nwithout a clear foundation for some time. When we say something like\n $\\lambda x.x : \\alpha\\rightarrow\\alpha$ ,\n $\\alpha$ \nis serving as a “type variable,” allowing for any choice of\n $\\alpha$ .\nBut even though\n $\\alpha$ \nisn’t anything particular in the definition, it’s a placeholder for\nsomething that will be. In the polymorphic case,\n $\\alpha$ \nalso serves as a placeholder in the same way, but we include a term that\nexplicitly “loops” over every type. We bind that type variable\nwith the universal quantifier\n $\\forall\\alpha$ \nand make sure the term “handles them all,” explicitly. That is to say,\nit’s tied to none of them, and\n $\\alpha$ \nisn’t free the way it is in the monomorphic case. 
Bottom line: the\nmonomorphic type is defined around another fixed but unspecified\ntype\n $\\alpha$ \nand we’ll end up with a term that only works as a map from/to one fixed\ntype (we’ll have to supply a choice for\n $\\alpha$ \nwhen we instantiate). In the polymorphic case, that function is defined\nto work with all types; there’s not even a choice to be made for\n $\\alpha$ \nduring instantiation, since the function should be able to operate under\na term of any type that gets passed in.\n \n↩︎ \n \nThis got me (however counter-intuitively) questioning the notion of\ntyping at all. What is the rule that is typing itself? We seem\nto take the notion of typing as a given, like an action that’s baked in.\nBut why, and fundamentally what are we left with when it’s not present?\nAfter flailing for a minute, I realized we just get back to untyped\nlambda calculus. I’ve been so completely focused on types for so long\nthat I kind of forgot what it’d even mean not to have them. But it turns\nout it’s not all that confusing, really, and untyped settings can\ngenerally just be seen as special cases of typed ones (where there’s\nimplicitly just one type).\n \n↩︎ \n \n Maybe worth the explicit reminder that `fact` must know\nabout\n $G$ \nfor composition to have no effect.\n $G$ \ninjects a function\n $f$ \ninto its template structure, and the fixed point `fact` is\ncompletely built around that scaffolding. It’s not something strictly\n“pure” or independent of\n $G$ ,\nand\n $G$ \nis therefore powerless to change it: it’s the construct that\nembodies infinite application of\n $G$ ,\nand that’s why it has no effect. \n The\n $\\infty+1$ \nanalogy here is perfectly correct, but somehow even that often feels\nfinite, like you’re still stacking an extra item to what’s ultimately\njust a very tall pile. 
Therefore it doesn’t do a great job of offloading\nthe burden of understanding infinite composition; you have to break that\nannoyingly sticky finite thinking if you want the fixed point to feel\nfamiliar. It’s the exact same analogy, but an infinitely extending\nspiral may feel a little nicer, or at least just a bit easier to\nvisualize: \n \n\n\n \nWhen we wind up an additional cycle (compose once more), we get the same\nstructure.\n \n↩︎ \n \n \n', 'toc': ' \n Lambda cube \n Typing\nmechanisms\n \n Dependent typing\n( $\\lambda\\Pi$ )\n \n $\\Pi$ -type \n $\\Sigma$ -type \n \n Polymorphic typing\n( $\\lambda 2$ ) \n Type constructors\n( $\\lambda\\underline{\\omega}$ ) \n \n Pure type\nsystems\n \n System U \n \n Subtyping \n Quantification\n \n Bounded quantification \n \n Encoding\ndatatypes\n \n Recursion \n Combinators \n \n Semantics\n \n Curry-Howard\ncorrespondence \n \n Order \n ', 'created': '2025-03-16 22:42', 'modified': '2025-09-10 02:34', 'summary': '', 'abstract': '', 'series': ''}, 'src': {'id': 10800, 'path': '/home/smgr/Documents/notes/Typed_lambda_calculus.md', 'rpath': 'Typed_lambda_calculus.md', 'name': 'Typed_lambda_calculus.md', 'title': 'Typed lambda calculus', 'link': 'Typed_lambda_calculus', 'ftype': 'md', 'ctime': '1757496869.29', 'mtime': '1757496869.29', 'atime': '1757496869.29', 'type': 'wiki', 'yaml_text': 'title: Typed lambda calculus\ncreated: 2025-03-16 22:42\nmodified: 2025-09-10 02:34\ndatelink: [[2025-03-16]]\ntype: wiki\nsummary: ', 'name_fmt': 'Typed_lambda_calculus.md+src', 'format': 'src', 'content': '', 'toc': '', 'created': '2025-03-16 22:42', 'modified': '2025-09-10 02:34', 'summary': '', 'abstract': '', 'series': ''}}}
