Haskell

Media type: wiki · created 2024-05-28 05:29 · modified 2025-10-11 00:39

Foreword: this page was written with the content in Typed lambda calculus fresh on my mind. Implicit references are made to more formal type theory concepts throughout, and this page may not be a great standalone accounting of the covered Haskell topics (I've confused myself re-reading this page without the lambda calculus terminology so top-of-mind, for instance).

Syntax groups

Expressions are the symbolic representations of values. This takes place at both the term and type level.

```haskell
-- expressions refer and evaluate to values
-- here I refer to a value X as "value<X>" to separate it from a symbolic
-- representation of that value, including the symbol "X" itself

-- this is at the *term* level
9+1 --refers to--> value<10>
10  --refers to--> value<10>

-- this is at the *type* level
Integer            --refers to--> <space of integers>
Integer -> Integer --refers to--> <space of functions from/to integers>
```

Here we're simply saying that expressions are the things we write down (a choice of symbols), while values are the objects to which those expressions refer (the more abstract thing "under the hood," independent of representation). And this applies to any syntactical representation of categories of objects, so we make a distinction between this happening at a term and type level (but the expression-value "pattern" is present in either case).

Equations associate patterns with expressions, specifically in term space.
The expression here is therefore analogous to the usual notion of a term, and the pattern is more a reusable, functional alias to parametrize that term structure in various contexts. This is somewhat muddy: patterns are more than just names, but they aren't fundamentally part of a function term. They are perhaps best thought of as syntactic sugar for case switching different behavior that we bundle up under the same alias, such that the alias behaves as "one function" (but endowed with case matching over its arguments in the intuitive sense).

To hit on this point more (since it initially confused me): equations are not responsible for creating function terms in that they do not fundamentally facilitate abstraction. `=` effectively just attaches a term (RHS) to a name (LHS), but with the pattern matching over various signatures. A definition like

```haskell
-- equations are of the form <pattern> = <expression>
factorial 0 = 1
factorial n = n * factorial (n - 1)
```

simply breaks up the definition of a single function, factorial, into two cases based on the input. The original confusion arose due to terms appearing on the LHS, and so it feels as if `=` is the thing doing the "bridging" here to connect them. But `=` is not a replacement for λ in the usual lambda calculus sense; the latter is occurring implicitly on the RHS, and `=` merely attaches the resulting function to the name. To be clear:

```haskell
factorial n = n * factorial (n - 1)

-- is semantically equivalent to
factorial = \n -> n * factorial (n - 1)

-- which might be (lazily) represented in lambda calculus as
-- factorial := λn. n * factorial (n - 1)
```

So: while `=` helps assign names to terms in this fairly primitive sense, and is separate from the actual creation of function terms, it is more than just a `:=` in that the LHS pattern can be matched against later on. It is more than a name in that way, and really a name for a function term that case matches its input before determining its behavior.
If we wanted to be very explicit, even across several Haskell equations, factorial can still be truly thought of as a single term with this case switching embedded:

```
factorial := λn. case n of (0 -> 1 ∣ m -> m × factorial(m−1))
```

Mind you, everything is a function when we draw analogies to lambda calculus. The most general characterization of this is written as follows, where `case ... of` is a valid Haskell expression:

```haskell
-- a set of equations like
f p11 ... p1k = e1
...
f pn1 ... pnk = en

-- is precisely equivalent to
f x1 x2 ... xk = case (x1, ..., xk) of
  (p11, ..., p1k) -> e1
  ...
  (pn1, ..., pnk) -> en

-- and can be more explicitly written
f = \x1 x2 ... xk -> case (x1, ..., xk) of
  (p11, ..., p1k) -> e1
  ...
  (pn1, ..., pnk) -> en
```

This syntactically separates each of the involved components here:

- `=` attaches an expression to an alias
- `\` initializes an abstraction (function)
- `case ... of` facilitates pattern matching

This story is far less clear with just the first version (the set of equations), where our convenient use of `=` masks these mechanisms.

Declarations are what we generally call statements that serve as definitions. This includes equations (i.e., pattern bindings), as well as type synonyms, data statements, etc. Any statement that associates some expression with a name, "ascribing meaning" to scoped aliases, is a declaration. The notion of a "declaration" is more general than what we called an equation above. One can think of the hierarchy of terms involved loosely as

```
Declaration (introduces objects)
├─ Binding‑declaration (pattern-expression associations, aka equations)
│   ├─ Function‑binding -- "f x y = …" (multiple clauses allowed)
│   └─ Pattern‑binding  -- "(x,y) = …", "Just n = …", etc.
└─ Other declarations   -- type, data, class, import, …
```

Note that function bindings can fail to match (refutable), while pattern bindings are lazy (and are irrefutable).
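To tie the equation/lambda/case story together, here is factorial spelled all three ways; a minimal sketch, where the names factorialA/B/C are my own, used only for side-by-side comparison:

```haskell
-- 1. A set of equations (pattern matching spread across clauses)
factorialA :: Integer -> Integer
factorialA 0 = 1
factorialA n = n * factorialA (n - 1)

-- 2. One equation whose RHS is an explicit case expression
factorialB :: Integer -> Integer
factorialB n = case n of
  0 -> 1
  m -> m * factorialB (m - 1)

-- 3. A bare lambda attached to a name with '='
factorialC :: Integer -> Integer
factorialC = \n -> case n of
  0 -> 1
  m -> m * factorialC (m - 1)
```

All three names denote the same function term; only the surface syntax differs.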
Equation resolution: matching, binding, evaluation

Syntax definitions

Expressions (syntactic terms) refer to values (the actual abstract objects). `5` is the character (expression) we use to refer to the "concept of 5." Simply put, we say the values are the real things; expressions are how we write them down (and not strictly in a canonical form, e.g., both `5` and `4+1` are expressions that yield the same value). Type expressions are how we write type values (or just "types"), just like the above distinction. For instance, `Integer -> Integer` is the type expression that represents the type of functions to/from integers. A value together with its type is called a "typing," and we use `::` to declare a type relation (read as "has type"):

```haskell
5 :: Integer
inc :: Integer -> Integer
[1,2,3] :: [Integer]
```

Polymorphic types

In λ2 we introduced generics and higher-order quantification of type variables. Haskell only supports universal quantification (i.e., the specification of generics), and we do this just using "bare" (free) type variables. So something like

```haskell
length :: [a] -> Integer
length [] = 0
length (x:xs) = 1 + length xs
```

does the following:

First it declares length to be a generic function operating on lists of any type, mapping them to an integer. In our more formal type theory context we might instead write `length : ∀a. List[a] -> Int`, but the point is that we don't need to explicitly bind `a` with ∀.

Note also that `[a]` is a type constructor. Recall our formal definition of a list type, like

```
rec type List[Item] = [nil: Unit, cons: {head: Item, tail: List[Item]}]
```

Although it's sort of hard to justify putting `a` in the middle of some brackets in the functional sense (so `[a]` versus something like `[](a)`, where in the latter case it's clear we're explicitly "calling" something we've decided to label as `[]`), we simply define `[a] = List[a]`.

With `length [] = 0`, we're declaring that length applied to `[]`, the empty list (or nil above), evaluates to 0.
With `length (x:xs) = 1 + length xs`, we define the rest of the function's behavior recursively. Note that `x:xs` basically unpacks cons of the above definition, assigning x to the head of the list and xs to the remaining sublist. To be clear, our list type is a variant/sum type, and has values that are either nil or cons, or in Haskell, either `[]` or `(x:xs)`. Our eliminator here is pattern matching, which is being done implicitly as we define our "overloaded" function variants, which will match a sum type `[a]` as either its nil or cons "internal" type.

User-defined types

`data` can be used to declare new types:

```haskell
data Bool = False | True
data Color = Red | Green | Blue | Indigo | Violet
data Point a = Pt a a
```

Bool and Color are nullary type constructors (we can define all types as type constructors, where those not parametrized by a type variable can just be seen as a constant, nullary function) that are sum types. The enumerated "values" here are called data constructors (also nullary in this case). To be clear: data constructors produce values, type constructors produce types.

Point provides an example of a "proper" type constructor, ranging over a type variable a. `data Point a` suggests Point is a type constructor generic in any type a, and has terms that are constructed via a data constructor `Pt a a`. The latter can perhaps be more formally written `Pt t1:a t2:a`, which is to say it "operates on" two terms t1 and t2, both of type a as bound by the type `Point a`. So on the left, a is a type variable (ranging over all types), but on the right, a is a term variable (ranging over terms/values of type a).

In any case, from the usual programming perspective, this notation slightly confuses me because it seems to do so little, and the RHS feels so redundant: we end up with two super generic things that define basically no specific functionality. `Point a` and `Pt a a` seem to do nearly exactly the same thing.
But from the type theory perspective, I suppose it's clear enough, where the RHS is serving to tell us the signature we'll use to introduce terms of the type. In Python, for instance, this is like the difference between a class declaration and the constructor signature:

```python
class Point[a]:
    def __init__(self, x: a, y: a): ...
```

The confusing part of the association here is that, in Python, we don't introduce some new alias for the constructor; we just use the class name and have `Point[int](1, 2)`, with the type variable and terms all in one place. This raises a key point, and a common practice in Haskell: we can simply let these two names be the same to capture this arguably more intuitive syntax (they are in separate namespaces, so there's no concern of clashing), and could have

```haskell
data Point a = Point a a
```

So to be clear on the usage:

```haskell
-- in the former case
Pt 2.0 3.0 :: Point Float
Pt True False :: Point Bool

-- in the latter
Point 2.0 3.0 :: Point Float
Point True False :: Point Bool
```

While I get we might want to make a distinction between the names of constructors in different spaces (term vs type), for now it seems there's really no reason to name them differently.

Pt a a is complete

This perhaps jumps the gun a bit with respect to the pattern matching details provided below, but I want to expand a bit on a particularly sticky concept as I warm up to Haskell conventions. For some time, I've treated the declaration `data Point a = Pt a a` as incomplete, in the sense that I had always implicitly expected some more concrete definition for Pt as a constructor to be provided later. From above, I believe my intuition rather directly led me to think the Python analog was

```python
class Point[a]:
    def __init__(self, x: a, y: a): ...
```

without the constructor body, as both appear to leave out the final details for what Pt should in fact do with the two values provided to it.
Note that it is perfectly acceptable to think of Point's `__init__` as positionally equivalent to Pt: they both serve as constructors for `Point[a]`/`Point a`. In any case, `Pt a a` appears merely to set up the constructor signature just like the `__init__` shell above. But there is nothing more to say: it is already complete. It's true that `Pt :: a -> a -> Point a`, but this evokes thoughts that Pt is something that will take two values `x: a` and `y: a` and produce some new form that will have type `Point a`. However, `Pt a a` is that thing, literally; there's not a more involved object with attached variables or methods like we expect in OOP. The mechanics for acting upon such an object are built-in: pattern matching. Pt may be used, for instance, as follows:

```haskell
p :: Point Int
p = Pt 3 4

norm1 :: Num a => Point a -> a
norm1 (Pt x y) = abs x + abs y -- eliminate (pattern match)
```

An object like `Pt 3 4` is directly matched against in the pattern `(Pt x y)`: we're able to bring our component values into the RHS scope as x and y directly, "destructuring" our object upfront. The larger point here: data declarations don't yield something strictly more concrete than what they appear to be syntactically. As vague as something like `Pt a a` looks, this form alone shapes our type: two terms of type a in a box called Pt. That's about as specific as we need to get, and as specific as data declarations allow.

Type synonyms

The `type` keyword lets us define type "synonyms" (or type aliases, as I'd prefer to call them), assigning new names to compositions of previously defined types.
For example,

```haskell
type String = [Char]
type Person = (Name,Address)
type Name = String
data Address = None | Addr String

-- can also do this for polymorphic types
data Tree a = Leaf a | Branch (Tree a) (Tree a)
type AssocTree a b = (Tree a, Tree b)
```

Functions

`->` notation for types, as usual:

```haskell
add :: Integer -> Integer -> Integer
add x y = x + y
```

Functions are curried in the usual way, i.e., we can think of add as a function with one arg that returns another function with an arg:

```haskell
add :: Integer -> (Integer -> Integer)
```

Curried functions make partial application particularly clear, e.g.,

```haskell
inc = add 1
```

There's still another operand to "finish" the add operation, but the 1 is tucked away as the first operand, such that inc is effectively now just `f(x) = add 1 x`. Functions can also be taken as an argument (no surprises here):

```haskell
map :: (a->b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs

-- example application
map (add 1) [1,2,3] -- => [2,3,4]
```

And to be clear about function terms and their names:

```haskell
inc x = x+1
add x y = x+y

-- are really shorthand for:
inc = \x -> x+1
add = \x y -> x+y
```

where we use `\x -> x+1` to define an anonymous function, in the usual λ-expression or abstraction sense from lambda calculus.

Functions are non-strict

Haskell processes definitions rather than the assignments of other languages. If we have something like

```haskell
v = 1/0
```

Haskell treats this like "define v as 1/0" rather than "compute 1/0 and store the result in v." The declaration alone does not imply any computation occurs, and in general, evaluation is lazy in nature. Such an evaluation will occur only when the value is needed in some context, in which case only then will we encounter the zero division error. For a function like

```haskell
const1 x = 1
```

evaluations of the function never even look at the value of its argument. It knows it doesn't need it, and again, something like `const1 (1/0)` will evaluate to 1.
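This can be checked directly; a minimal sketch, where I pass undefined (a value that errors the moment it is forced) instead of 1/0:

```haskell
-- const1 never inspects its argument, so even a diverging value is fine.
const1 :: a -> Integer
const1 x = 1
```

Calling `const1 undefined` returns 1 without ever touching the argument.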
It's as if 1/0 simply gets to stay in syntax form, and any issues will go completely undiscovered until we need to coerce that syntax into a value.

Recursive functions

Haskell permits "informal" self-reference when defining recursive functions, as is commonplace in most modern programming languages. We discuss the issues here at length in Typed lambda calculus§Recursion and the fix combinator. I think there's an important new perspective to really internalize here when thinking about the object we get from recursive definitions, certainly at a deeper level than I've been accustomed to. Take for instance the following:

```haskell
ones = 1 : ones
```

The usual surface-level interpretation here, at least as far as I think about it, is that ones is some arbitrarily large, "expanding" self-referential object/tree. Now this is perfectly fine, and even practically accurate when we try to follow the recursive stack traces in some real program. But this is less valid when we really force ourselves to make ones concrete at definition, rather than a dynamically growing thing. Note what we're actually saying with the definition: ones is some object that, together with some extra structure (the prefixed 1), is still itself. In other words, prefixing with a 1 can't change what ones actually is here; it must already "internalize" that action such that the object already includes it, in a sense¹. With any finite structure, this of course feels incredibly paradoxical: we are by definition adding something on top of ourselves, which must give us something new, something bigger. But when ones is simply the infinite sequence of 1s, we break out of this limitation. Such a sequence does not change when we add another 1 to it. We can also see this object as the fixed point of the function adding our structure, e.g., `f(x) = 1:x`. This harkens back to our definition of the Y-combinator, albeit where x in f(x) is a nullary function and the f(x) we just provided looks less like a functional.
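The definition runs exactly as written; laziness means we can only ever observe finite prefixes of the infinite object, e.g. with take:

```haskell
-- The infinite list of ones: the fixed point of (1 :).
ones :: [Integer]
ones = 1 : ones
```

`take 5 ones` forces just five cons cells and leaves the rest as an unevaluated tail.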
In any case, `f(ones) = ones`, which is effectively what our original declaration requested of us: assign to ones the object that doesn't change when adding some structure to it (in this case, prepending a 1). The Fibonacci sequence is a slightly more involved definition:

```haskell
fib = 1 : 1 : [ a+b | (a,b) <- zip fib (tail fib) ]
```

Again, I find there are two ways to start understanding a definition like this:

1. As a "generating" sequence, where you can unravel it one call at a time. We take what we know to be concrete to start (the base case): the sequence `1 1`. We then zip up the sequence with its offset to produce the next chunk by adding the pairs:
   - `[1 1]` `[1 null]` -> `[2]`, add this to produce `1 : 1 : [2]`
   - `[1 1 2]` `[1 2 null]` -> `[2 3]`, add this to produce `1 : 1 : [2 3]`
   - `[1 1 2 3]` `[1 2 3 null]` -> `[2 3 5]`, add this to produce `1 : 1 : [2 3 5]`

   Note how with each step we're effectively "peeking" at what the next item will be, and then we perform the full computation again, taking that new sequence as if it were the value of fib to begin with. This is like packing in more and more recursive calls, approaching the true object.

2. As the full fixed point: rather than an iteratively applied function, which is perhaps the only practical way to build up intuition and get some object, you "snap" straight to the full, global, infinite object.

Note again how we're not even really defining functions in the usual way. For instance, in other languages we might canonically construct a `fib(n)` function that computes the sequence for the first n items. Here it's as if we're defining an infinite sequence outright; at least in this example, fib is not a function that accepts arguments (although a concrete value could be lazily evaluated if some piece of code attempted to index into the fib list).

Case expressions and pattern matching

Patterns are the symbols used to refer to variables in parameterized contexts, like function definitions as we've seen. Haskell makes use of pattern matching to verify appropriate cases.
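Before moving on, the fib definition from the previous section is runnable exactly as written; a sketch with a type signature added so a prefix can be observed:

```haskell
-- Corecursive Fibonacci list: each element past the first two is the sum
-- of the list zipped with its own tail.
fib :: [Integer]
fib = 1 : 1 : [ a + b | (a, b) <- zip fib (tail fib) ]
```

Indexing or taking a prefix forces only as much of the sequence as is needed.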
Attaching values to variables or arguments can be generally thought of as first undergoing matching (determining an appropriate definition to fill), and thereafter binding to the variable in that context. Patterns can either fail, succeed, or diverge. They diverge when an error is present (i.e., ⊥), fail if no pattern in an equation is matched, and succeed when at least one is (and the first is taken). Matching occurs left to right in an "equation," i.e., a sequence of patterns on a line of a function declaration, and top to bottom across lines.

As-patterns: you can attach aliases to patterns on the left side that can be reused on the right side:

```haskell
f (x:xs) = x:x:xs
f s@(x:xs) = x:s
```

Wild-cards: can use `_` to match against any input value.

Boolean guards: effectively define boolean cases for input values. The "usual" case can be thought of more as direct matches, whereas here we have more control over the exact conditioning being tested for a match:

```haskell
sign x | x > 0  = 1
       | x == 0 = 0
       | x < 0  = -1
```

Case expressions

All pattern matching thus far has been seen in the context of function definitions (which is general enough), but we don't always want to define a function to do this. Consider a set of equations like

```haskell
f p11 ... p1k = e1
...
f pn1 ... pnk = en
```

This is basically a case-switch kind of expression over k constraints (the usual case being k=1); we match one value against n cases and produce a value according to the matched case. We can move this functionality into a case expression, like

```haskell
f x1 x2 ... xk = case (x1, ..., xk) of
  (p11, ..., p1k) -> e1
  ...
  (pn1, ..., pnk) -> en
```

For instance, we can re-define take (although I didn't define it above; the before and after are both below):

```haskell
-- equation matching in func def
take 0 _      = []
take _ []     = []
take n (x:xs) = x : take (n-1) xs

-- same definition but with case expression
take m ys = case (m,ys) of
  (0,_)    -> []
  (_,[])   -> []
  (n,x:xs) -> x : take (n-1) xs
```

In this case I don't feel it really demonstrates the utility (we just get the same thing, and the first takes fewer lines), but the point is that the case matching can take place on the RHS, in an arbitrary scope, as part of the actual "functional code" we might use to actually define a function. That is to say: we can use pattern matching beyond just inside function definitions. Note that if statements can also be reduced to pattern matching via case expressions:

```haskell
case e1 of
  True  -> e2
  False -> e3

-- can be expressed with
if e1 then e2 else e3

-- which can still be seen as a function
-- if-then-else :: Bool -> a -> a -> a
```

Lazy patterns

Lazy patterns are of the form `~pat`, and are irrefutable, in that matching against a value will always succeed. (This has been a particularly slippery concept to wrap my head around, but I think I've got it down now.) Below I follow both the tutorial and an example from the wiki. There are a few lines of confusion to battle against here.

First, we said functions were non-strict earlier, meaning if I pass in a problematic value to a function that doesn't use it, it won't "break" the function and it'll evaluate without a hitch. This already feels like "laziness" in a sense: we only evaluate expressions/definitions at the time they're needed. But it's not lazy at the time the argument is matched, if it needs to be. For example, if I have `f (a,b) = g a b`, where f accepts a pair as input and splits up the items to apply g, Haskell will check for a pair constructor in the input before g can be applied.
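The pair-constructor check just described can be observed directly; a sketch where g is a hypothetical function of my own that ignores both of its arguments:

```haskell
-- g ignores its arguments entirely...
g :: a -> b -> Int
g _ _ = 7

-- ...but f still forces its argument to WHNF to expose the pair constructor.
f :: (a, b) -> Int
f (a, b) = g a b
```

`f (undefined, undefined)` succeeds (the pair constructor is present, and the components are never forced), while `f undefined` would diverge on the constructor check itself.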
That is, the elements of the input appear to be needed, so at the time of f's evaluation, we'll check that our input is a pair, and otherwise fail to match. This very check is what we're considering "not lazy enough." A lazy pattern like `f ~(a,b) = g a b`, however, includes an irrefutable pattern, and will match successfully on any input passed to f. That input may very well not be a pair, as desired, but we're explicitly saying we don't care…yet. You can basically treat the lazy definition like

```haskell
f p = g (fst p) (snd p)
```

where this is a legitimate non-lazy analog (p is not specific, yet we will treat it like it's a pair inside the function body). We just see some input p, and while we know this should be a pair, we don't care at the time of the function evaluation. You can think of `f p` as simply delaying the pair evaluation until we evaluate g. This is powerful because g itself may not care about the values of its inputs, in which case it can produce an output regardless of the actual values. Alternatively, g could also have lazy patterns for its inputs and defer the evaluation even further. In any case, the big point here is that lazy patterns let us pretend input values meet the structural checks until the whole expression actually needs to be evaluated.

I find the simplest motivating example to be the case where a constant function is buried inside arbitrarily many outer evaluations:

```haskell
c x = 1
aN p = c p
...
a2 p = a3 p
a1 p = a2 p
```

If I then want to call a1 for a particular value, I'll check that the value matches the (non-lazy) pattern p from `a1 p`, and then attempt to evaluate `a2 p`. I'll then do the same for a2, eventually getting to a3, and so on. Here I actually have to maintain a recursive stack and spend time unpacking each function until I reach c. If these were all lazy patterns, however, I can go straight to c, delaying all the evaluation steps I detailed above until the very latest moment.
And in this case, we find it doesn't even matter: my output will be 1 regardless of the input. Lazy patterns here allow my program to never need to even look at p to get that result, allowing it to "be aware" of computations it can skip (by virtue of us delaying the computation, until we get to a point where we can just toss out the whole expression). This sounds sensible enough, but it opens up two questions for me:

1. If this is so helpful, why aren't all patterns evaluated lazily?
2. Strict patterns seem purposeful in that they enforce the structure we declare the input should be. Don't we mostly want our program to stop if there's a structural inconsistency, even if it doesn't have a functional impact?

Note how these questions are mostly at odds with each other, but I suppose that underscores some of the confusion I have here. The above example was fairly contrived, and only a guess as to the actual behavior. Below is a more principled analysis from this very thorough breakdown of laziness.

Non-strictness vs laziness

We just mentioned how non-strictness is not quite what we mean by laziness. Non-strictness is the general property of Haskell programs that ensures expressions won't be evaluated until needed; "we evaluate as little as possible and delay evaluation as long as possible." But that characterization is loose in the sense that what we mean by "evaluate" and "need" are loose. For instance, we might check for the presence of a constructor to verify type correctness but not actually look at any values, at least not unless we need to. As I understand it, lazy evaluation is effectively a mechanism for enforcing a particularly strong notion of non-strictness.

Thunks

A thunk is effectively an unevaluated expression, and is the mechanism through which we'll represent lazy evaluation. Like before, we'll use an example with a pair that's pattern-matched on the LHS:

```haskell
let (x, y) = (length [1..5], reverse "olleh")
in ...
```
where we assume x and y are used somewhere beyond the in clause. Now here we might take `x=5` and `y="hello"`, but this involves resolving those expressions when binding the values. But we again don't do this until we actually need those values, somewhere after the in. Until then, we can call both x and y thunks: unevaluated expressions, "lying dormant" but able to be resolved at any moment. As before, note the pattern matching happening in the pair argument. If we instead have

```haskell
let z = (length [1..5], reverse "olleh")
in ...
```

Here z is a thunk, and we don't have to deconstruct it right away like we did with x and y before (although that's all that happened there, producing two thunks). We can be very explicit by including a pattern match on z:

```haskell
let z = (length [1..5], reverse "olleh")
    (n, s) = z
in ...
```

After the first line, z is still a thunk. But in line 2, we pattern match on z, requiring us to split it into two thunks, with `(n, s)` being `(<thunk>, <thunk>)`. To quote the article:

> The compiler thinks 'I better make sure that pattern does indeed match z, and in order to do that, I need to make sure z is a pair.' Be careful, though — we're not yet doing anything with the component parts (the calls to length and reverse), so they can remain unevaluated.

We can take this a step further with something like

```haskell
let z = (length [1..5], reverse "olleh")
    (n, s) = z
    'h':ss = s
in ...
```

which is the same as before, but now pattern matches on the second component of z, checking that it is a list with a head 'h', and attaches to ss the tail of the list.
Specifically at this stage we:

1. Evaluate s at a surface level to check it's a list (or a cons object), such that `s = <thunk> : <thunk>`, basically
2. Evaluate the first newly "revealed" thunk to check it's an 'h', leaving us with `'h':ss = 'h' : <thunk>`

In total, we see that Haskell values can be partially evaluated, and any one line (or just piece) of computation may require some minimum amount of needed evaluation (like with the pair or list checks, where we're "peeling" back some of the layers only to check for the needed structure), i.e., leaving as many nested components as thunks as possible.

How the value (4, [1, 2]) is evaluated, step by step

We have formal names for the "layers of evaluation" involved here. Any of the intermediate evaluation steps of a value is said to be in weak head normal form (WHNF), while a fully evaluated value is in normal form. Here we've laid some of the groundwork of laziness and how to think about layers of evaluation via thunks. Nothing is evaluated until it is needed, generally speaking (as in we're not even onto lazy patterns at this point). Interestingly, aside from a few I/O exceptions, pattern matching is the only place where Haskell values are evaluated; in the end, everything is left as a thunk until a pattern requires peeling back a layer to check the structure…and nowhere else does this thunk resolution take place.

Lazy and strict functions

Functions can be lazy or strict in an argument (and possibly differ across each of them). A function is said to be strict in an argument if it does some evaluation with it, evaluating it to at least WHNF, while being lazy in an argument if no evaluation takes place. A function is stricter than another if it evaluates an argument to a deeper level (of the same structure, say; general comparison is between barebones `f x` and `g x`). Performing evaluation is also called "forcing" a value in some cases. If we "force" (try to evaluate whatsoever) the value of undefined, our program will halt.
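One standard way to force a value to WHNF is the Prelude's seq, which evaluates its first argument to WHNF before returning its second. A sketch showing that a constructor application already is WHNF, so hidden components survive the force:

```haskell
-- The pair constructor is already exposed, so forcing to WHNF never
-- touches the undefined component.
whnfDemo :: String
whnfDemo = seq (1 + 2 :: Int, undefined :: Int) "survived"
```

By contrast, `seq undefined "survived"` would halt, since there is no constructor to stop at.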
So the following will yield errors:

```haskell
let (x, y) = undefined in x
length undefined
head undefined
```

Each of these does some checking on the values, trying to peel back one layer of thunks. But in doing so they encounter undefined, and halt. But a thunk that's "hiding" an undefined value can go undetected if we never try to evaluate it, and lazier settings like the following don't produce any errors:

```haskell
-- we just see our value is a pair, but not the values in the positions
let (x, y) = (4, undefined) in x

-- same sorta thing, we just see that we've got a list but don't look at the
-- cell values
length [undefined, undefined, undefined]
head (4 : undefined)
```

We can call a function f strict if (and only if) `f undefined` results in an error (implying that f will peek at the structure of its argument, evaluating at least one level, which is a problem). This is at least how we can determine strictness without actually knowing f's definition. It's worth noting there's a bit of a confusing nuance to really hit on with this definition². When we say a lazy function is one that doesn't evaluate its argument, we naturally mean this when evaluating f (otherwise we just have this unevaluated thunk `f x`). So given we're forcing `f x`, do we force x as a result? That is, if I need to "fully evaluate" `f x`, will I need to fully evaluate x to get there? If not, this basically entails that f's "real value" doesn't depend on x in any way, or it hides it inside another abstraction that doesn't need to be unpacked in order for `f x` to be in normal form. So laziness means we can get there without needing to think about x.

Lazy pattern matching

Now we've got a bit more context behind delayed evaluation and intermediate forms to motivate the explicit use of lazy patterns.
We already introduced them, but here's a simple example demonstrating their use once more:

```
Prelude> let f (x,y) = 1
Prelude> f undefined
*** Exception: Prelude.undefined

Prelude> let f ~(x,y) = 1
Prelude> f undefined
1
```

The first f is strict: it doesn't refer to x or y in its body, but it accepts a structured input (a pair) and we have to evaluate whatever's passed in to check it has a pair constructor. So when we pass in undefined, we are forced to check if that's a pair, and we end up evaluating undefined, which halts our program. But the `~` in the argument pattern in the second f delays this structural check, meaning f will just "take it on the chin" for whatever value we give it (even if it's not a pair). We simply wait until the last possible moment to use whatever that value is, rather than "gate checking" it on input. This means that `f undefined` doesn't cause any issues; we wait as long as possible to evaluate what we pass in, and in this case that turns out to be forever. So undefined goes by un-evaluated.

Scoping and nesting

There are two ways to create local, "block-like" bindings: let and where. let expressions facilitate local bindings scoped to a particular expression. For example

```haskell
let y   = a*b
    f x = (x+y)/y
in f c + f d
```

Here we're defining y and f locally (and note that f uses y), and then taking them as available names to construct the final expression `f c + f d`. `let ... in` can be used anywhere a typical expression can, so we could have the above in a function binding like

```haskell
normalizePair a b c d = let y   = a*b
                            f x = (x+y)/y
                        in f c + f d
```

where clauses are similar to let in that they facilitate local bindings, but they are crucially not expressions. where clauses are only allowed at the top level of a set of equations or case expression, and must "attach" to a declaration (function or pattern binding).
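The let-based normalizePair above runs as written once given a type; a sketch where the Double signature is my own choice:

```haskell
-- Local bindings y and f, scoped to the body expression after 'in'.
normalizePair :: Double -> Double -> Double -> Double -> Double
normalizePair a b c d =
  let y   = a * b
      f x = (x + y) / y
  in f c + f d
```

For example, `normalizePair 1 2 2 2` binds y = 2 locally and evaluates to 4.0.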
So here's an analog of the above that makes it look a whole lot like `let`:

```haskell
normalizePair a b c d = f c + f d
  where y   = a*b
        f z = (z+y)/y
```

This is okay because we're operating right at the function declaration here. But if we were working in some more deeply nested layers, we would have to use a `let` expression, since `where` does not produce a sub-expression. Think of the earlier `let` variant as actually producing a proper expression strictly on the RHS of the equals sign, whereas here the `where` is effectively attaching to the LHS and isn't valid in isolation. As in, if I just took

```haskell
f c + f d
  where y   = a*b
        f z = (z+y)/y
```

this would be invalid, not an expression, etc.; the `where` can't "see" any nearby function binding. Again, it's like `where` is just convenience sugar that comes coupled with declarations:

```haskell
pattern = expression where ...
```

The use of `where` is coupled with the presence of `pattern`; take the pattern out and you can't use it, basically. This is mostly convenient when you've got a function with guards and you need your locally bound variables across the guard constraints, like

```haskell
f x y | y > z  = ...
      | y == z = ...
      | y < z  = ...
  where z = x*x
```

Making `z` available to each constraint is something that `where` handles automatically. You can't use `let` here because this whole thing isn't an expression; it's a set of expressions. You could use `let`s inside the `...` on the RHS of each constraint, but that's painful. `where` gets to "sit above" the expression level and be a helper for declarations.

These constructs were a bit confusing at first because they just feel so particular. But as I slowly get used to them, it's becoming clear they just facilitate the typical composability of most non-functional languages when it comes to working with variables in scopes. That is, the really very basic ability to define some variables within a function and reuse them to build up other variables or terms. Those are very automatic in languages like Python, but require a bit more care in Haskell.
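To make the guard pattern above concrete, here's a small runnable variant (the function name and thresholds are invented for illustration):

```haskell
-- 'avg' is bound once in the where clause and shared by every guard,
-- which is precisely the sharing that 'let' can't provide here.
grade :: Double -> Double -> String
grade exam homework
  | avg >= 90 = "A"
  | avg >= 80 = "B"
  | otherwise = "needs work"
  where avg = (exam + homework) / 2

-- e.g. grade 95 91 evaluates to "A"
```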
A very basic example of `where` (with no guards, so `let` could be used here with similar effectiveness) drove this home a bit, feeling pretty familiar:

```haskell
areaTriangleTrig a b c = c * height / 2 -- use trigonometry
  where cosa   = (b ^ 2 + c ^ 2 - a ^ 2) / (2 * b * c)
        sina   = sqrt (1 - cosa ^ 2)
        height = b * sina
```

## Type classes

Type classes allow for the categorization of types for which particular behavior is defined. For example, if we think about an equality operator, i.e., `==`, we might want this to be broadly applicable for comparing values of many different types. We face a few practical challenges right away:

- Not all types are comparable. That is, types aren't required to admit some notion of equality, so we can expect `==` to remain undefined for some types.
- For types that support it, what equality means may differ heavily across those types. Here we expect `==` to be overloaded, such that the same operator can be used in each type context with whatever specific machinery we need to compare those values.

These are standard concerns for general operators. Haskell's type classes address this by effectively allowing us to declare interfaces that types can inherit. For instance, we can define the class of types that are "equatable" as

```haskell
class Eq a where
  (==) :: a -> a -> Bool
```

This says: a type `a` is an instance of the class `Eq` if it defines an operator `==` that compares values of type `a`. The function/operator `==` is considered a method of the class. We can "attach" types to this class with instance declarations, like

```haskell
instance Eq Integer where
  x == y = x `integerEq` y
```

This can basically be taken to mean we're "enrolling" the type `Integer` in the `Eq` class by fulfilling the template/interface required by `Eq` (where `integerEq` is assumed to be a sufficient comparison function).

### Contexts

With access to a type class, we can use contexts to more tightly bind polymorphic type expressions, quantifying only over types belonging to that class.
For example, we can now generally refer to our `==` operator as having the type

```haskell
(==) :: (Eq a) => a -> a -> Bool
```

which says that `==` is a generic function only over types that are instances of the class `Eq`. The notation `(<class> <type-var>) => <type-expr>` is how we generally capture this: `(C t) => E` bounds the type variable `t`, as it appears in the type expression `E`, to belong to the type class `C`. This actually feels very natural, and closes the loop on a lot of confusing syntax I've seen up to this point. This further feels like bounded universal quantification as we've studied it in broader type theory: we can express types that are quantified over bounded type variables, and those bounds are facilitated by type classes.

Contexts can be used generally in type expressions, including in other class definitions:

```haskell
class (Eq a) => Ord a where
  (<), (<=), (>=), (>) :: a -> a -> Bool
  max, min             :: a -> a -> a
```

This defines a new class `Ord a`, where the type variable `a` is bounded to itself be an instance of the class `Eq`. `Ord` is considered a subclass of `Eq`: we're saying it is defined only over types that belong to that class, and is thus comparatively more specific. One can also express multiple inheritance, like

```haskell
class (Eq a, Show a) => C a where ...
```

One can use class constraints (i.e., contexts) within the method definitions of a class on any type variable except the one bound at the class level:

```haskell
class C a where
  m :: Show b => a -> b
```

This is quite natural: we're just saying that `a` can't be further restricted by `m`'s type expression (e.g., by doing something like `m :: Show a => a -> b`). That is, we have no power to restrict `a` at that stage; if we wanted that, we'd need to move the constraint into the class declaration (e.g., `class (Show a) => C a`).

### More on contexts

For a moment, restriction with contexts feels like we're bringing in more than just types to our quantified type variable.
With something like

```haskell
(Eq a) => a -> a -> Bool
```

I'm saying I want a type that belongs to `Eq`, which means I've got a defined method along with whatever type `a` shows up. That feels odd, like I've got a term-type bundle that I'm quantifying over. This confused me at first, since it feels like I'm magically able to enforce a behavioral constraint. That isn't to say that's a bad thing: there's lots of freedom there. But I struggled to draw up a pure type theoretic analogy.

The thing is: it's not really a behavioral constraint, and what we're doing can be rather trivially expressed as simply bringing the method types into the type expression. For example, we can alternatively write the above as

```haskell
(a -> a -> Bool) -> a -> a -> Bool
```

That is, my context `(Eq a) =>` is merely "bringing along" the requirement that we have another defined term of a certain type. A given type class only "enforces" any of its methods up to their types anyway: we don't actually have some behavioral, term-level declaration. So any time I use a type class as a context, I can just think about explicitly bringing the type signatures of each of its methods into the type expression I'm constraining.

Since our example here coincides with the type we assigned to `==` globally, we can run with that to demonstrate further:

- Writing `(==) :: (Eq a) => a -> a -> Bool` can be directly interpreted as saying `==` is a function operating on two values of an equatable type and returning a `Bool`.
- Writing `(==) :: (a -> a -> Bool) -> a -> a -> Bool` says the exact same thing as above, but it makes clear that whatever type `a` we're working with brings its own notion of equality, in the form of the `(a -> a -> Bool)` function (which is all that was initially meant by forcing an equatable type with `Eq a`, so same difference).

This makes clear a few things:

- Our type variable `a`, when constrained by contexts, is still just a type variable in the usual sense.
- The method "baggage" we're requiring can be seen as something extra, yes, but it's extra in the type expression, not "in" `a` itself.
- We're basically seeing how `==` is really just a "wrapper" generic function that does nothing but call underlying methods. It takes two values of the same type, along with a method that operates on values of that type (the type-specific equality method), and plugs those values in.

This is how we get ad hoc polymorphism (i.e., overloading) via parametric polymorphism: we define fully generic functions, but require calls on values of a particular type to supply methods that work with that type, such that our outer generic function can delegate all interaction to those methods. This is basically just giving us a way to manually define or "inject" type-specific function behavior on a type-by-type basis, which is of course what we observe when overloading operators in most other languages. The point is that, in Haskell, we have to build that up using proper generics (i.e., parametric polymorphism).

## Type constructors

Recall our parametric Point type from before: `data Point a = Pt a a`. `Point` is a type constructor: it takes a type `a` and produces another type `Point a`. So `Point a` is concrete, first-order (for some `a`), i.e., values can inhabit this type. But `Point` alone, as a "higher-order" type, is not first-order, and there aren't any values in Haskell that can inhabit it. To be clear, these are both concrete values in type space (i.e., types); they just have different kinds:

```haskell
Point Int : *
Point     : * -> *
```

We're basically saying that a type needs to have kind `*` in order to be inhabitable by a value. That is, only first-order types can be inhabited by a value; it's fairly nonsensical to suggest that `Point` could be inhabited by a value.
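We can ask GHCi to confirm these kinds directly (the session below is illustrative; recent GHC versions print `Type` where older ones print `*`):

```haskell
Prelude> data Point a = Pt a a
Prelude> :kind Point
Point :: * -> *
Prelude> :kind Point Int
Point Int :: *
Prelude> :kind Int
Int :: *
```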
Note that both `Point` and `Point Int` inhabit kinds; it's just that only `Point Int` is a construct that can then further be inhabited by a value, while `Point` is "too abstract" (we first need to plug in another type to yield a first-order type).

### Digression: Sorts and universes

…Why is that? In what way is `Point` too abstract? It inhabits a kind just like `Point a` (for any `a`); is that not enough to just call them both types? This is a good place to have a discussion regarding a critical distinction in how we categorize terms, types, and higher-order sorts. This has been a bit of a pain point throughout my type theory "journey," and the rubber seemingly meets the road now that we're discussing these items in a more concrete context like Haskell.

The key elements I want to highlight here crop up as we start thinking about where type constructors "live." Above, we just said both `Point Int` and `Point` are "types," since they both are concrete objects living in type space. They're not at the term level, and they're not kinds: they're sandwiched in between. Nevertheless, calling them both types isn't exactly how we'd word it in everyday use, and it's pretty much outright wrong in a Haskell context, given that `Point` can't be inhabited by a value in the usual sense.

Now, the fact that `Point` can't be inhabited by a term/value makes intuitive sense: it's a function that operates on types. How could you even build a term that makes sense here? Of course, you can quantify over `Point`'s type variable, binding it so that the expression becomes closed and represents something concrete, but it otherwise just includes a free type variable that's hard to make sense of.

So how do we formalize the distinction here? What I mean by that is both `Point` and `Point a` are in this "type realm" like we've said; we're not reaching for kinds or higher-order sorts.
In a type theoretic sense we can call them both just types, similar to how we can still call a lambda abstraction a term even though it abstracts over terms (and type constructors like `Point` abstract over types). Universes are what help us further distinguish these items (in the Martin-Löf sense). Universes help capture what we mean by higher- or lower-order types. A "higher-order type" doesn't mean we're referring to kinds (as in a "higher-order sort," jumping from the classification "type" to "kind"), but instead to abstraction over types. In particular, universes represent a hierarchy of types, such that each step up the ladder of universes implies abstraction over the items from the previous universe.

```
Type₀ : Type₁
Type₁ : Type₂
...

Point     : Type₁    -- because Point : Type₀ → Type₀
Point Int : Type₀
Pt 3 4    : Point Int
3         : Int
```

Here $\text{Type}_N$, or $\mathcal{U}_n$, refers to the $n$-th universe type. Note that we somewhat overload our usual `:` in that we technically have types on both sides (and not a type on the LHS, kind on the RHS). This is okay given we think of these universe types as collections of certain kinds of types, which therefore represent a higher-order construction; something like `T : Type₀` says that the type `T` is in the 0-th universe.

How do we know something like `Type₀` is itself a type, though? Part of me is tempted to simply say: whatever it is, it could be a kind. The thing is, we construct this hierarchy by saying the universe `Type₀` is an element in the universe `Type₁`, and we don't leave "type world" to do this. So while the kind `*` certainly captures all nullary type constructors (first-order types), the kind itself is beyond types altogether, and can't exactly be taken as a type in a higher type universe like `Type₁`. But we're doing that in spirit: we can just think of it more like lugging the whole collection of `Type₀` types with us such that it becomes a new primitive in our higher universe.
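A language with first-class universes, like Lean, lets us check these judgments directly; a small sketch (Lean 4 syntax, where `Type` abbreviates `Type 0`):

```lean
-- Terms inhabit first-universe types; those types inhabit Type 0.
#check (3 : Nat)
#check (Nat : Type)        -- i.e. Nat : Type 0

-- Each universe is itself a type in the next universe up.
#check (Type : Type 1)
#check (Type 1 : Type 2)

-- A unary type constructor, analogous to our Point, sits in Type 1,
-- because its type is Type → Type (and Type → Type : Type 1).
#check (List : Type → Type)
```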
(I'd like a little more on this; really getting why a universe type can still be a type in the usual sense. Probably the connection to abstraction over types, as in type constructors, is what can be convincing here: if $\text{Type}_0$ is similar enough to a type constructor, and the latter is a type, then the former can reasonably be one as well.)

To be clear: universes simply group up types, breaking up type space into "orders" of types. First-order types, i.e., those belonging to `Type₀`, are the only ones that can be inhabited by values. Other higher-order types can then simply be seen as having some "impassable" universe layers separating them from term space. A type in universe $N$ can therefore only be inhabited by objects in the $N-1$ universe, and for all but `Type₀`, those inhabiting objects are still themselves types. This gives us a meaningful basis for saying a phrase like "type constructors are inhabited by first-order types": if something is going to inhabit a type "that abstract," it can only be a "more concrete" type. Doing this for arbitrarily high-order types eventually gets us to a "most concrete" type, after which the inhabiting object simply becomes a term (although one can naturally still think of terms as values in a universe, where a concrete type is simply a set of its possible inhabiting terms).

I've pondered this for a considerable amount of time, struggling to really grok the idea of letting a collection of types, as represented by $\text{Type}_0$, itself be a type. It simply didn't track for quite some time: it just feels like a kind, and I don't like that we seem to ignore this. In fact, it seems very clear when we let universes be a partition of type space, in the sense that each universe is a collection of types bundling up a "new group" of types built on top of the last group.
This tracks with the idea of universes building up increasingly abstract type constructors, and aligns nicely with the notion of "unions of kinds." The problem with this is that each universe is a term in the next one. That is, the thing we're using to reference a collection of types is now just a primitive term in the next universe up. That bothered me for a long time. I just didn't get what the thing was supposed to now be in the next universe. $\mathcal{U}_0$ as a type is sensible: I can think of it like a set with elements inside, and if we liken $:$ to $\in$, then something like $\text{Integer} : \mathcal{U}_0$ tracks just fine. But $\mathcal{U}_0$ as a standalone term annoys me, and it no longer feels right.

The interpretation that helps me here is to allow $\mathcal{U}_0$ to be a building block for terms on this new "plane." You can even liken it to `*` as a kind, in the way we use `*` in kind expressions to abstract over types. Whatever work the symbol `*` is doing, we're basically letting the reference to $\mathcal{U}_0$ (as a term) do that same work. And you can start building other terms with it, e.g., $\mathcal{U}_0 \rightarrow \mathcal{U}_0$, which is really not any different from what we mean by $* \rightarrow *$.

Note how while $\mathcal{U}_1 \rightarrow \mathcal{U}_1$ would be in $\mathcal{U}_2$ (a new universe), it can still be related to a kind like $(* \rightarrow \cdots \rightarrow *) \rightarrow (* \rightarrow \cdots \rightarrow *)$ (i.e., a map from an $n$-arity type constructor to an $m$-arity type constructor). That is, higher universes don't correspond to higher sorts; it's not like we have to leave "kind space" to start representing universes beyond $\mathcal{U}_1$. Instead, universes are basically just convenient ways to refer to all types with a certain level of abstraction (or below), and we can canonically think of those things as still having some ascribable kinds.
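To ground the kind side of this correspondence, here are a few familiar constructors and the kinds GHCi reports for them (all inhabitants of the first "constructor universe" in the above framing; `*` may render as `Type` in recent GHC):

```haskell
Prelude> :kind Maybe
Maybe :: * -> *
Prelude> :kind Either
Either :: * -> * -> *
Prelude> :kind Either Int
Either Int :: * -> *
```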
Universes also group up types with a "broader brush": $\mathcal{U}_1$, for instance, basically groups up all first-order type constructors. We don't have a convenient way to refer to all types that meet that description with kinds; we have to write out $* \rightarrow *$, $* \rightarrow * \rightarrow *$, etc. to capture the notion of arbitrary first-order arity. All types with that structure inhabit $\mathcal{U}_1$, however, so we get a single term we can now use to refer to them. We make a jump to a new universe where that single "smaller" universe reference is a new term, and build some new higher-order terms from there. This is again just like assigning a name to some first-order kinds, and building outer higher-order kinds (like the example above) that nest the first-order ones inside.

Another thing that might help, if not pretty much implied by the above: we effectively "reuse" the notion of typing at each new universe. It's odd to initially start representing collections of types as types themselves. We never claim to leave type space, but we wrap lower universes back around to be terms that are categorized by higher universes. We just reuse the notion of type inhabitance each time to capture that relationship: each time we get some fundamentally new thing we can work with as a term to build up other terms in this new universe.

As an aside, tying back to type constructors: a unary constructor like `Point` is exactly the shape the `Functor` class abstracts over (note the data constructor is `Pt`, with two fields, per our earlier declaration):

```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b

instance Functor Point where
  fmap f (Pt x y) = Pt (f x) (f y)
```

- www.haskell.org/tutorial/goodies.html
- wiki.haskell.org/Lazy_pattern_match
- en.wikibooks.org/wiki/Haskell/Laziness

---

1. This may already be quite clear, but to reinforce this intuition even more (because I find it important): `ones` here is not some object that we find ourselves prepending a 1 to. When looking at the RHS, I find myself thinking that `ones` could technically be some arbitrary value, and we just need to find one that makes the equation work.
In a sense that's perfectly okay, but I think there's too much mental freedom in that read on what's happening here. I think it's better to be very clear that `ones` is born entirely out of the extra structure used in that equation. A similar point can be made with the general `fix` setting we explored in Typed lambda calculus, e.g., for factorial:

```
G(fact) = λ(n: Int). if n = 0 then 1 else n * fact(n-1)
```

Here `fact` isn't some construct that we find ourselves putting into that equation, "plugging it in" and checking if it works. No; it is completely defined by that equation. It is not a separate thing in any meaningful sense. The same applies to `ones` above: it is the thing that internalizes, infinitely, the action of prepending a 1. This note is really just a reminder not to treat the equation and the term so separately, since I seem to have that tendency in my latest re-reading of this material. Recursive terms like these are nothing but sponges that must absorb whatever structure shows up around them in the definition. ↩︎

2. I say confusing because I initially took this to mean something different based on the wording in the Wikibooks article. The article says:

   > Often, we only care about WHNF, so a function that evaluates its argument to at least WHNF is called strict and one that performs no evaluation is lazy.

   This confused me because it makes laziness sound like something we can see in the function body, and whether it uses that input in a particular way. As in, whether we explicitly take an evaluation step inside the function's definition. The `id` function sounds like it's lazy under this definition: it doesn't evaluate `x` in the body, it just returns what it was given. While this definition is a little sloppy, my interpretation is what's wrong. Whatever evaluation may take place in `f`'s body, it won't take place until `f` itself is evaluated. So our quoted line is exactly the same thing as the "refined" statement: given that we're forcing `f x`, does `x` get forced as a result?
   That is to say, once we actually try to make the term `f x` a concrete value, strictness means we'll need to take `x` to a concrete value in order to get there. Another hint that the first line doesn't make sense: just above we said evaluation only takes place during pattern matching. So unless `f` does some pattern matching when checking input values (in which case it's obviously strict, regardless of how we're interpreting our definition), the only evaluation that can take place in its body is if it passes that input off to another function call which needs to do some pattern matching. Point being, there's no other canonical way for evaluation to even take place in the function body: we either evaluate when checking the input, or call another function that does. So if our function `f` doesn't pattern match on input, then our first definition is likening strictness to simply calling a function internally that does pattern matching. That would be a pretty shallow notion of strictness, i.e., strict functions are those that call functions that pattern match; `id` would not be strict under such a definition. ↩︎
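As a runnable companion to the `ones` point in the first footnote (`fix` here is `Data.Function.fix` from base; the side-by-side framing is my own illustration):

```haskell
import Data.Function (fix)

-- 'ones' is wholly determined by its defining equation: the value that
-- prepending a 1 maps back to itself.
ones :: [Integer]
ones = 1 : ones

-- The same value, written as an explicit fixed point of the
-- "prepend 1" step, mirroring the G(fact) framing above.
ones' :: [Integer]
ones' = fix (1 :)

-- Laziness lets us observe finite prefixes of either:
--   take 5 ones   -- [1,1,1,1,1]
--   take 5 ones'  -- [1,1,1,1,1]
```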