Abstract algebra

type wiki
created 2020-05-25
modified 2025-10-11 16:24

General properties

  • Associativity: the order in which operation evaluation is performed does not affect the result. That is, for a given expression, you can pick operations and “resolve” them, one by one, in any order. Note that we’re not saying we allow any order of the operands (that’s commutativity), but rather that the application order doesn’t matter.

    Associativity can be formally stated as follows: if, for all $a,b,c\in S$, the equation $(a\cdot b)\cdot c = a\cdot (b\cdot c)$ holds, then the binary operation $\cdot$ on the set $S$ is associative. Note how this is the simplest distillation of our application invariance mentioned above: the operands remain in a fixed order, but parentheses can be re-written arbitrarily.

    Intuition: what does associativity buy us? Intuitively, an associative operation can "see" the components of composite operands. The form $(a\cdot b)\cdot c$ takes $a$ and applies pieces step-by-step, i.e., first adding $b$, then mixing in $c$ after. Associativity says we can first mix up $b$ and $c$ and give the result to $a$, and it doesn't change $a$'s reaction: the $b$ and $c$ mixture is not a fundamentally new compound, but instead has transparent ingredients. In practice, this matters when we don't want to have to worry about when we assemble the pieces of a composite result, so long as they have the same final order.
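A tiny illustrative check (not from the original text) can make the contrast concrete; in Haskell, string concatenation is associative while subtraction is not:

```haskell
-- Illustrative only: regrouping parentheses never changes the result for an
-- associative operation, but it can for a non-associative one.
assocConcat :: Bool
assocConcat = ("ab" ++ "cd") ++ "ef" == "ab" ++ ("cd" ++ "ef")   -- True

assocSub :: Bool
assocSub = (10 - 4) - 3 == 10 - (4 - 3)                          -- False: 3 vs 9
```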

Algebraic structures

An algebraic structure is a mathematical object consisting of a set, operations defined over the set, and a set of axioms/identities satisfied by the operations.

Algebraic structures and their properties

General axioms:

  • Associativity:

  • Divisibility:

  • Identity: there is an element $e\in X$ such that, for every element $x\in X$, $x\cdot e = e\cdot x = x$.

    Note one obvious implication of this: every element of the base set shows up as an output of the operation (e.g., $x = x\cdot e$), so the operation doesn't "lose" any items of $X$; in that sense it is "full."

Structure types

  • Magma: $(M, \cdot)$; a set $M$ with a single closed binary operation $\cdot$.

  • Quasigroup: a magma endowed with a notion of division, i.e., a pair $(Q, *)$ such that

    $$a*x = b \;\;;\;\; y*a = b$$

    are satisfied by unique $x, y\in Q$, with solutions written $x=a\backslash b$ and $y = b/a$. The operations $\backslash$ and $/$ are left and right division, respectively.

  • Unital magma

  • Semigroup: an associative magma, i.e., a set $S$ together with an associative binary operation.

  • Loop: a quasigroup with identity

  • Associative quasigroup

  • Monoid: a semigroup with identity, written as a triple $(M, \cdot, e)$, where $e$ is the identity element (a concrete sketch follows this list).
    • A submonoid of a monoid $(M, \cdot, e)$ is a monoid $(N, \cdot, e)$ such that $N \subseteq M$. That is, it is a monoid defined around a base set that is a subset of the "supermonoid's" base set, has the same identity, and remains closed under the inherited binary operation.

    • A subset $S$ is a generator of a monoid $M$ if the smallest submonoid of $M$ containing $S$ is $M$. That is, $S$ needs to be a "tight base set" in order to be a generator; one can't build a submonoid around $S$ that is "smaller" than $M$. Monoids with finite generators are said to be finitely generated.

  • Group:
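As promised in the monoid bullet above, here is a small illustrative Haskell sketch (names are mine, not from the original) spot-checking the monoid laws for a familiar example, the integers under addition with identity 0:

```haskell
-- Monoid sketch: (Int, +, 0). We check associativity and identity on sample
-- values; these hold for all integers.
op :: Int -> Int -> Int
op = (+)

e :: Int
e = 0

monoidLawsHold :: Int -> Int -> Int -> Bool
monoidLawsHold a b c =
     op (op a b) c == op a (op b c)   -- associativity
  && op a e == a && op e a == a        -- identity on both sides

-- e.g. monoidLawsHold 2 5 9 == True
```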

Free monoid

The free monoid on a set (or alphabet) $A$ is the monoid $(A^*, \cdot, \varepsilon)$, where $\cdot$ is the string concatenation operator, the empty string $\varepsilon$ is the identity element, and $A^*$ is the Kleene star (the unary operator $*$) applied to $A$. We often refer to the free monoid as $A^*$ (which is a little confusing since one writes the monoid's internal set as $A^*$ as well).

  • A graded monoid is a monoid that can be written as

    $$M = M_0\oplus M_1\oplus \cdots \oplus M_n$$

    which is to say it can be "factored" or "graded" as a collection of submonoids (i.e., it's layered). All free monoids are graded; $M_i$ contains the free monoid's strings of length $i$.

  • The "base set" $A$ over which the Kleene star (or Kleene plus) is applied contains the free generators of the free monoid $A^*$. Here there's a fairly clear analogy to vector bases: the free generators are "single-letter words" (i.e., characters) whose arbitrary combination forms all elements in the monoid. Further, the cardinality of the set of free generators is referred to as the rank of the free monoid (although each free monoid has exactly one set of free generators, while vector spaces can have many bases). We also call this free generating set a basis.

  • A code is a set of words $C$ whose Kleene star $C^*$ is a free monoid for which $C$ is a basis.

Free object

Free objects are to categories what bases are to vector spaces, roughly speaking. A basis is a fundamental characterization of a vector space, and linear transformations between spaces can be captured entirely by their values on that basis. Free objects are effectively objects with a known set of generators, analogous to a basis together with the vector space it spans.

In particular, let $C$ be a concrete category with a faithful (forgetful) functor $U: C \rightarrow \text{Set}$ (faithful meaning $U$ is injective on morphisms between any fixed pair of objects). Let $X$ be a set, to serve as the basis of the free object. Then the free object on $X$ is a pair $(A, i)$, where $A$ is an object in $C$ and $i: X\rightarrow U(A)$ is the canonical injection. Here $U$ can be thought of as a map that "strips away" all structure but an object's base set, and $i$ is therefore responsible for connecting $A$'s basis (or generators) to items in $A$'s base set. Free objects also satisfy the universal property that, for any other object $B$ in $C$ with a map $g: X \rightarrow U(B)$, there is a unique $f: A\rightarrow B$ such that $g = U(f) \circ i$. What this tells us is, if we want a map $f$ from object $A$ to $B$, it is uniquely determined by how we choose to map $A$'s basis $X$ into $B$'s base set, $U(B)$.

Intuitively, what we’re doing here is “peeling back” the layers of objects in arbitrary categories and establishing a fundamental link using the common language of sets. When a (free) object is generated by a known set XX, we can treat those items in XX as guiding axes that map into the core set-based representation of a target object. We can then “reapply” the layers/structure on top of this fundamental relation, getting a map between full objects for free.

Example: free monoids

Returning to free monoids, we can apply the general details for free objects laid out above. Recall our general notation for the monoid as a triple $M = (A^*, \cdot, \varepsilon)$ defined over a set $A$.

  • We have the forgetful functor $U: \text{Mon}\rightarrow\text{Set}$ which simply forgets the operation and identity, i.e., mapping $M$ to $A^*$.

  • $A$ is the base set of generators, the basis for the monoid.

  • The generator injection $i: A\rightarrow U(M)$ canonically maps each element of $A$ into the monoid. In this case, the symbols $x\in A$ are mapped to one-letter words, i.e., $x\mapsto [x]$, as they appear in $U(M) = A^*$.

  • For any other monoid $N$, along with a function $g: A\rightarrow U(N)$, we get a unique monoid homomorphism $f: M \rightarrow N$. This function has the specific form

    $$f(x_1x_2\cdots x_n) = g(x_1)\cdot g(x_2) \cdots g(x_n)$$

    That is, the string $x_1x_2\cdots x_n$ in $M$ gets mapped to the equivalent in $N$ after replacing each symbol $x_i\in A$ with the string in $N$ that it's mapped to by $g$.

For example, suppose we took the base set $A = \{a, b, c\}$ for monoid $M$, and $B = \{x,y,z\}$ for monoid $N$. Then a function

$$g(a)=x \;;\; g(b)=y \;;\; g(c)=z$$

determines a unique mapping between the entire monoids, and we can convert any string as was written above, e.g., $aaccbb \rightarrow xxzzyy$. Note the flexibility in the assignment of $M$'s base symbols to strings in $N$ (i.e., any element in $U(N)$, which in this case is $B^*$). For example, it's just as valid to define $g$ as

$$g(a)=xx \;;\; g(b)=yy \;;\; g(c)=zz,$$

and conversion takes place as before under $f$, e.g., $acc\rightarrow xxzzzz$. Again, the universal property that applies here suggests that specifying $g$ uniquely determines how we map the monoid objects onto each other.
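This extension of $g$ to the unique homomorphism $f$ has a direct programming analogue; below is a small illustrative Haskell sketch (the names hom and g are mine), where the free monoid on an alphabet is the type of strings over it:

```haskell
-- The unique homomorphism induced by g: extend it letter-by-letter and
-- concatenate (this is foldMap specialized to strings/lists).
hom :: Monoid n => (a -> n) -> [a] -> n
hom g = foldr (\sym acc -> g sym <> acc) mempty

-- Mirrors the example above: g(a)=xx, g(b)=yy, g(c)=zz.
g :: Char -> String
g 'a' = "xx"
g 'b' = "yy"
g 'c' = "zz"
g _   = ""   -- symbols outside the alphabet (not generators here)

-- hom g "acc" == "xxzzzz", matching the conversion above.
```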

One additional observation to note here is how small the scope is for the morphism equivalence $g = U(f)\circ i$. The domain of $g$ is $M$'s alphabet (or base set), and that's all we're even allowed to specify. Therefore $U(f)$ need only match $g$ at these points of definition, and it has no duty to uphold usual monoid homomorphism properties (namely identity and multiplication preservation). But once we "extend" that back to a full monoid homomorphism $f$, it needs to match $g$ along the generators we've provided while also taking care to actually be a monoid homomorphism, by definition. All that to say, there is indeed extra structure that comes into the picture as we bring $g = U(f)\circ i$ back into the category we're working in, and those extra restrictions play a role in constraining the map to uniqueness1.

Natural transformations

Foreword: this section is adapted from a very large “development aside,” i.e., a section I wrote to address concerns with my intuition as I worked through relevant concepts. As such, it’s far less structured than the other sections on this page, but is central to my understanding of maps, in the most general sense.

Natural transformations feel so natural, I struggled to see why they were even explicitly defined. Perhaps more accurately, I had a poor understanding of the implications of the constraints that make up a natural transformation and what it’d mean to violate them.

As we’ve established, natural transformations are best introduced as morphisms in categories of functors, and the constraints that apply ensure that those morphisms behave in an appropriate way (according to the usual category rules, that is). In order to do that, we need to “break open” the objects they’re being applied to: the functors. This gives us the family of morphisms that apply to the objects inside the source/target categories of the functors. If you doubt this move, consider what else we really have to work with; to appropriately map between map-like things, one really must inspect what changes between the maps in question (i.e., for the same input, how do the maps’ outputs change?). If we can consistently characterize the change in outputs between two functors (which includes both objects and morphisms of the underlying categories), then we’ve got the thing we wanted.

The morphisms in the category of endofunctors are natural transformations

Intuitively, the naturality square ensures that we can naturally move around in categories before and after a functor is applied. It is a consistent bridge between categories induced by functors that implies the order of morphism and functor application doesn’t matter. The analogy I like here (although I came up with it, so it may be dubious) is that the coherence conditions ensure “safe passage” through chains of functors. If you’re moving from a source object to a target object via a series of morphisms across categories, as long as you have natural transformations between the functors linking those categories, you can apply your morphisms in whichever category you like: you can load them up all in the first category, do a morphism-functor-morphism-functor, or wait until the very end. They will all yield the same final object, and you never have to worry that you took the “wrong exit” via a particular morphism in a particular category that changes the dynamics for the rest of your journey.

Let's look at a small, concrete example: functors between a few set-like objects. Here we have a category $\mathcal{C}$ with three sets $X_1$, $X_2$, $X_3$ and two morphisms $f_A$, $f_B$ between them as depicted:

Example setup depicting categories $\mathcal{C}$, $\mathcal{D}$ and functors $F$ and $G$ between them.

The diagram also shows two functors $F$ and $G$ that map between $\mathcal{C}$ and another category $\mathcal{D}$, itself containing three set objects. You can see how the functors may change the objects/morphisms in different ways: over $\mathcal{D}$'s objects (which, despite the labels of 1, 2, 3, don't necessarily correspond to those in $\mathcal{C}$; it's just an implicit ordering so we can compare differences consistently), $F(f_A)$ and $G(f_A)$ apply to different source/target objects due to $F$ and $G$ mapping the objects differently (we'll see this a bit more explicitly in the next diagram), i.e., $F$ maps $X_1$ in $\mathcal{C}$ to $Y_1$ in $\mathcal{D}$ whereas $G$ maps $X_1$ in $\mathcal{C}$ to $Y_2$ in $\mathcal{D}$.

$\eta$ is a natural transformation between functors $F$ and $G$…at least in name. We need to show that the relevant conditions are met; otherwise we might merely refer to it as a map between functors. Here we can look at the same example, but with additional detail relating objects:

Same setup as before, but with more explicit object mappings. Blue lines depict functor $F$'s object connections from $\mathcal{C}$ to $\mathcal{D}$, and green are $G$'s.

As stated before, natural transformations are defined over category components, and by definition we have a family of morphisms:

  • $\eta_{X}: F(X) \rightarrow G(X)$ for any object $X$ in $\mathcal{C}$. This is called the "component of $\eta$ at $X$," linking the two object targets under the functors for each source object $X$.

    In $\mathcal{D}$, it's pretty easy to think of this collection/family of morphisms as the stitching holding the functors together. If we land anywhere in $\mathcal{D}$ by using $F$, we can simply walk across these component morphisms to get to the equivalent under $G$.

  • We additionally require that, for every morphism $f: X\rightarrow Y$ in $\mathcal{C}$, we have

    $$\eta_Y\circ F(f) = G(f) \circ \eta_X$$

    In words, this says the following two processes are identical:

    • For any object $F(X)$, map it to $G(X)$ via $\eta_X$ and apply $G$'s version of $f$, therefore mapping into $G(Y)$ (recall that a functor $A$ on $C$ must map $f$ to $A(f): A(X) \rightarrow A(Y)$).

    • For any object $F(X)$, first apply $F$'s version of $f$, thereby mapping into $F(Y)$, and then move to $G(Y)$ via $\eta_Y$.

The last equivalence is often expressed by the following commutative diagram (where details are borrowed from our example, but it nevertheless reflects the general form of the naturality square):

Typical form for the naturality square.
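A compact way to see the square in action is in Haskell (an illustrative sketch, not from the original), where functors are type constructors and a natural transformation is a polymorphic function between them:

```haskell
-- safeHead is a natural transformation from the list functor to Maybe; the
-- naturality square says mapping f before or after the component agrees.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

naturalityHoldsAt :: [Int] -> Bool
naturalityHoldsAt xs =
  let f = (* 10)
  in fmap f (safeHead xs) == safeHead (fmap f xs)

-- naturalityHoldsAt [1,2,3] == True   (both sides are Just 10)
```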

So now we've contextualized the family of morphisms a bit more, and have a general sense for what it might mean to move "naturally" between functors. But the components $\eta_X$ seem to fall out pretty much automatically by simply associating functor target objects…right? When does this not hold? Let's look at an example with concrete objects and morphisms.

Aside: morphisms, once and for all™

It's worth noting that I needed this level of depth to bump into, yet again, a (seemingly persistent) fundamental misunderstanding of morphisms. Looking at the naturality square, I initially thought something like the following: "well, both paths from $Y_1$ get us to $Y_3$, so things just seem to hold by definition." The first part of that is perfectly accurate I suppose, but it's missing the actual meaning of the statement. Morphisms are more than just a map from one object to another, as in they don't just take in a whole object and produce another whole object. Getting to an object isn't enough here; it's getting to it in the same way. Put simply, I was thinking on too high a level, as if morphisms are necessarily maps that must ignore any inner components of the objects they're acting on. This is likely me just misinterpreting the super high level of abstraction one faces when warming up to category theory. We write morphisms as maps $f: X\rightarrow Y$ and rarely say anything more specific unless working in a specific category, so I pretty much took that to mean what it says in the most general sense: a morphism $f$ takes in an object $X$ and gives out an object $Y$, as if there's nothing more atomic to act on than entire objects. But of course, a morphism merely maps from an object to another with no explicit guarantees of completeness, and the way you get to the co-domain can vary. Morphisms in general can exhibit all the usual flexibilities of functions in $\text{Set}$, e.g., not strictly being surjective, and of course two functions $f: X\rightarrow Y$ and $g: X\rightarrow Y$, despite sharing a domain and co-domain, need not actually map individual components to the same place (i.e., we can have $f(x) \not= g(x)$ for any $x\in X$).

I faced similar issues over in Category theory§Pullback, with a similarly detailed accounting of what now feels like an incredibly basic point. This just goes to show how easy it can be to misconstrue even simple, foundational concepts. Ultimately I think this just stems from operating at a level of uncomfortable abstraction, and you tend to be very careful making assumptions about anything that isn’t otherwise stated outright. I think the fundamental disconnect here has been the fact that almost no category theory verbiage deals in elements smaller than objects, and I’ve hesitated thinking in terms of anything lower as a result. Perhaps this is a result of not having followed a clear introduction built on the back of sets, which perhaps would’ve made this whole thing a non-issue.

After a considerable amount of toiling, I finally feel okay with how we get from the very abstract and vague descriptions of morphisms to maps aware of object structure. I didn't even think I took issue with this, but I got myself caught in a loop where I just couldn't feel why maps shouldn't be required to return full objects. We say generally that a morphism is a map from an object $X$ to an object $Y$, and that's about as specific as we get (except for identity and associativity). If we're operating at this level and saying nothing further about the structure of those objects, it feels pretty straightforward to interpret that as a map taking an object $X$ and giving us an object $Y$ out. The problem is: it doesn't mean that, or at least it doesn't have to. Firstly, we stay at such an abstract level because it lets us say only as much as we need: as long as those top-level items are true, we have a viable morphism. Anything that falls under that jurisdiction should then be fair game; from here on out my problems mostly remained with the verbiage.

Here I was struggling with the idea that objects are atomic units that can’t be further decomposed. This is again missing nuance; we obviously have some kinds of objects, like sets, that can be seen as containers of other items. When we say objects are atomic, we effectively mean they’re atomic relative to the category theoretic statements being made. That is, they’re the “lowest level” construct those statements need to care about to be true, and they don’t care if there happens to be further structure when one looks closer. Now, actually defining classes of morphisms in specific categories is almost always a process of designing a map-like construct that is not only aware but often preserving of underlying object structure. So it’s not as if morphisms, in the way they’re actually applied, don’t look closer at the objects they apply to; they often must in order to remain consistent. It’s just that one can ignore those specifics and treat objects as black box units when it comes to making broad, category theoretic statements about the structures at play.

So decomposing is fine, but how about non-surjectivity? That still irked me a bit, when our top-level statements seem to suggest we're dealing in "whole objects." It's worth noting right away that any notion of "partial objects" can only present inside categories where objects can be decomposed further. Again, at the top level, we just don't see inside the objects, and statements we make cannot apply to a deeper, internal level. So this non-surjectivity is merely a byproduct of behavior we allow inside of a category like Set, because it doesn't break those top-level rules. If it did, sure, surjectivity would be important to apply everywhere, but our rules of composition and identity on morphisms work all the same whether we have our set function mapping onto the co-domain or not. It is of course perfectly natural to non-surjectively map between two sets, so we also actually want to be able to do this. Here I find embracing the word "to" is effective: morphisms map an object $X$ to an object $Y$. Under the hood, this may very well correspond to associating an element in $Y$ to each element in $X$; that's perfectly allowed when defining morphisms on set-like objects. To get the entirety of this whole aside, the only thing you need to accept is the following: we are merely taking $X$ to $Y$, not (necessarily) transforming it 1-to-1, and that's an acceptable meaning of the term "map." Imagine how small $X$ can be and how large $Y$ can be and this still makes sense as a map. For example, let $X$ be my group of friends and $Y$ be seat numbers in an NFL stadium. Our NFL tickets map each person to their one seat number, but that makes up a tiny fraction of all available seats and is obviously not "onto." And it shouldn't have to be for this relation to serve as a valid map between $X$ and $Y$, not only in the set-level or the category theoretic sense, but in the more pesky linguistic sense that got me into this mess in the first place. From the top level, such a map really does get to be an arrow between the two whole objects $X$ and $Y$. How much the image of that morphism actually "fills out" $Y$ just isn't relevant: to even think about the items in the image requires set-level awareness that cannot be seen when dealing in statements where objects are atomic by definition.

Even with all this, I still find it slippery somehow. When I try to root out what's wrong, I think it just comes down to the notation of writing functions according to their domains/codomains, e.g., $f: A\rightarrow B$. It's actually this that I take issue with, more than anything inherent to category theory or object-level abstractions or whatever else. It really goes beyond functions: it's with binary relations. The simple fact that we can call a link from one element of $A$ to one element of $B$ a relation (which is actually a valid, minimal relation) is what I find kind of pesky and stupid. That just doesn't feel like what we mean by relation. So something like the single pair (2, green) is a valid relation from integers to colors. By all accounts that's an association from an integer to a color, but it doesn't seem like what we'd ever mean by a relation $R: \text{Int} \rightarrow \text{Color}$ (that notation is a little more conventionally functional than barebones relation, but the point stands). That is where I'm getting so hung up, I think.

I’ve spent even more time on this, with a three part audio recording series to accompany it. But I’ve hit the root of the issue with the following, once and for all.

How morphisms look inside and outside of Set

(left side of diagram) Within Set, object structure is visible. Morphisms can be defined over the elements within the objects, and the above depiction includes (effectively) the simplest possible relation between the two (i.e., connecting a single element in X to a single element in Y). Two comments:

  • I’ve complained this doesn’t even feel like a relation (or what we’d mean by any such definition of one), let alone a morphism. But binary relations can be seen to act on all pairs in the Cartesian product between sets, mapping those with related items to True, and False otherwise. This is at least holistic; no elements in the domain or co-domain are systematically ignored just because they aren’t assigned a True value. That is, no elements “slip by” the relation: it can always formally be seen to make a Boolean decision for every possible pair.

  • If you don’t accept that this should “mature” to a morphism at the abstract level, notice that the line you’d then have to draw is arbitrary. How many items must be included for it to be enough, for it to count as a “map?” Notice that, regardless of how “full” your relation may be, as soon as you become blind to element-specific involvement, all you see is a bridge between the two objects. This is what we do as we move to the right diagram.

(right side of diagram) Outside of Set, back in the general abstract category theory viewpoint, we lack any insight into internal object structure. One can imagine masking off all element-wise knowledge we had while in Set, leaving the above: two atomic objects with some connection between them. How these objects are related is no longer even a well-posed question at this level of granularity; they simply are or are not connected, and the “endpoints” of the arrow can only be the entire object.

So all relations inside of Set, no matter how large, are viewed uniformly once we’re outside of Set. “Thin” relations mature to full morphisms in the abstract view (since they can’t be distinguished from any other relation), and from the abstract view a full morphism simply suggests any relation exists under the hood.

It's worth noting that general relations are not themselves morphisms in Set, whose morphisms are functions by definition (relations are instead the morphisms of the category Rel). But once you accept that relations are, in the most basic sense, mappings between sets, then you're past the issue of non-surjective functions and the idea of yielding "less than $Y$" via a morphism that doesn't map onto the co-domain. That is to say: from the abstract point of view, any kind of connection between elements within Set simply looks like a connection between the set-objects themselves, and it says nothing about what that relation must look like once you've revealed the element-level details. Not involving all elements in your map is not something that can be seen outside of Set, so all relations between any source and target look the exact same.

Taking a look back at this some weeks later: I think I do a good job conveying what felt problematic and how I got past it. This is from the perspective of not really recalling (right away, anyway) what the problem was in the first place, and I feel satisfied by the end of this passage. Here I think it may be helpful to simply recap why we're here in the first place: looking at diagrams like the naturality square, one gets the sense that simply following two trajectories that end up in the same object makes them equivalent, by the sheer fact they point to the same place. But that alone doesn't actually make those (composite) morphisms equivalent, since they're allowed to have such distinct structure in arbitrary categories (like being functions in Set) while mapping from/to the same objects. So in commutative diagrams, like the naturality square, we're not saying that any two composite morphisms are equivalent because they start and end at the same places, but instead that any two morphisms starting and ending at the same place must be equivalent under the morphism structure in the category. That is, not just any morphisms with the right domain/co-domain will do; they must actually be the same in the underlying category in order to uphold the kind of constraints the diagram is talking about. This is annoying/confusing because the diagrams don't really convey this: you simply see arrows between objects, and it seems to suggest that moving from/to the target objects is good enough. But in reality, they are merely abstracting away any category-level morphism specifics so as to make a category-independent statement, but crucially those specifics will still apply when operating in any such category. That is to say: once you're actually operating in a specific category, the known structure of its objects and morphisms comes into view, and you cannot ignore that revealed structure. I routinely find myself thinking in terms of Set when staring at commutative diagrams, and it's a mistake to mix the two levels of abstraction (e.g., allowing oneself to think of the objects as sets, but continuing to treat morphisms as vague, non-functional maps; if you embrace the former, you must also work with the latter, else you're mixing levels of abstraction).

Here we'll exploit a breaking condition for natural transformations: when the component $\eta_X$ (i.e., that which connects $F(X)$ to $G(X)$) depends on static data from the object $X$ itself. Here we're manufacturing an unnatural dependency, one that allows $F(X)$ to map to $G(X)$ but in a way that does not act "fairly" across $X$ (cheating with knowledge of $X$'s values rather than just its structure).

(un)natural transformation example in Set.

The example setup is as follows: $F$ and $G$ are both the identity endofunctor on the category $\mathcal{C}$. $\mathcal{C}$ contains two objects in Set: $X=\{1,2\}$ and $Y=\{3,4\}$, as well as a morphism $f: X\rightarrow Y$ that is the constant function $f(x) = 4$. We then define the component morphisms $\eta_X: X\rightarrow X$ and $\eta_Y: Y\rightarrow Y$ to be the constant functions mapping to their respective set's smallest element, i.e., $\eta_X(x) = 1$ and $\eta_Y(y) = 3$.

This is effectively a contrived example to ensure $\eta_X$ and $\eta_Y$ disagree somewhere along the chain from $F(X)$ to $G(Y)$: $X$'s values map to $Y$'s second value, whereas $Y$'s values map onto its own first value. Starting from the value of $1$ in $X$, we have the following two trajectories:

Disagreeing trajectories across (un)natural transformations and morphisms; naturality square does not commute.
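A direct check of the failed square (a small sketch of the setup above; with both functors the identity, the condition reduces to $\eta_Y\circ f = f\circ\eta_X$):

```haskell
-- The constant maps from the example; the two routes around the square
-- disagree already at the value 1.
f, etaX, etaY :: Int -> Int
f    _ = 4   -- the morphism X -> Y
etaX _ = 1   -- component at X: constant smallest element of X
etaY _ = 3   -- component at Y: constant smallest element of Y

squareCommutesAt :: Int -> Bool
squareCommutesAt x = (etaY . f) x == (f . etaX) x

-- squareCommutesAt 1 == False   (3 /= 4)
```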

An even simpler demonstration of this is to take $\eta_X$ as shown, and the constant function that maps to $X$'s other element, i.e., $f(x)=2$. One can then see how the naturality square breaks just over the object $X$ when the order of those two morphisms is swapped: $1\rightarrow 1\rightarrow 2$ vs $1\rightarrow 2\rightarrow 1$.

So what does this tell us? In this situation, we can see that the order of application matters. The map that is natural-transformation-then-morphism is different than the morphism-then-natural-transformation: their composite morphisms do not yield the same map in total. For consistency in component morphisms, we generally require that the choice of map acts on objects in a coordinate-free manner. That is, it must be equivariant with respect to the morphisms in $\mathcal{C}$ (by definition).

  • Breaks when the map is dependent on static data from the object itself

  • not sufficiently relative; knowledge of the operating object “overwrites” the work done

  • needs to be sufficiently local, sufficiently relative; shouldn’t depend on external object structure.

  • imagine first picking morphisms, then $\eta$s; and vice versa. This illuminates my confusion a bit since both feel fine, but they should get us into an equal amount of trouble

On equivariance: invariance is observed when a property of a mathematical object remains unchanged under certain kinds of transformation. Such a property is called an invariant of the object with respect to the transformation. Take, for instance, the area of a triangle: this is an invariant property of the shape under transformations like rotation and reflection. Equivariance is a slightly looser notion of invariance, not requiring a property to remain wholly unchanged under transformation, but to itself be transformed along with it in a predictable way. That is, transforming the object and its property in the same way respects the definition of the property. Keeping with the triangle example, one can look at the triangle centroid. This effectively moves with the triangle and will change under translation. So while it’s not an invariant (numerically the same), it will change with the triangle under the same translation.

Structure-preserving maps

To preface: equivariance is easily framed from the perspective of structure-preserving maps, which is a term commonly thrown around when speaking about morphisms. Therefore, this too has been a slippery topic for me in the past, and I've spent a sizeable chunk of time ensuring I have a comfortable intuition around this. Structure preservation, equivariance, and natural transformations are heavily intertwined; I find it helpful to frame the latter two as structure preservation through entire classes or groups of transforms; we'll formalize this afterwards. For now, I want to clarify the many, many places I've found myself stuck on structure-preserving maps and leave no doubt about how we should interpret this moving forward.

For starters: what we mean by “structure-preserving,” in the most general sense, can’t really be boiled down to anything more precise. As mentioned, we often call such things, in the abstract, morphisms. In that sense, one might imagine a structure-preserving map need only operate from/to the same kind of object: the very fact we “end up” in the (structured) object suggests we’re upholding necessary structure. But this is not generally what we mean, or is at least often not sufficient. That is to say: being a map with the appropriate external “signature” (i.e., mapping to/from the right kind of object) is alone not enough, as we care about how we move between the objects (and whether that movement preserves some structure within them).

Take, for instance, addition over the integers. Suppose we have the groups $G=(\mathbb{Z}, +)$ and $H=(\mathbb{Z}, +)$, and a map $f:G\rightarrow H$ defined $f:x\mapsto x+1$. Clearly "pushing" all integers through $f$ still just gives you $\mathbb{Z}$, and so in a very loose sense upholds "group structure" insofar as there is still a valid group on the other side. But the way we move under $f$ does not preserve addition (which is really all we mean by "structure" in this case): for any $a+b=c$ in $G$, we do not have $f(a)+f(b)=f(c)$ in $H$. So even though $f$ moves from a valid group to a valid group, it does not align the elements of the groups in a manner that is consistent with the addition operator.
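A quick illustrative check (names here are mine): the shift map fails the condition at every pair of operands, while a map like doubling satisfies it:

```haskell
-- Does f preserve addition? Check f(a+b) == f(a) + f(b) on sample operands.
isAdditiveAt :: (Int -> Int) -> Int -> Int -> Bool
isAdditiveAt f a b = f (a + b) == f a + f b

shift, double :: Int -> Int
shift  x = x + 1   -- the f from the example above
double x = 2 * x

-- isAdditiveAt shift  2 3 == False   (6 /= 7)
-- isAdditiveAt double 2 3 == True    (10 == 10)
```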

As a reminder, $f$ is unary in $\mathbb{Z}$. Even if our binary operation (addition) had arbitrary arity, $f$ is still a bottleneck of sorts through which each integer individually passes. So applying $f$ to individual elements in the underlying set is the only way to apply the transform at all. I've found myself occasionally thinking that $f$ could have distinct joint behavior, similar to the operator, but this isn't true, and recognizing as much helps simplify things. When I look at the required equality here, i.e.,

$$f(a+b) = f(a) + f(b),$$

I've let myself question whether this is the only constraint that could mean $f$ is structure preserving. But recalling how $f$ can only operate on single items in the underlying set, it becomes clear this is really the only thing you even can say in the first place. We use $f$ to map the items of one group onto another one-by-one, and the question is simply whether related items in the source group remain related in the transformed group. $a$ must go to $f(a)$, $b$ must go to $f(b)$, and the only real question from there is whether $a+b$ goes to the same place as $f(a)+f(b)$.

That last bit, the only salient part, has made me nervous for reasons I can’t easily articulate. I think it just somehow feels mysterious every time: I have to unpack it and check that it makes sense, and I get lost along the way. Either that, or as I mentioned before, the idea that there could be some other way to express structure preservation that I’m not able to see bugs me; I don’t have good intuition that the form is “tight.” At risk of restating what has already been said, the following might help:

We're looking for a way to relate the transform $f$ to the operation at hand. Both need to be involved for us to establish any such link, and it's critical to note that we're observing what happens with the trajectories themselves under $f$. It's not $f$'s image, it's not its co-domain; these are more like aggregate consequences of having applied $f$ to every item. Working with $f$'s image or its co-domain ignores how we got there, and would therefore say nothing about $f$ at all. The entirety of the setup also avoids relating the transform and the operation: $a$ and $b$ are just any two items in the source set, $f(a)$ and $f(b)$ are the items we map them to, and both $a+b$ and $f(a)+f(b)$ are the results of applying our operation on those pairs. Thus far, we've said nothing about how $f$ and $+$ interact, we've merely applied them to eligible operands/inputs. Now, if $f$ is going to respect $+$, the only thing that matters is that it takes a valid application of the operation to another valid application of the operation: that's just about all one could mean when saying "$f$ preserves the operation." If it took three items $a+b=c$ to three items that couldn't be related via addition, then $f$ would clearly not be compatible with the behavior of that operation. In such a case, I couldn't rely on my operation as a means of relating items under the movement/flow of $f$. The operation is like a stitching in the set-fabric between items $(a, b, a+b)$, and preserving it means the same stitching is present when moving the fibers through $f$. This can only mean that I produce related items $(f(a), f(b), f(a+b))$: after I've taken my stitches to their new locations via $f$, I find they're still intact.

That last bit I find the most helpful. Taking a concrete thing like $((a, b), a+b)$ (the operands and their output) removes the burden of feeling like there's some nebulous structure coming along with the operation for which I need to have developed intuition. That pair is the structure; it's like an extensional view of the operation (basically its graph). So every such pair that holds true in the source set is a little knot/stitch that locally exhibits the structure we're trying to preserve, and we cannot untie it under $f$'s movement. If $f$ in fact preserves the desired structure, it will move every such knot to another valid knot, locally preserving our operation dynamics. Note that $f$ doesn't directly map knots to knots, though; it applies uniformly on an element-level basis. Therefore it takes our knot

$$((a, b), a+b) \rightarrow ((f(a), f(b)), f(a+b))$$

In order for that right-hand tuple to be a "valid knot" under $+$, it must be consistent with the structure of the left-hand side, i.e., have an output element that is the sum of the two operands. So if $f$ preserves the knot, the item in the output slot, $f(a+b)$, must be equal to the sum of the input items, i.e.,

$$f(a+b) = f(a)+f(b),$$

which is the usual constraint. Let’s take a look at a concrete example:

Transformation $f$ not preserving the $+$ operation. Moving from left to right, we see legitimate application of $+$ for the shown operands; $f$ doesn't change that. But $f$ does not respect the structure of $+$, since it maps our operands and their sum to values that cannot be associated via addition.

Take the red bridge along the top: this is a "local knot" that $+$ ties. That is, it exemplifies the kind of structure that $+$ induces, and as a result that which we want $f$ to preserve. But as we move down the diagram via $f$, we see that it "unties" the knot, giving us transformed operands whose sum does not match the transformed sum. The red bridge along the bottom shows the knot that would be valid under our transformed inputs. It would be equally reasonable to consider the kinds of inputs $f$ could've "chosen" to reach the needed output. In either case, $f$ does not align them correctly: it doesn't map a $+$-knot to a $+$-knot.

For the linear transformation $f(x)=2x$, on the other hand:

Transformation $f(x)=2x$ preserving the $+$ operation.

Here we find the map does respect the $+$ operation (at least over the specific operands we've plotted): the knots remained tied through $f$.

Moving back out, recall that we say generally that, for a transformation $f:A\rightarrow B$ and $k$-ary operations $\mu_A: A^k \rightarrow A$ and $\mu_B: B^k \rightarrow B$ defined over $A$ and $B$, respectively, $f$ is said to be structure-preserving if the following holds:

$$f(\mu_A(x_1,\dots,x_k)) = \mu_B(f(x_1),\dots,f(x_k))$$

This is very general, and it's worth exploring the dynamics when $f$ is more like a binary relation. In this case, I found myself thinking of addition in modular arithmetic. Here we have an operation that behaves similarly over entire classes of values: rather than observing the "paired" association of structure $(x,\mu(x))$ with $(y,\mu(y))$ through $f$ (paired as in we relate one "bit of structure" with just one other), we now say something like the structure $(x,\mu(x))$ can be associated with many occurrences $(z,\mu(z))$, $z\sim x$, under the relation $\sim$. Nothing is fundamentally different here save for our map being "wider" in a sense: it links together more structure across objects.

Looking at a rough sketch of congruence modulo $k$, we can imagine "folding up" $\Bbb Z$ to give us $k$ new classes of items:

Congruence modulo $k$

It's over these equivalence classes that we observe consistent structure under addition. That is, the relation that induces this partition respects the operation: we can first add values in $\Bbb Z$ and map to their equivalence classes, or first map to equivalence classes and "add" them2.
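Concretely, reduction modulo $k$ preserves addition; a small illustrative check (names mine):

```haskell
-- Reducing then adding (mod k) agrees with adding then reducing.
k :: Int
k = 5

toClass :: Int -> Int
toClass x = x `mod` k

additionRespectedAt :: Int -> Int -> Bool
additionRespectedAt a b = toClass (a + b) == (toClass a + toClass b) `mod` k

-- additionRespectedAt 7 9 == True   (16 mod 5 == (2 + 4) mod 5 == 1)
```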

Structure preservation under a relation-like map

The diagram above attempts to show what's happening as we move from items to their equivalence classes (going down), and the ways we add those objects (moving right). Note the difference in the 2D representation here compared to similar diagrams above: $x_1$ and $x_2$ are both meant to be $k$-dimensional, and we're simply laying out our objects in the grid there. Note also that the notion of addition used over equivalence classes is the Minkowski sum, showing how the operation being respected can show up in slightly different forms between $\mu_A$ and $\mu_B$. In any case, the high-level takeaway here is that our structure-preserving maps can be more than just functions: they can associate many target objects to a single source, and this naturally abides by the same principles we've discussed above. In this particular case, we're saying that whatever the equivalence class "allegiances" of $+$'s operands and output are, they will be respected: addition is consistent up to class assignment. Put another way, $+$ plays along with the new notion of identity induced by the equivalence relation (color in the diagram). If a blue item and green item add up to a red item (following the top bridge), then adding any blue and green items will yield a red item (following the bottom bridge).

In total, the relation-like map explored here helps broaden my intuition, shedding light on how general a structure-preserving map really can be. If we’ve allowed a generalization from pairwise associations through functional maps to more “class-like” associations via relational maps, you might naturally ask whether we can do something similar over the operation. That is, can a map respect a class of operations or transformations? This is pretty much exactly what we do with equivariant maps, and we rely on group theory to formalize valid collections of transformations through the same language of operations (i.e., the group action). We explore this in depth below; that’s what kicked off this aside in the first place.

One final remark to potentially keep in mind: the notion of structure preservation is really more of a loose hierarchy. Questions like which structure or how much of it can be naturally addressed by strengthening or weakening one’s requirements in a given setting. What is meant by “structure-preserving” is generally very flexible and, within the general framework we’ve formulated here, can accommodate any object, transformation, or interpretation of “map.” One might attempt to formulate a map between groups and find that it fails to be a group homomorphism, but it can perhaps still be seen as a map preserving structure of sets or even some other relaxed notion of symmetry. One also needn’t have maps that are particularly well-defined, e.g., analogous to surjectivity/injectivity; even trivial, information-destroying maps can be structure-preserving over the “places” where they’re defined. All in all: structure preservation is very fundamental in abstract algebra, and having a strong intuition for what it means is important (even if I don’t yet have that, but I sure have spent a lot of time toiling). But it is ultimately incredibly flexible and context-dependent, applying across many different layers of abstraction.

Emerging from the aside on structure-preserving maps, the definition of equivariance feels pretty familiar. An equivariant map is one that preserves some notion of symmetry over its source and target objects. We capture this expression of symmetry as a symmetry group, and the preserved operation is the group action on the objects in the domain and co-domain.

In particular, we say that an equivariant map $f$ respects the group action of a group $G$ over its $G$-sets. This is similar in form to that of structure-preserving maps, but with a stronger requirement that structure is held up across all $g\in G$ (rather than just a single operation; instead, we have a whole family of operations). That is, for a group action $\cdot$ and for all $g\in G$, we have

$$g\cdot f(x) = f(g\cdot x)$$

Again, this just says that the movement through $f$ doesn't break the structures in place between objects, which in this case is the symmetry represented by the group's transformations. If we label the output of the group action $x^\prime = g\cdot x$, we can think of this structure as manifesting concretely in the pair $(x, x^\prime)$. If our map is structure preserving, we need to see precisely the same structure on the other side of $f$, i.e., when mapping $x$ and $x^\prime$ we find the results also "meet the criteria" for being bundled up as a tuple. Such a pair would be $(f(x), f(x^\prime))$, which must exhibit the same relationship between the first and second element as we had before $f$, i.e., $f(x^\prime) = g\cdot f(x)$. Expanding $x^\prime$ gives us our form above.3

Note that a $G$-set is just a set acted upon by $G$: it's effectively a formal construct that bundles those two together (the set and the group action) such that we get a standalone object. This is just the group equivalent of the "set plus structure" we saw before, e.g., $(\mathbb{Z}, +)$: we need a way to bring along more than just a set. (Possibly confusingly, $(\mathbb{Z}, +)$ is itself a group. The operation being respected there is what induces the group; with $G$-sets we're working with a group action rather than the group's operation, ultimately more like a functional than a function.)

We can see a quick example with points (or shapes) in $\mathbb{R}^2$. We can apply the rotations in $C_2$ (i.e., just $0^\circ$ and $180^\circ$) and leverage an arbitrary map $f$ between shapes. Here we have a square $\{-1,1\}^2$ and map to a triangle by projecting points with positive $y$ coordinates onto the $y$-axis, i.e., $(1,1) \mapsto (0, 1)$ and $(-1,1) \mapsto (0, 1)$:

Square $\rightarrow$ triangle under $f$

Note that the orientation of the diagram is transposed relative to the form we've been using for the naturality square and/or above diagrams. Here we see that $f$ maps the same orientation of the square to the same orientation of the triangle, but our $180^\circ$ rotation changes only the orientation of the triangle. The map $f$ therefore does not respect the structure of $C_2$'s group action (the "family" of operations it represents). On the other hand, if $f$ maps to another shape with $180^\circ$ rotational symmetry, we would find the group action is respected. Take, for instance, an $f$ that simply projects the points onto the $x$-axis:

Square $\rightarrow$ line under $f$
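A small illustrative check of both cases (my names; I'm also assuming the square-to-triangle map leaves points with non-positive $y$ fixed, which the original diagrams don't spell out):

```haskell
-- The group action is 180-degree rotation (negate both coordinates).
type Pt = (Int, Int)

rot180 :: Pt -> Pt
rot180 (x, y) = (-x, -y)

toTriangle :: Pt -> Pt            -- hypothetical extension of the example's f
toTriangle (x, y)
  | y > 0     = (0, 1)            -- collapse positive-y points onto the y-axis
  | otherwise = (x, y)

projX :: Pt -> Pt                 -- projection onto the x-axis
projX (x, _) = (x, 0)

equivariantAt :: (Pt -> Pt) -> Pt -> Bool
equivariantAt f p = f (rot180 p) == rot180 (f p)

-- equivariantAt projX      (1, 1) == True    ((-1,0) either way)
-- equivariantAt toTriangle (1, 1) == False   ((-1,-1) vs (0,-1))
```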

This is equivariance: a map that changes related objects such that the objects on the other side remain related in the same way. And so we return to natural transformations:

Natural transformation components $\mu_X$ and $\mu_Y$

Here $\mu$ is our structure-preserving map: where there's "structure" in $F(\mathcal{C})$ (which just means related object pairs and how they're connected via morphisms), that same structure analog is present on the other side, i.e., in $G(\mathcal{C})$.

As seen with equivariant maps, $\mu$ respects the structure of not only a single operation, but instead a whole collection of transformations, which in this case are the morphisms in $\mathcal{C}$, or $\text{Hom}_{\mathcal{C}}$. To be precise, $\mu$ respects how structure present in $\text{Hom}_{\mathcal{C}}$ shows up in $\mathcal{D}$ through the functors $F$ and $G$; it doesn't interact with that structure in $\mathcal{C}$ directly.

This lines up nicely with the group theory analog discussed for equivariant maps: we had an abstract (symmetry) group $G$ whose transformations were first made concrete through its group action on specific $G$-sets. $f$ always maps from/to specific objects, and $G$'s structure must first be "realized" on those objects (producing the concrete $G$-sets) in order for us to check how $f$ will behave. That act of "realizing" structure is what matters here, since it's coming from the same base group: however $g\cdot$ presents for both objects $X$ and $Y$, the resulting transformations in both "universes" can be fairly compared as they're based on the same fundamental transformation in $G$.

The exact same thing is happening here with our base category $\mathcal{C}$: it can be seen as a kind of abstract category that is realized through the functors $F$ and $G$ into the category $\mathcal{D}$. Everything from there on out takes place solely in $\mathcal{D}$, just as we didn't strictly need the group $G$ provided we had $G$-sets (i.e., objects that already contain the relevant realizations of $G$) in the case of equivariant maps. Put another way: once we "project" $\mathcal{C}$ onto $\mathcal{D}$ via functors $F$ and $G$, we no longer need $\mathcal{C}$ directly. We only need to know which objects/morphisms correspond to the same structure across $F$ and $G$, i.e., which realized objects/morphisms originate from the same objects/morphisms in $\mathcal{C}$.

In total, we have a $\mu$ that, through its applicable components (i.e., on objects in $\mathcal{C}$), respects the structure induced on its domain/co-domain by $\mathcal{C}$. In the diagram, $X$ and $Y$ are any two objects in $\mathcal{C}$ related by morphism, and for all such pairs of objects together with each of their morphisms $f\in \text{Hom}_{\mathcal{C}}(X, Y)$, $\mu$ must uphold that structure in $\text{Im } F$ and $\text{Im } G$, i.e., all of $\mathcal{C}$'s structure.

Whiskering

Loosely speaking, whiskering is the composition of functors and natural transformations. Suppose we have a natural transformation $\eta:F\implies G$ between two functors $F, G:C\rightarrow D$, along with another functor $H:D\rightarrow E$.

Whiskering setup: $\eta: F\implies G$ and $H: D\rightarrow E$

Whiskering $H$ and $\eta$ yields the natural transformation $H\eta: H\circ F\implies H\circ G$, where $(H\eta)_X = H(\eta_X)$:

Whiskered map

This is simply the natural mapping between the two available composed functor trajectories. The new map $H\eta$ has components (i.e., $(H\eta)_X$) that are the components of $\eta$ mapped through $H$ (i.e., $H(\eta_X)$). More explicitly:

Object-based whiskering setup:  and

This shows $\eta$'s components as they're mapped by $H$, relating objects $H(F(X))$ and $H(G(X))$. $H\eta$ is a natural transformation between functors $H\circ F$ and $H\circ G$; this is a bit more clear when isolating the full passthrough:

Isolating the whiskered map between functors $H\circ F$ and $H\circ G$
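In Haskell terms (an illustrative sketch, with functors as type constructors and functor composition as type nesting), whiskering on this side is just fmap'ing the components:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A natural transformation between functors is a polymorphic function.
type Nat f g = forall a. f a -> g a

-- eta : F => G, with F = [] and G = Maybe.
eta :: Nat [] Maybe
eta []      = Nothing
eta (x : _) = Just x

-- H eta : H . F => H . G for any Functor h; (H eta)_X = H(eta_X) = fmap eta.
whiskered :: Functor h => h [a] -> h (Maybe a)
whiskered = fmap eta

-- e.g. with H = Maybe: whiskered (Just [1,2,3]) == Just (Just 1)
```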

Vertical composition

Suppose we have functors $F, G, H: C \rightarrow D$ and natural transformations $\eta: F\implies G$ and $\epsilon: G\implies H$.

Vertical composition setup: $\eta: F\implies G$ and $\epsilon: G\implies H$

We can compose the transformations $\eta$ and $\epsilon$ to produce the map $\epsilon\circ\eta: F\implies H$. That is, we get a single map between $F$ and $H$ by gluing together intermediate natural transformations component-wise, i.e.,

$$(\epsilon\circ\eta)_X = \epsilon_X\circ\eta_X$$

Object-based vertical composition setup

Here we depict vertical composition as “stacking” functors with the same source and target categories, compacting them to produce direct maps between indirectly connected functors.
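Component-wise, this is ordinary function composition; a quick illustrative sketch:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Maybe (maybeToList)

type Nat f g = forall a. f a -> g a

etaV :: Nat [] Maybe        -- eta : F => G (F = [], G = Maybe)
etaV []      = Nothing
etaV (x : _) = Just x

epsV :: Nat Maybe []        -- eps : G => H (H = [])
epsV = maybeToList

vert :: Nat [] []           -- (eps . eta) : F => H, built component-wise
vert = epsV . etaV

-- vert [7, 8, 9] == [7]; vert [] == []
```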

Horizontal composition

Suppose we have functors $F, G: C \rightarrow D$ and $J, K: D \rightarrow E$, with natural transformations $\eta: F\implies G$ and $\epsilon: J\implies K$.

Horizontal composition setup: $\eta: F\implies G$ and $\epsilon: J\implies K$

The horizontally composed natural transformation $\epsilon * \eta: J\circ F\implies K\circ G$ is defined component-wise:

$$(\epsilon * \eta)_X = \epsilon_{G(X)}\circ J(\eta_X) = K(\eta_X)\circ \epsilon_{F(X)}$$

Object-based horizontal composition setup

Notice how this is (loosely) something like generalized whiskering: rather than a single natural transformation, we connect composed functors through multiple natural transformations across different categories. The same principle applies: the composed map $\epsilon *\eta$ lets you "stop off" and move through $\epsilon$ or $\eta$.

Horizontal composition setup

On the left, we see how $\epsilon_{F(X)}$ connects the same object, $F(X)$, across both functors $J$ and $K$; this is our usual intuition for natural transformation components as morphisms. Once we're at $K$'s object for $F(X)$, we can take $K$'s version of $\eta_X$ to move from $F$ to $G$. The right side takes an alternative route: first moving from $F(X)$ to $G(X)$ via $J$'s version of $\eta_X$, then taking $\epsilon_{G(X)}$ to travel to $K$.

Either way, it's worth recalling that natural transformations are composed of morphisms between associated objects under the source and target functors. For an object $X$, the morphism components link $F(X)$ and $G(X)$. In the case of horizontal composition, we're defining a natural transformation component-wise that links objects $(J\circ F)(X)$ and $(K\circ G)(X)$ in the usual sense, connecting functors $J\circ F$ and $K\circ G$. We simply recognize there are multiple ways to establish this connection provided we have $\eta$ and $\epsilon$: move by $J(\eta)$ then $\epsilon$, or $\epsilon$ then $K(\eta)$. Under the hood, these movements should be exactly the same, and constitute the morphisms in our new natural transformation.
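The two routes in the component formula can be written out directly (an illustrative sketch, names mine):

```haskell
{-# LANGUAGE RankNTypes #-}

type Nat f g = forall a. f a -> g a

etaH :: Nat [] Maybe            -- eta : F => G  (F = [], G = Maybe, on C -> D)
etaH []      = Nothing
etaH (x : _) = Just x

epsH :: Nat Maybe []            -- eps : J => K  (J = Maybe, K = [], on D -> E)
epsH Nothing  = []
epsH (Just x) = [x]

-- Both sides of (eps * eta)_X = eps_{G(X)} . J(eta_X) = K(eta_X) . eps_{F(X)},
-- landing in K(G(X)) = [Maybe a].
viaJ, viaK :: Maybe [a] -> [Maybe a]
viaJ = epsH . fmap etaH         -- eps at G(X) after J's image of eta
viaK = fmap etaH . epsH         -- K's image of eta after eps at F(X)

-- viaJ (Just [1,2]) == [Just 1] == viaK (Just [1,2])
```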

Informal remark: this is clearly a bit more involved than vertical composition. Vertical movement (specifically moving down) in our diagrams represents movement between functors along a natural transformation. Vertical composition, then, in name alone, sounds more natural…and it is. There we simply stack transformations and objects pass straight down; the beginning and end of the route are easy to connect. Horizontal composition, however, requires movement across multiple categories, with routes that move both right and down. Tracking objects is therefore less trivial, with multiple possible paths. A new, abstracted natural transformation is possible to define all the same.

Summary:

  • Whiskering: maps between functors facilitate maps between composed functors; stop off to take the defined natural transformation, then complete the journey.

  • Vertical composition: indirectly connected functors (over the same categories) can be connected directly

  • Horizontal composition: generalized whiskering (roughly speaking); maps between functors can be chained to map between composed functors.

Monad

As the (apparently) infamous definition goes: a monad is a monoid in the category of endofunctors4. We can try to set this up step-by-step:

  • An endofunctor of a category $C$ is a functor from the category to itself.

  • The category of endofunctors treats those functors as its objects, and there are morphisms, as in any other category, between them5. The category of functors between two categories $A$ and $B$ is denoted $[A, B]$; therefore the category of $C$'s endofunctors is written $[C, C]$.

  • Taking that category, we recall what makes a monoid (at least, as we’ve studied it in Set\text{Set}): associativity and identity. Functor composition and the identity functor satisfy these requirements:
    • The identity functor 1C1_C in the category CC maps each object/morphism to itself (and is therefore an endofunctor). For an endofunctor FF in [C,C][C, C], F=F1C=1CFF = F \circ 1_C = 1_C \circ F. Therefore 1C1_C satisfies our requirements for identity on objects in [C,C][C, C].

    • By virtue of the fact that functors serve as morphisms in categories of categories, composition of functors is associative, as is generally required of morphism composition in any category.

    We say that the identity functor and composition induce a monoidal structure on [C,C][C, C]. Note that we generally write monoidal categories as the triple (C,,I)(C, \otimes, I) (where \otimes is a bifunctor), containing monoid objects (T,μ,η)(T, \mu, \eta) where

    • μ:TTT\mu: T \otimes T \rightarrow T is the multiplication morphism

    • η:IT\eta: I \rightarrow T is the unit morphism

    subject to associativity and identity constraints (loosely stated). So our category of endofunctors on CC can be written explicitly as the monoidal category ([C,C],,1C)([C, C], \circ, 1_C), and a monad on CC is a monoid object (T,μ,η)(T, \mu, \eta) in that category.

  • The morphisms between functors are generally referred to as natural transformations. Their components are compatible with how the functors act on morphisms (the naturality condition), so passing between functors respects the internal structure (i.e., composition of morphisms) of the underlying categories.

Let’s look at the monoid object a bit more closely:

(C,,I)(C, \otimes, I) is a monoidal category, i.e., a category CC endowed with a unit object II and an associative (up to natural isomorphism) bifunctor :C×CC\otimes: C\times C\rightarrow C. C×CC\times C is a product category, with objects as pairs of objects from CC and morphisms as pairs of morphisms between the corresponding components of those pairs. The monoidal part of this just follows from our set-based definition, i.e., having an associative binary operation (bifunctor) defined over elements (objects, morphisms) of the set (category).

Our monoidal category of interest is ([C,C],,1C)([C, C], \circ, 1_C), i.e., the endofunctors on CC together with (associative) functor composition. An object TT in this category is an endofunctor T:CCT: C\rightarrow C, and we have some defined morphisms for that object (which are actually natural transformations; recall that natural transformations are morphisms in categories of functors):

  • Unit η:1CT\eta: 1_C\rightarrow T; tells us how every object/morphism in CC is involved or transformed by TT. It serves as an “injection” of values into the monad.

  • Multiplication μ:T2T\mu: T^2\rightarrow T; a reduction/flatten/join-like transformation, telling us how to flatten out doubly nested monadic structures back to “regular” monad objects. (A minimal list-based sketch of both maps follows just below.)
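As a loose sketch of these two transformations, suppose (purely as an assumption for illustration; the text hasn’t fixed a particular T) that T is the list functor on Python values: η wraps a value in a singleton list, and μ flattens one level of nesting.

```python
# A minimal sketch (my own illustration) with T = the list functor:
# eta injects a plain value, mu flattens one level of nested structure.

def fmap(f, xs):
    """T's action on morphisms: apply f to each element."""
    return [f(x) for x in xs]

def eta(x):
    """Unit eta_X : X -> T(X); inject a plain value into the structure."""
    return [x]

def mu(xss):
    """Multiplication mu_X : T(T(X)) -> T(X); flatten one level of nesting."""
    return [x for xs in xss for x in xs]

print(eta(3))               # [3]
print(mu([[1, 2], [3]]))    # [1, 2, 3]
```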

Monads as CC-endofunctors, with natural transformations μ\mu and η\eta
Object-level view of a monad, explicitly showing objects as functors. Put crudely: monads are associative maps between associative maps between associative maps.

Above we visualize these as morphisms between functor objects in the category of endofunctors on CC. These are the canonical “movements” between functors that need to be defined, but we can whisker in TT to move up and down levels of composition:

Whiskering combinations expanded

Nested levels of structure can always be “unwrapped” into their component pieces, until either μ\mu or η\eta can be applied to one of the items. The coherence conditions relate the different possible whiskering orders, requiring that composed maps are equal:

  • For natural transformations T3TT^3\rightarrow T:

    μTμ=μμT\mu\circ T\mu = \mu\circ \mu T

    Coherence condition for T3TT^3\rightarrow T

    This condition is analogous to associativity for monoids. We’re starting from TTTT\circ T\circ T, and can insert parentheses in two ways:

    • T(TT)TμT\circ (T\circ T) \rightarrow T\mu

    • (TT)TμT(T\circ T)\circ T \rightarrow \mu T

    Where we draw parentheses is where we’re allowed to apply μ\mu (this is not suggesting the left and right side are equivalent; just a depiction of how the forms fall out when it comes to associativity). This is initially a bit confusing because μT\mu T and TμT\mu just look like a commutative swap, when in fact they are two distinct maps resulting from application of μ\mu at different places.

    Clarity: this is something like doing

    (a+a)+a=a+(a+a)    (2a)+a=a+(2a)(a+a)+a = a+(a+a) \implies (2a)+a = a+(2a)

    When we look at the right side, the statement appears commutative in nature: we’re just swapping operands. But this is downstream from an associative statement, and the simplification on its own is a bit misleading.

  • For natural transformations TTT\rightarrow T:

    μTη=μηT=1T\mu\circ T\eta = \mu\circ \eta T = 1_T

    where 1T1_T is the identity natural transformation on TT.

    Coherence condition for TTT\rightarrow T

    This condition is analogous to identity for monoids.

So just like we wrote the monoid (M,,e)(M, \cdot, e) for sets, we have the categorical equivalent in (T,μ,η)(T, \mu, \eta). TT is the underlying object (an endofunctor), μ\mu is the binary-operation-like piece (a natural transformation T2TT^2\rightarrow T), and η\eta plays the role of the identity (a natural transformation 1CT1_C\rightarrow T).
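Continuing the list-based sketch from above (again, lists are just a stand-in choice of T), both coherence conditions can be checked on concrete values. Note in particular how Tμ (flatten the inner level) and μT (flatten the outer level) really are distinct intermediate maps, even though composing either with μ lands in the same place.

```python
# Checking the coherence conditions for the list-based sketch of (T, mu, eta).
# T mu = "apply mu inside" (whisker T on the left); mu T = "apply mu at the
# outer level" (whisker T on the right).

def fmap(f, xs):
    return [f(x) for x in xs]

def eta(x):
    return [x]

def mu(xss):
    return [x for xs in xss for x in xs]

xsss = [[[1], [2, 3]], [[4]]]          # a value of T(T(T(X)))

# Associativity: mu ∘ T(mu) = mu ∘ (mu T), both maps T^3 -> T
t_mu = fmap(mu, xsss)                  # flatten inside:  [[1, 2, 3], [4]]
mu_t = mu(xsss)                        # flatten outside: [[1], [2, 3], [4]]
assert t_mu != mu_t                    # distinct intermediate maps...
assert mu(t_mu) == mu(mu_t) == [1, 2, 3, 4]   # ...equal after applying mu

# Identity: mu ∘ T(eta) = mu ∘ (eta T) = 1_T, maps T -> T
xs = [1, 2, 3]                         # a value of T(X)
assert mu(fmap(eta, xs)) == mu(eta(xs)) == xs
```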

Generalizing arity

Small confusing point for me here: the analogy of μ\mu to a binary operation, in the usual sense, was unclear to me. We have μ:T2T\mu: T^2\rightarrow T, but T2T^2 is still just a functor, i.e., TTT\circ T, so it feels just like a unary operation defined over any other functor, e.g., some natural transformation f:TTf: T\rightarrow T. That is, μ\mu isn’t defined over pairs of objects, nor does it map explicitly from two functors; composition can at least be seen to operate on two functors (FA,FB)FC\circ(F_A, F_B) \mapsto F_C. μ\mu, however, doesn’t operate explicitly on two items.

A quick, perhaps obvious thing: maps can always be seen as operating on one object. Even multivariate functions can just be seen to map from a single nn-tuple, rather than being a construct capable of accepting nn separate objects. Of course, this changes nothing about what any map of this kind is capable of, but it’s a simplifying perspective that helps open up new possible interpretations of the term “arity.”

We might now say the following: a map’s arity isn’t simply the number of arguments it can accept. Rather, it’s a quantifiable property of the domain of objects over which it’s defined. Addition over the reals, +:R2R+:\Bbb R^2\rightarrow\Bbb R, remains the same 2-ary operation in the usual sense: it maps from the space of 2-tuples, explicitly pairing up two items. Implicitly here R2\Bbb R^2 refers to the cartesian product R×R\Bbb R\times\Bbb R, and the arity of the resulting map is aligned with the dimensionality of the product space domain. But what if we had an operation other than ×\times, e.g., \otimes?

This is precisely the question that leads us to analogizing μ\mu with a (typical) binary operation. In monoidal categories, we saw above that :C×CC\otimes: C\times C\rightarrow C is a bifunctor. (Note how ×\times is used in the definition of \otimes; this whole “arity generalization” only really applies inside monoidal contexts, so we have to get there first with existing machinery.) This can be seen to operate on pairs of objects (and morphisms) in CC, mapping them to other objects (and morphisms) in CC. OOO\otimes O is simply us applying \otimes to the same object OO, and this is the new form of the domain for the binary operation-like map μ\mu. We’re simply generalizing the way we build the domain from two objects by letting our bifunctor \otimes “put things together” instead.

In total, for our multiplication morphism μ:OOO\mu: O\otimes O\rightarrow O defined for monoid objects in any monoidal category, we have a map that doesn’t strictly look like a binary operation in the usual sense (hence this messy aside). It is nevertheless constructed around a domain that is built from two objects with \otimes, and that’s good enough for a general analogy to binary operations. It’s worth noting that for monads, in particular, \otimes is fixed as functor composition \circ, and μ\mu’s domain T2=TTT^2 = T\circ T is still just a functor. μ\mu’s “arity” is therefore more akin to levels of nested composition than to a number of dimensions or arguments: it flattens.

Whiskering μ\mu and η\eta (detailed)
Whiskering μ\mu and η\eta

When grappling with the intuition behind the coherence conditions, or really just the general positioning of monoid objects, I find the following helpful:

Monoidal categories are associative safe spaces. They allow one to rewrite the rules of association, and with that new mechanism ensure things behave in a familiar, expected way. Very simply put: you start with a category CC, a collection of objects and relations between them. You then define the notion of association \otimes, a bifunctor, which combines pairs of objects in CC into new CC objects. You’re now in a world endowed with a new “product substrate” for defining “binary operations:” you take an object AA from that world, and a binary operation on the object maps from AAA\otimes A to AA.

Recall that typically, i.e., in Set\text{Set}, binary operations are defined on specific sets, having structure f:S×SSf: S\times S\rightarrow S. The movement to the categorical generalization…

  1. loosens the kinds of objects we use, relaxing to categories other than Set\text{Set}, and

  2. generalizes the means of association, relaxing from ×\times to \otimes, allowing for more than just simple object pairing.

In short: monoidal categories are universes with their own notion of “productization” (under the monoidal product \otimes). This establishes the foundation upon which binary operations on objects in that universe are built. Put another way: monoidal categories redefine the notion of association (what a pair of objects is), and binary operations define means of combination under that paradigm. The former describes how we put objects together (ambient productization; a,b(a,b)a,b\mapsto (a,b), f,ggff,g\mapsto g\circ f, whatever), and the latter is an action of combination hip to that notion of togetherness (internal combination; (a,b)ab(a,b)\mapsto a\cdot b, etc). (It’s not so easy to give a simple form for operating on combined map-like objects such as gfg\circ f; such an operation would typically be defined on an element-wise basis, i.e., if (gf)h(g\circ f)\mapsto h, we’d talk about how we map between items g(f(x))g(f(x)) and h(x)h(x) rather than something more abstract.)

Monads are what we get when we want our objects to “understand” a whole type of (other) object (i.e., a category): no choice of object AA in AAA\otimes A is partially defined. In monoids on Set\text{Set}, we must pick a particular set SS over which to define the binary operation (as discussed above), fundamentally allowing the exclusion of some objects that could be included in sets. That is, the set SS doesn’t strictly contain a whole class of object (on purpose): we have the flexibility to pick arbitrary sets over which the operation is defined. If we instead wanted SS to be representative of an entire type, we’d be effectively asking for SS to become a category itself (i.e., to fundamentally represent a certain kind of object if we’re going to define operations over it). So we let AA not be an arbitrary object in the desired type’s category, but we “lock that category in” by operating instead on endofunctors of that category. In this case, our objects are maps defined over the entire type: AAA\otimes A involves a specific functor, but each functor is defined over the entire type/category of interest.

Putting this all together: monads leverage the machinery of monoids to define a universe where “products” are sequences of composition; this is the fundamental thing the monoidal product \otimes allows us to do. That composition takes place over not just any old functions defined between two objects, but instead on a map TT defined over an entire class of object (i.e., a functor). Such a map can be seen as “cladding” objects of the underlying type/category in additional structure: it “reskins” the object class by coating every object and morphism individually.

A movement under that map represents a movement into a new structured world. The type of the underlying objects remains the same (it’s an endofunctor; we don’t leave the category), but the nature of the objects on the other side is different (and different in a consistent way, having the newly added structure). Note that TT indiscriminately adds structure to all objects in the category: it maps an object XX to T(X)T(X), and since T(X)T(X) is again an object of the category, applying TT again produces T(T(X))T(T(X)), and so on. Point is, TT adds structure indiscriminately, so multiple composition steps can add too much structure: “already structured” objects receive an extra layer of structure. μ\mu is a mediator or regulator to address this “bunching up,” flattening doubled-up structure back to just a single layer.

Monads are structured computation environments. η\eta brings you into it (maps “plain” values into the structured landscape), the functor TT keeps you in it (brings all objects and morphisms along for the ride every time), and μ\mu flattens redundant build up (for those objects and morphisms that didn’t need any more of the functor’s effects). In aggregate, you have the necessary pieces for facilitating smooth, structured sequences of computation in TT’s world.
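One hedged way to make “structured sequences of computation” concrete, still using lists as a stand-in for T: chaining a structured value with a structure-producing step is just “apply T to the step, then flatten with μ,” which is the bind/flatMap operation familiar from programming. The helper names below are my own.

```python
# Sequencing in T's world, sketched with lists: given a structured value m in
# T(X) and a step f : X -> T(Y), "fmap then flatten" chains them without the
# structure piling up (this is bind/flatMap, assembled from T, mu, eta).

def fmap(f, xs):
    return [f(x) for x in xs]

def eta(x):
    return [x]

def mu(xss):
    return [x for xs in xss for x in xs]

def bind(m, f):
    """Chain a structured value with a structure-producing step: mu ∘ T(f)."""
    return mu(fmap(f, m))

halve_if_even = lambda n: [n // 2] if n % 2 == 0 else []   # a step X -> T(Y)

print(bind(eta(8), halve_if_even))        # [4]
print(bind([1, 2, 3, 4], halve_if_even))  # [1, 2]
```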

Example: power set monad

On Set\text{Set}, the power set monad is a monad P=(T,μ,η)\mathcal{P} = (T, \mu, \eta), where:

  • T(A)T(A) is the power set of a set AA, and T(f)T(f) takes each subset SAS\subseteq A to its direct image f(S)Bf(S)\subseteq B under a function f:ABf:A\rightarrow B

  • η\eta consists of component morphisms ηA:AT(A)\eta_A:A\rightarrow T(A), taking every aAa\in A to the singleton {a}\{a\}

  • μ\mu consists of component morphisms μA:T(T(A))T(A)\mu_A:T(T(A))\rightarrow T(A), taking a set of sets to its union.

For a concrete A={1,2}A=\{1,2\} (checked in the code sketch below),

  • T(A)={{},{1},{2},{1,2}}T(A) = \{\{\},\{1\},\{2\},\{1,2\}\}

  • T(T(A))={{},{{}},{{1}},{{2}},{{1,2}},,{{},{1},{2},{1,2}}}T(T(A)) = \{\{\},\{\{\}\},\{\{1\}\},\{\{2\}\},\{\{1,2\}\},\dots,\{\{\},\{1\},\{2\},\{1,2\}\} \}

  • ηA(1)={1}\eta_A(1) = \{1\}, ηA(2)={2}\eta_A(2) = \{2\}

  • μA({{1},{2}})={1,2},\mu_A(\{\{1\},\{2\}\}) = \{1,2\}, \dots
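These values can be checked directly. Below is a small sketch; the use of Python frozensets (so that sets can contain sets) is an implementation detail, not part of the categorical definition.

```python
# Checking the power set monad's components on A = {1, 2}.
from itertools import chain, combinations

def T(A):
    """Power set of A, as a frozenset of frozensets."""
    A = list(A)
    return frozenset(
        frozenset(c) for c in chain.from_iterable(
            combinations(A, r) for r in range(len(A) + 1)
        )
    )

def eta(a):
    """eta_A : A -> T(A); a |-> {a}."""
    return frozenset({a})

def mu(S):
    """mu_A : T(T(A)) -> T(A); a set of sets goes to its union."""
    return frozenset().union(*S) if S else frozenset()

A = {1, 2}
print(len(T(A)))        # 4 subsets: {}, {1}, {2}, {1, 2}
print(len(T(T(A))))     # 16 sets of subsets
print(eta(1), eta(2))   # frozenset({1}) frozenset({2})
print(mu({frozenset({1}), frozenset({2})}))   # frozenset({1, 2})
```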

Common examples from functional programming
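The canonical first example here is usually Maybe/Option: T adds the possibility of absence, η wraps a plain value as a present one, and μ collapses a “maybe of a maybe” into a single layer. A rough Python sketch follows; the encoding (tuples tagged "just"/"nothing") and helper names are my own choices, not a standard API.

```python
# A rough sketch of the Maybe/Option monad (illustration only; the tuple
# encoding below is an arbitrary choice).

NOTHING = ("nothing",)

def just(x):
    return ("just", x)

def fmap(f, m):
    """T's action on morphisms: apply f under the structure, if present."""
    return just(f(m[1])) if m[0] == "just" else NOTHING

def eta(x):
    """Unit: wrap a plain value as a present one."""
    return just(x)

def mu(mm):
    """Multiplication: collapse Maybe(Maybe(X)) to Maybe(X)."""
    return mm[1] if mm[0] == "just" else NOTHING

def bind(m, f):
    """Sequence a possibly-absent value with a possibly-failing step: mu ∘ T(f)."""
    return mu(fmap(f, m))

safe_recip = lambda x: NOTHING if x == 0 else just(1 / x)

print(bind(eta(4), safe_recip))    # ('just', 0.25)
print(bind(eta(0), safe_recip))    # ('nothing',)
print(mu(just(just(7))))           # ('just', 7)
print(mu(just(NOTHING)))           # ('nothing',)
```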


  1. Felt the need to spell out more of this here because I find myself worried about the possible flexibility of ff. As in, it’s not exactly clear why things should automatically fall into place to ensure it’s the only option. But the fact you can’t really “fill in” all the gaps (outside of the options that gg is defined over) in arbitrary ways helps me accept this; you must fill them in a way that ensures ff respects the category’s morphisms.
    ↩︎
  2. This particular perspective lacks a little precision since our map is formulated more as a set-valued function [][\cdot] that takes a representative value to its equivalence class, but it’s nevertheless induced by an underlying relation.
    ↩︎
  3. Going to the trouble to formulate things this way is really meant to help me avoid falling into my typical trap: getting caught on the seeming open-endedness of the structure-preserving commutative form. When I “mask away” gxg\cdot x behind xx^\prime as simply our symmetry’s “output” when acting on xx, it becomes much easier to just follow ff’s application on both xx and xx^\prime. Those two have a clear relationship before ff, and that same relationship needs to be there after ff between whatever new items ff produces. But when I leave xx^\prime as gxg\cdot x, I find I get distracted by our f(gx)f(g\cdot x) step and want to deconstruct it, reading more into it than is actually there. Really, gxg\cdot x is a new thing, and we write in terms that communicate precisely how it’s related to xx (which is of course clean and elegant)…it’s just that doing those two together often leads me astray.
    ↩︎
  4. Here I’m actually seeking a setup without some of the usual functional programming weight slapped on. That’s not to say it’s not helpful, but I want to feel the more abstract fundamentals hit home first (or at least now given all of the loose context I’ve absorbed with the 10 different tutorials that start with Maybe).
    ↩︎
  5. Right away this is a little confusing: morphisms in a category of functors are map-like things being applied to map-like things. Functors are the maps applying on the entire category (mapping all objects/morphisms in CC onto themselves, observing totality but not necessarily surjectivity), and the morphisms between them (natural transformations) are in turn assembled from morphisms in the target category, one component per object.
    ↩︎