Abstract algebra

type wiki
created 2020-05-25
modified 2025-10-11 16:24

General properties

  • Associativity: the order in which operation evaluation is performed does not affect the result. That is, for a given expression, you can pick operations and “resolve” them, one by one, in any order. Note that we’re not saying we allow any order of the operands (that’s commutativity), but rather that the application order doesn’t matter.

    Associativity can be formally stated as follows: if, for all $a,b,c\in S$, the equation $(a\cdot b)\cdot c = a\cdot (b\cdot c)$ holds, then the binary operation $\cdot$ on the set $S$ is associative. Note how this is the simplest distillation of our application invariance mentioned above: the operands remain in a fixed order, but parentheses can be re-written arbitrarily.

    Intuition: what does associativity buy us? Intuitively, an associative operation can "see" the components of composite operands. The form $(a\cdot b)\cdot c$ takes $a$ and applies pieces step-by-step, i.e., first adding $b$, then mixing in $c$ after. Associativity says we can first mix up $b$ and $c$ and give the result to $a$, and it doesn't change $a$'s reaction: the $b$ and $c$ mixture is not a fundamentally new compound, but instead has transparent ingredients. In practice, this matters when we don't want to have to worry about when we assemble the pieces of a composite result, so long as they have the same final order.
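A tiny illustrative check (not from the original text) can make the contrast concrete; in Haskell, string concatenation is associative while subtraction is not:

```haskell
-- Illustrative only: regrouping parentheses never changes the result for an
-- associative operation, but it can for a non-associative one.
assocConcat :: Bool
assocConcat = ("ab" ++ "cd") ++ "ef" == "ab" ++ ("cd" ++ "ef")   -- True

assocSub :: Bool
assocSub = (10 - 4) - 3 == 10 - (4 - 3)                          -- False: 3 vs 9
```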

Algebraic structures

An algebraic structure is a mathematical object consisting of a set, operations defined over the set, and a set of axioms/identities satisfied by the operations.

Algebraic structures and their properties

General axioms:

  • Associativity:

  • Divisibility:

  • Identity: there is an element $e\in X$ such that, for every element $x\in X$, $x\cdot e = e\cdot x = x$.

    Note one obvious implication of this: every element of the base set shows up as an output of the operation (e.g., $x = x\cdot e$), so the operation doesn't "lose" any items of $X$; in that sense it is "full."

Structure types

  • Magma: $(M, \cdot)$; a set $M$ with a single closed binary operation $\cdot$.

  • Quasigroup: a magma endowed with a notion of division, i.e., a pair $(Q, *)$ such that

    $$a*x = b \;\;;\;\; y*a = b$$

    are satisfied by unique $x, y\in Q$, with solutions written $x=a\backslash b$ and $y = b/a$. The operations $\backslash$ and $/$ are left and right division, respectively.

  • Unital magma

  • Semigroup: an associative magma, i.e., a set $S$ together with an associative binary operation.

  • Loop: a quasigroup with identity

  • Associative quasigroup

  • Monoid: a semigroup with identity, written as a triple $(M, \cdot, e)$, where $e$ is the identity element (a concrete sketch follows this list).
    • A submonoid of a monoid $(M, \cdot, e)$ is a monoid $(N, \cdot, e)$ such that $N \subseteq M$. That is, it is a monoid defined around a base set that is a subset of the "supermonoid's" base set, has the same identity, and remains closed under the inherited binary operation.

    • A subset $S$ is a generator of a monoid $M$ if the smallest submonoid of $M$ containing $S$ is $M$. That is, $S$ needs to be a "tight base set" in order to be a generator; one can't build a submonoid around $S$ that is "smaller" than $M$. Monoids with finite generators are said to be finitely generated.

  • Group:
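As promised in the monoid bullet above, here is a small illustrative Haskell sketch (names are mine, not from the original) spot-checking the monoid laws for a familiar example, the integers under addition with identity 0:

```haskell
-- Monoid sketch: (Int, +, 0). We check associativity and identity on sample
-- values; these hold for all integers.
op :: Int -> Int -> Int
op = (+)

e :: Int
e = 0

monoidLawsHold :: Int -> Int -> Int -> Bool
monoidLawsHold a b c =
     op (op a b) c == op a (op b c)   -- associativity
  && op a e == a && op e a == a        -- identity on both sides

-- e.g. monoidLawsHold 2 5 9 == True
```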

Free monoid

The free monoid on a set (or alphabet) $A$ is the monoid $(A^*, \cdot, \varepsilon)$, where $\cdot$ is the string concatenation operator, the empty string $\varepsilon$ is the identity element, and $A^*$ is the Kleene star (the unary operator $*$) applied to $A$. We often refer to the free monoid as $A^*$ (which is a little confusing since one writes the monoid's internal set as $A^*$ as well).

  • A graded monoid is a monoid that can be written as

    $$M = M_0\oplus M_1\oplus \cdots \oplus M_n$$

    which is to say it can be "factored" or "graded" as a collection of submonoids (i.e., it's layered). All free monoids are graded; $M_i$ contains the free monoid's strings of length $i$.

  • The "base set" $A$ over which the Kleene star (or Kleene plus) is applied contains the free generators of the free monoid $A^*$. Here there's a fairly clear analogy to vector bases: the free generators are "single-letter words" (i.e., characters) whose arbitrary combination forms all elements in the monoid. Further, the cardinality of the set of free generators is referred to as the rank of the free monoid (although each free monoid has exactly one set of free generators, while vector spaces can have many bases). We also call this free generating set a basis.

  • A code is a set of words $C$ whose Kleene star $C^*$ is a free monoid for which $C$ is a basis.

Free object

Free objects are to categories what bases are to vector spaces, roughly speaking. A basis is a fundamental characterization of a vector space, and linear transformations between spaces can be captured entirely by their values on that basis. Free objects are effectively objects with a known set of generators, analogous to a basis together with the vector space it spans.

In particular, let $C$ be a concrete category with a faithful (forgetful) functor $U: C \rightarrow \text{Set}$ (faithful meaning $U$ is injective on morphisms between any fixed pair of objects). Let $X$ be a set, to serve as the basis of the free object. Then the free object on $X$ is a pair $(A, i)$, where $A$ is an object in $C$ and $i: X\rightarrow U(A)$ is the canonical injection. Here $U$ can be thought of as a map that "strips away" all structure but an object's base set, and $i$ is therefore responsible for connecting $A$'s basis (or generators) to items in $A$'s base set. Free objects also satisfy the universal property that, for any other object $B$ in $C$ with a map $g: X \rightarrow U(B)$, there is a unique $f: A\rightarrow B$ such that $g = U(f) \circ i$. What this tells us is, if we want a map $f$ from object $A$ to $B$, it is uniquely determined by how we choose to map $A$'s basis $X$ into $B$'s base set, $U(B)$.

Intuitively, what we’re doing here is “peeling back” the layers of objects in arbitrary categories and establishing a fundamental link using the common language of sets. When a (free) object is generated by a known set XX, we can treat those items in XX as guiding axes that map into the core set-based representation of a target object. We can then “reapply” the layers/structure on top of this fundamental relation, getting a map between full objects for free.

Example: free monoids

Returning to free monoids, we can apply the general details for free objects laid out above. Recall our general notation for the monoid as a triple $M = (A^*, \cdot, \varepsilon)$ defined over a set $A$.

  • We have the forgetful functor $U: \text{Mon}\rightarrow\text{Set}$ which simply forgets the operation and identity, i.e., mapping $M$ to $A^*$.

  • $A$ is the base set of generators, the basis for the monoid.

  • The generator injection $i: A\rightarrow U(M)$ canonically maps each element of $A$ into the monoid. In this case, the symbols $x\in A$ are mapped to one-letter words, i.e., $x\mapsto [x]$, as they appear in $U(M) = A^*$.

  • For any other monoid $N$, along with a function $g: A\rightarrow U(N)$, we get a unique monoid homomorphism $f: M \rightarrow N$. This function has the specific form

    $$f(x_1x_2\cdots x_n) = g(x_1)\cdot g(x_2) \cdots g(x_n)$$

    That is, the string $x_1x_2\cdots x_n$ in $M$ gets mapped to the equivalent in $N$ after replacing each symbol $x_i\in A$ with the string in $N$ that it's mapped to by $g$.

For example, suppose we took the base set $A = \{a, b, c\}$ for monoid $M$, and $B = \{x,y,z\}$ for monoid $N$. Then a function

$$g(a)=x \;;\; g(b)=y \;;\; g(c)=z$$

determines a unique mapping between the entire monoids, and we can convert any string as was written above, e.g., $aaccbb \rightarrow xxzzyy$. Note the flexibility in the assignment of $M$'s base symbols to strings in $N$ (i.e., any element in $U(N)$, which in this case is $B^*$). For example, it's just as valid to define $g$ as

$$g(a)=xx \;;\; g(b)=yy \;;\; g(c)=zz,$$

and conversion takes place as before under $f$, e.g., $acc\rightarrow xxzzzz$. Again, the universal property that applies here suggests that specifying $g$ uniquely determines how we map the monoid objects onto each other.
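This extension of $g$ to the unique homomorphism $f$ has a direct programming analogue; below is a small illustrative Haskell sketch (the names hom and g are mine), where the free monoid on an alphabet is the type of strings over it:

```haskell
-- The unique homomorphism induced by g: extend it letter-by-letter and
-- concatenate (this is foldMap specialized to strings/lists).
hom :: Monoid n => (a -> n) -> [a] -> n
hom g = foldr (\sym acc -> g sym <> acc) mempty

-- Mirrors the example above: g(a)=xx, g(b)=yy, g(c)=zz.
g :: Char -> String
g 'a' = "xx"
g 'b' = "yy"
g 'c' = "zz"
g _   = ""   -- symbols outside the alphabet (not generators here)

-- hom g "acc" == "xxzzzz", matching the conversion above.
```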

One additional observation to note here is how small the scope is for the morphism equivalence $g = U(f)\circ i$. The domain of $g$ is $M$'s alphabet (or base set), and that's all we're even allowed to specify. Therefore $U(f)$ need only match $g$ at these points of definition, and it has no duty to uphold usual monoid homomorphism properties (namely identity and multiplication preservation). But once we "extend" that back to a full monoid homomorphism $f$, it needs to match $g$ along the generators we've provided while also taking care to actually be a monoid homomorphism, by definition. All that to say, there is indeed extra structure that comes into the picture as we bring $g = U(f)\circ i$ back into the category we're working in, and those extra restrictions play a role in constraining the map to uniqueness1.

Natural transformations

Foreword: this section is adapted from a very large “development aside,” i.e., a section I wrote to address concerns with my intuition as I worked through relevant concepts. As such, it’s far less structured than the other sections on this page, but is central to my understanding of maps, in the most general sense.

Natural transformations feel so natural, I struggled to see why they were even explicitly defined. Perhaps more accurately, I had a poor understanding of the implications of the constraints that make up a natural transformation and what it’d mean to violate them.

As we’ve established, natural transformations are best introduced as morphisms in categories of functors, and the constraints that apply ensure that those morphisms behave in an appropriate way (according to the usual category rules, that is). In order to do that, we need to “break open” the objects they’re being applied to: the functors. This gives us the family of morphisms that apply to the objects inside the source/target categories of the functors. If you doubt this move, consider what else we really have to work with; to appropriately map between map-like things, one really must inspect what changes between the maps in question (i.e., for the same input, how do the maps’ outputs change?). If we can consistently characterize the change in outputs between two functors (which includes both objects and morphisms of the underlying categories), then we’ve got the thing we wanted.

The morphisms in the category of endofunctors are natural transformations

Intuitively, the naturality square ensures that we can naturally move around in categories before and after a functor is applied. It is a consistent bridge between categories induced by functors that implies the order of morphism and functor application doesn’t matter. The analogy I like here (although I came up with it, so it may be dubious) is that the coherence conditions ensure “safe passage” through chains of functors. If you’re moving from a source object to a target object via a series of morphisms across categories, as long as you have natural transformations between the functors linking those categories, you can apply your morphisms in whichever category you like: you can load them up all in the first category, do a morphism-functor-morphism-functor, or wait until the very end. They will all yield the same final object, and you never have to worry that you took the “wrong exit” via a particular morphism in a particular category that changes the dynamics for the rest of your journey.

Let's look at a small, concrete example: functors between a few set-like objects. Here we have a category $\mathcal{C}$ with three sets $X_1$, $X_2$, $X_3$ and two morphisms $f_A$, $f_B$ between them as depicted:

Example setup depicting categories $\mathcal{C}$, $\mathcal{D}$ and functors $F$ and $G$ between them.

The diagram also shows two functors $F$ and $G$ that map between $\mathcal{C}$ and another category $\mathcal{D}$, itself containing three set objects. You can see how the functors may change the objects/morphisms in different ways: over $\mathcal{D}$'s objects (which, despite the labels of 1, 2, 3, don't necessarily correspond to those in $\mathcal{C}$; it's just an implicit ordering so we can compare differences consistently), $F(f_A)$ and $G(f_A)$ apply to different source/target objects due to $F$ and $G$ mapping the objects differently (we'll see this a bit more explicitly in the next diagram), i.e., $F$ maps $X_1$ in $\mathcal{C}$ to $Y_1$ in $\mathcal{D}$ whereas $G$ maps $X_1$ in $\mathcal{C}$ to $Y_2$ in $\mathcal{D}$.

$\eta$ is a natural transformation between functors $F$ and $G$…at least in name. We need to show that the relevant conditions are met; otherwise we might merely refer to it as a map between functors. Here we can look at the same example, but with additional detail relating objects:

Same setup as before, but with more explicit object mappings. Blue lines depict functor $F$'s object connections from $\mathcal{C}$ to $\mathcal{D}$, and green are $G$'s.

As stated before, natural transformations are defined over category components, and by definition we have a family of morphisms:

  • $\eta_{X}: F(X) \rightarrow G(X)$ for any object $X$ in $\mathcal{C}$. This is called the "component of $\eta$ at $X$," linking the two object targets under the functors for each source object $X$.

    In $\mathcal{D}$, it's pretty easy to think of this collection/family of morphisms as the stitching holding the functors together. If we land anywhere in $\mathcal{D}$ by using $F$, we can simply walk across these component morphisms to get to the equivalent under $G$.

  • We additionally require that, for every morphism $f: X\rightarrow Y$ in $\mathcal{C}$, we have

    $$\eta_Y\circ F(f) = G(f) \circ \eta_X$$

    In words, this says the following two processes are identical:

    • For any object $F(X)$, map it to $G(X)$ via $\eta_X$ and apply $G$'s version of $f$, therefore mapping into $G(Y)$ (recall that a functor $A$ on $C$ must map $f$ to $A(f): A(X) \rightarrow A(Y)$).

    • For any object $F(X)$, first apply $F$'s version of $f$, thereby mapping into $F(Y)$, and then move to $G(Y)$ via $\eta_Y$.

The last equivalence is often expressed by the following commutative diagram (where details are borrowed from our example, but it nevertheless reflects the general form of the naturality square):

Typical form for the naturality square.
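A compact way to see the square in action is in Haskell (an illustrative sketch, not from the original), where functors are type constructors and a natural transformation is a polymorphic function between them:

```haskell
-- safeHead is a natural transformation from the list functor to Maybe; the
-- naturality square says mapping f before or after the component agrees.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

naturalityHoldsAt :: [Int] -> Bool
naturalityHoldsAt xs =
  let f = (* 10)
  in fmap f (safeHead xs) == safeHead (fmap f xs)

-- naturalityHoldsAt [1,2,3] == True   (both sides are Just 10)
```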

So now we've contextualized the family of morphisms a bit more, and have a general sense for what it might mean to move "naturally" between functors. But the components $\eta_X$ seem to fall out pretty much automatically by simply associating functor target objects…right? When does this not hold? Let's look at an example with concrete objects and morphisms.

Aside: morphisms, once and for all™

It's worth noting that I needed this level of depth to bump into, yet again, a (seemingly persistent) fundamental misunderstanding of morphisms. Looking at the naturality square, I initially thought something like the following: "well, both paths from $Y_1$ get us to $Y_3$, so things just seem to hold by definition." The first part of that is perfectly accurate I suppose, but it's missing the actual meaning of the statement. Morphisms are more than just a map from one object to another, as in they don't just take in a whole object and produce another whole object. Getting to an object isn't enough here; it's getting to it in the same way. Put simply, I was thinking on too high a level, as if morphisms are necessarily maps that must ignore any inner components of the objects they're acting on. This is likely me just misinterpreting the super high level of abstraction one faces when warming up to category theory. We write morphisms as maps $f: X\rightarrow Y$ and rarely say anything more specific unless working in a specific category, so I pretty much took that to mean what it says in the most general sense: a morphism $f$ takes in an object $X$ and gives out an object $Y$, as if there's nothing more atomic to act on than entire objects. But of course, a morphism merely maps from an object to another with no explicit guarantees of completeness, and the way you get to the co-domain can vary. Morphisms in general can exhibit all the usual flexibilities of functions in $\text{Set}$, e.g., not strictly being surjective, and of course two functions $f: X\rightarrow Y$ and $g: X\rightarrow Y$, despite sharing a domain and co-domain, need not actually map individual components to the same place (i.e., we can have $f(x) \not= g(x)$ for any $x\in X$).

I faced similar issues over in Category theory§Pullback, with a similarly detailed accounting of what now feels like an incredibly basic point. This just goes to show how easy it can be to misconstrue even simple, foundational concepts. Ultimately I think this just stems from operating at a level of uncomfortable abstraction, and you tend to be very careful making assumptions about anything that isn’t otherwise stated outright. I think the fundamental disconnect here has been the fact that almost no category theory verbiage deals in elements smaller than objects, and I’ve hesitated thinking in terms of anything lower as a result. Perhaps this is a result of not having followed a clear introduction built on the back of sets, which perhaps would’ve made this whole thing a non-issue.

After a considerable amount of toiling, I finally feel okay with how we get from the very abstract and vague descriptions of morphisms to maps aware of object structure. I didn't even think I took issue with this, but I got myself caught in a loop where I just couldn't feel why maps shouldn't be required to return full objects. We say generally that a morphism is a map from an object $X$ to an object $Y$, and that's about as specific as we get (except for identity and associativity). If we're operating at this level and saying nothing further about the structure of those objects, it feels pretty straightforward to interpret that as a map taking an object $X$ and giving us an object $Y$ out. The problem is: it doesn't mean that, or at least it doesn't have to. Firstly, we stay at such an abstract level because it lets us say only as much as we need: as long as those top-level items are true, we have a viable morphism. Anything that falls under that jurisdiction should then be fair game; from here on out my problems mostly remained with the verbiage.

Here I was struggling with the idea that objects are atomic units that can’t be further decomposed. This is again missing nuance; we obviously have some kinds of objects, like sets, that can be seen as containers of other items. When we say objects are atomic, we effectively mean they’re atomic relative to the category theoretic statements being made. That is, they’re the “lowest level” construct those statements need to care about to be true, and they don’t care if there happens to be further structure when one looks closer. Now, actually defining classes of morphisms in specific categories is almost always a process of designing a map-like construct that is not only aware but often preserving of underlying object structure. So it’s not as if morphisms, in the way they’re actually applied, don’t look closer at the objects they apply to; they often must in order to remain consistent. It’s just that one can ignore those specifics and treat objects as black box units when it comes to making broad, category theoretic statements about the structures at play.

So decomposing is fine, but how about non-surjectivity? That still irked me a bit, when our top-level statements seem to suggest we're dealing in "whole objects." It's worth noting right away that any notion of "partial objects" can only present inside categories where objects can be decomposed further. Again, at the top level, we just don't see inside the objects, and statements we make cannot apply to a deeper, internal level. So this non-surjectivity is merely a byproduct of behavior we allow inside of a category like Set, because it doesn't break those top-level rules. If it did, sure, surjectivity would be important to apply everywhere, but our rules of composition and identity on morphisms work all the same whether we have our set function mapping onto the co-domain or not. It is of course perfectly natural to non-surjectively map between two sets, so we also actually want to be able to do this. Here I find embracing the word "to" is effective: morphisms map an object $X$ to an object $Y$. Under the hood, this may very well correspond to associating an element in $Y$ to each element in $X$; that's perfectly allowed when defining morphisms on set-like objects. To get the entirety of this whole aside, the only thing you need to accept is the following: we are merely taking $X$ to $Y$, not (necessarily) transforming it 1-to-1, and that's an acceptable meaning of the term "map." Imagine how small $X$ can be and how large $Y$ can be and this still makes sense as a map. For example, let $X$ be my group of friends and $Y$ be seat numbers in an NFL stadium. Our NFL tickets map each person to their one seat number, but that makes up a tiny fraction of all available seats and is obviously not "onto." And it shouldn't have to be for this relation to serve as a valid map between $X$ and $Y$, not only in the set-level or the category theoretic sense, but in the more pesky linguistic sense that got me into this mess in the first place. From the top level, such a map really does get to be an arrow between the two whole objects $X$ and $Y$. How much the image of that morphism actually "fills out" $Y$ just isn't relevant: to even think about the items in the image requires set-level awareness that cannot be seen when dealing in statements where objects are atomic by definition.

Even with all this, I still find it slippery somehow. When I try to root out what's wrong, I think it just comes down to the notation of writing functions according to their domains/codomains, e.g., $f: A\rightarrow B$. It's actually this that I take issue with, more than anything inherent to category theory or object-level abstractions or whatever else. It really goes beyond functions: it's with binary relations. The simple fact that we can call a link from one element of $A$ to one element of $B$ a relation (which is actually a valid, minimal relation) is what I find kind of pesky and stupid. That just doesn't feel like what we mean by relation. So something like the single pair (2, green) is a valid relation from integers to colors. By all accounts that's an association from an integer to a color, but it doesn't seem like what we'd ever mean by a relation $R: \text{Int} \rightarrow \text{Color}$ (that notation is a little more conventionally functional than barebones relation, but the point stands). That is where I'm getting so hung up, I think.

I’ve spent even more time on this, with a three part audio recording series to accompany it. But I’ve hit the root of the issue with the following, once and for all.

How morphisms look inside and outside of Set

(left side of diagram) Within Set, object structure is visible. Morphisms can be defined over the elements within the objects, and the above depiction includes (effectively) the simplest possible relation between the two (i.e., connecting a single element in X to a single element in Y). Two comments:

  • I’ve complained this doesn’t even feel like a relation (or what we’d mean by any such definition of one), let alone a morphism. But binary relations can be seen to act on all pairs in the Cartesian product between sets, mapping those with related items to True, and False otherwise. This is at least holistic; no elements in the domain or co-domain are systematically ignored just because they aren’t assigned a True value. That is, no elements “slip by” the relation: it can always formally be seen to make a Boolean decision for every possible pair.

  • If you don’t accept that this should “mature” to a morphism at the abstract level, notice that the line you’d then have to draw is arbitrary. How many items must be included for it to be enough, for it to count as a “map?” Notice that, regardless of how “full” your relation may be, as soon as you become blind to element-specific involvement, all you see is a bridge between the two objects. This is what we do as we move to the right diagram.

(right side of diagram) Outside of Set, back in the general abstract category theory viewpoint, we lack any insight into internal object structure. One can imagine masking off all element-wise knowledge we had while in Set, leaving the above: two atomic objects with some connection between them. How these objects are related is no longer even a well-posed question at this level of granularity; they simply are or are not connected, and the “endpoints” of the arrow can only be the entire object.

So all relations inside of Set, no matter how large, are viewed uniformly once we’re outside of Set. “Thin” relations mature to full morphisms in the abstract view (since they can’t be distinguished from any other relation), and from the abstract view a full morphism simply suggests any relation exists under the hood.

It's worth noting that general relations are not themselves morphisms in Set, whose morphisms are functions by definition (relations are instead the morphisms of the category Rel). But once you accept that relations are, in the most basic sense, mappings between sets, then you're past the issue of non-surjective functions and the idea of yielding "less than $Y$" via a morphism that doesn't map onto the co-domain. That is to say: from the abstract point of view, any kind of connection between elements within Set simply looks like a connection between the set-objects themselves, and it says nothing about what that relation must look like once you've revealed the element-level details. Not involving all elements in your map is not something that can be seen outside of Set, so all relations between any source and target look the exact same.

Taking a look back at this some weeks later: I think I do a good job conveying what felt problematic and how I got past it. This is from the perspective of not really recalling (right away, anyway) what the problem was in the first place, and I feel satisfied by the end of this passage. Here I think it may be helpful to simply recap why we're here in the first place: looking at diagrams like the naturality square, one gets the sense that simply following two trajectories that end up in the same object makes them equivalent, by the sheer fact they point to the same place. But that alone doesn't actually make those (composite) morphisms equivalent, since they're allowed to have such distinct structure in arbitrary categories (like being functions in Set) while mapping from/to the same objects. So in commutative diagrams, like the naturality square, we're not saying that any two composite morphisms are equivalent because they start and end at the same places, but instead that any two morphisms starting and ending at the same place must be equivalent under the morphism structure in the category. That is, not just any morphisms with the right domain/co-domain will do; they must actually be the same in the underlying category in order to uphold the kind of constraints the diagram is talking about. This is annoying/confusing because the diagrams don't really convey this: you simply see arrows between objects, and it seems to suggest that moving from/to the target objects is good enough. But in reality, they are merely abstracting away any category-level morphism specifics so as to make a category-independent statement, but crucially those specifics will still apply when operating in any such category. That is to say: once you're actually operating in a specific category, the known structure of its objects and morphisms comes into view, and you cannot ignore that revealed structure. I routinely find myself thinking in terms of Set when staring at commutative diagrams, and it's a mistake to mix the two levels of abstraction (e.g., allowing oneself to think of the objects as sets, but continuing to treat morphisms as vague, non-functional maps; if you embrace the former, you must also work with the latter, else you're mixing levels of abstraction).

Here we'll exploit a breaking condition for natural transformations: when the component $\eta_X$ (i.e., that which connects $F(X)$ to $G(X)$) depends on static data from the object $X$ itself. Here we're manufacturing an unnatural dependency, one that allows $F(X)$ to map to $G(X)$ but in a way that does not act "fairly" across $X$ (cheating with knowledge of $X$'s values rather than just its structure).

(un)natural transformation example in Set.

The example setup is as follows: $F$ and $G$ are both the identity endofunctor on the category $\mathcal{C}$. $\mathcal{C}$ contains two objects in Set: $X=\{1,2\}$ and $Y=\{3,4\}$, as well as a morphism $f: X\rightarrow Y$ that is the constant function $f(x) = 4$. We then define the component morphisms $\eta_X: X\rightarrow X$ and $\eta_Y: Y\rightarrow Y$ to be the constant functions mapping to their respective set's smallest element, i.e., $\eta_X(x) = 1$ and $\eta_Y(y) = 3$.

This is effectively a contrived example to ensure $\eta_X$ and $\eta_Y$ disagree somewhere along the chain from $F(X)$ to $G(Y)$: $X$'s values map to $Y$'s second value, whereas $Y$'s values map onto its own first value. Starting from the value of $1$ in $X$, we have the following two trajectories:

Disagreeing trajectories across (un)natural transformations and morphisms; naturality square does not commute.
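A direct check of the failed square (a small sketch of the setup above; with both functors the identity, the condition reduces to $\eta_Y\circ f = f\circ\eta_X$):

```haskell
-- The constant maps from the example; the two routes around the square
-- disagree already at the value 1.
f, etaX, etaY :: Int -> Int
f    _ = 4   -- the morphism X -> Y
etaX _ = 1   -- component at X: constant smallest element of X
etaY _ = 3   -- component at Y: constant smallest element of Y

squareCommutesAt :: Int -> Bool
squareCommutesAt x = (etaY . f) x == (f . etaX) x

-- squareCommutesAt 1 == False   (3 /= 4)
```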

An even simpler demonstration of this is to take $\eta_X$ as shown, and the constant function that maps to $X$'s other element, i.e., $f(x)=2$. One can then see how the naturality square breaks just over the object $X$ when the order of those two morphisms is swapped: $1\rightarrow 1\rightarrow 2$ vs $1\rightarrow 2\rightarrow 1$.

So what does this tell us? In this situation, we can see that the order of application matters. The map that is natural-transformation-then-morphism is different than the morphism-then-natural-transformation: their composite morphisms do not yield the same map in total. For consistency in component morphisms, we generally require that the choice of map acts on objects in a coordinate-free manner. That is, it must be equivariant with respect to the morphisms in $\mathcal{C}$ (by definition).

  • Breaks when the map is dependent on static data from the object itself

  • not sufficiently relative; knowledge of the operating object “overwrites” the work done

  • needs to be sufficiently local, sufficiently relative; shouldn’t depend on external object structure.

  • imagine first picking morphisms, then $\eta$s; and vice versa. This illuminates my confusion a bit since both feel fine, but they should get us into an equal amount of trouble

On equivariance: invariance is observed when a property of a mathematical object remains unchanged under certain kinds of transformation. Such a property is called an invariant of the object with respect to the transformation. Take, for instance, the area of a triangle: this is an invariant property of the shape under transformations like rotation and reflection. Equivariance is a slightly looser notion of invariance, not requiring a property to remain wholly unchanged under transformation, but to itself be transformed along with it in a predictable way. That is, transforming the object and its property in the same way respects the definition of the property. Keeping with the triangle example, one can look at the triangle centroid. This effectively moves with the triangle and will change under translation. So while it’s not an invariant (numerically the same), it will change with the triangle under the same translation.

Structure-preserving maps

To preface: equivariance is easily framed from the perspective of structure-preserving maps, which is a term commonly thrown around when speaking about morphisms. Therefore, this too has been a slippery topic for me in the past, and I've spent a sizeable chunk of time ensuring I have a comfortable intuition around this. Structure preservation, equivariance, and natural transformations are heavily intertwined; I find it helpful to frame the latter two as structure preservation through entire classes or groups of transforms; we'll formalize this afterwards. For now, I want to clarify the many, many places I've found myself stuck on structure-preserving maps and leave no doubt about how we should interpret this moving forward.

For starters: what we mean by “structure-preserving,” in the most general sense, can’t really be boiled down to anything more precise. As mentioned, we often call such things, in the abstract, morphisms. In that sense, one might imagine a structure-preserving map need only operate from/to the same kind of object: the very fact we “end up” in the (structured) object suggests we’re upholding necessary structure. But this is not generally what we mean, or is at least often not sufficient. That is to say: being a map with the appropriate external “signature” (i.e., mapping to/from the right kind of object) is alone not enough, as we care about how we move between the objects (and whether that movement preserves some structure within them).

Take, for instance, addition over the integers. Suppose we have the groups $G=(\mathbb{Z}, +)$ and $H=(\mathbb{Z}, +)$, and a map $f:G\rightarrow H$ defined $f:x\mapsto x+1$. Clearly "pushing" all integers through $f$ still just gives you $\mathbb{Z}$, and so in a very loose sense upholds "group structure" insofar as there is still a valid group on the other side. But the way we move under $f$ does not preserve addition (which is really all we mean by "structure" in this case): for any $a+b=c$ in $G$, we do not have $f(a)+f(b)=f(c)$ in $H$. So even though $f$ moves from a valid group to a valid group, it does not align the elements of the groups in a manner that is consistent with the addition operator.
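A quick illustrative check (names here are mine): the shift map fails the condition at every pair of operands, while a map like doubling satisfies it:

```haskell
-- Does f preserve addition? Check f(a+b) == f(a) + f(b) on sample operands.
isAdditiveAt :: (Int -> Int) -> Int -> Int -> Bool
isAdditiveAt f a b = f (a + b) == f a + f b

shift, double :: Int -> Int
shift  x = x + 1   -- the f from the example above
double x = 2 * x

-- isAdditiveAt shift  2 3 == False   (6 /= 7)
-- isAdditiveAt double 2 3 == True    (10 == 10)
```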

As a reminder, $f$ is unary in $\mathbb{Z}$. Even if our binary operation (addition) had arbitrary arity, $f$ is still a bottleneck of sorts through which each integer individually passes. So applying $f$ to individual elements in the underlying set is the only way to apply the transform at all. I've found myself occasionally thinking that $f$ could have distinct joint behavior, similar to the operator, but this isn't true, and recognizing as much helps simplify things. When I look at the required equality here, i.e.,

$$f(a+b) = f(a) + f(b),$$

I've let myself question whether this is the only constraint that could mean $f$ is structure preserving. But recalling how $f$ can only operate on single items in the underlying set, it becomes clear this is really the only thing you even can say in the first place. We use $f$ to map the items of one group onto another one-by-one, and the question is simply whether related items in the source group remain related in the transformed group. $a$ must go to $f(a)$, $b$ must go to $f(b)$, and the only real question from there is whether $a+b$ goes to the same place as $f(a)+f(b)$.

That last bit, the only salient part, has made me nervous for reasons I can’t easily articulate. I think it just somehow feels mysterious every time: I have to unpack it and check that it makes sense, and I get lost along the way. Either that, or as I mentioned before, the idea that there could be some other way to express structure preservation that I’m not able to see bugs me; I don’t have good intuition that the form is “tight.” At risk of restating what has already been said, the following might help:

We're looking for a way to relate the transform $f$ to the operation at hand. Both need to be involved for us to establish any such link, and it's critical to note that we're observing what happens with the trajectories themselves under $f$. It's not $f$'s image, it's not its co-domain; these are more like aggregate consequences of having applied $f$ to every item. Working with $f$'s image or its co-domain ignores how we got there, and would therefore say nothing about $f$ at all. The entirety of the setup also avoids relating the transform and the operation: $a$ and $b$ are just any two items in the source set, $f(a)$ and $f(b)$ are the items we map them to, and both $a+b$ and $f(a)+f(b)$ are the results of applying our operation on those pairs. Thus far, we've said nothing about how $f$ and $+$ interact, we've merely applied them to eligible operands/inputs. Now, if $f$ is going to respect $+$, the only thing that matters is that it takes a valid application of the operation to another valid application of the operation: that's just about all one could mean when saying "$f$ preserves the operation." If it took three items $a+b=c$ to three items that couldn't be related via addition, then $f$ would clearly not be compatible with the behavior of that operation. In such a case, I couldn't rely on my operation as a means of relating items under the movement/flow of $f$. The operation is like a stitching in the set-fabric between items $(a, b, a+b)$, and preserving it means the same stitching is present when moving the fibers through $f$. This can only mean that I produce related items $(f(a), f(b), f(a+b))$: after I've taken my stitches to their new locations via $f$, I find they're still intact.

That last bit I find the most helpful. Taking a concrete thing like $((a, b), a+b)$ (the operands and their output) removes the burden of feeling like there's some nebulous structure coming along with the operation for which I need to have developed intuition. That pair is the structure; it's like an extensional view of the operation (basically its graph). So every such pair that holds true in the source set is a little knot/stitch that locally exhibits the structure we're trying to preserve, and we cannot untie it under $f$'s movement. If $f$ in fact preserves the desired structure, it will move every such knot to another valid knot, locally preserving our operation dynamics. Note that $f$ doesn't directly map knots to knots, though; it applies uniformly on an element-level basis. Therefore it takes our knot

$$((a, b), a+b) \rightarrow ((f(a), f(b)), f(a+b))$$

In order for that right-hand tuple to be a "valid knot" under $+$, it must be consistent with the structure of the left-hand side, i.e., have an output element that is the sum of the two operands. So if $f$ preserves the knot, the item in the output slot, $f(a+b)$, must be equal to the sum of the input items, i.e.,

$$f(a+b) = f(a)+f(b),$$

which is the usual constraint. Let’s take a look at a concrete example:

Transformation $f$ not preserving the $+$ operation. Moving from left to right, we see legitimate application of $+$ for the shown operands; $f$ doesn't change that. But $f$ does not respect the structure of $+$, since it maps our operands and their sum to values that cannot be associated via addition.

Take the red bridge along the top: this is a "local knot" that $+$ ties. That is, it exemplifies the kind of structure that $+$ induces, and as a result that which we want $f$ to preserve. But as we move down the diagram via $f$, we see that it "unties" the knot, giving us transformed operands whose sum does not match the transformed sum. The red bridge along the bottom shows the knot that would be valid under our transformed inputs. It would be equally reasonable to consider the kinds of inputs $f$ could've "chosen" to reach the needed output. In either case, $f$ does not align them correctly: it doesn't map a $+$-knot to a $+$-knot.

For the linear transformation $f(x)=2x$, on the other hand:

Transformation $f(x)=2x$ preserving the $+$ operation.

Here we find the map does respect the $+$ operation (at least over the specific operands we've plotted): the knots remained tied through $f$.

Moving back out, recall that we say generally that, for a transformation $f:A\rightarrow B$ and $k$-ary operations $\mu_A: A^k \rightarrow A$ and $\mu_B: B^k \rightarrow B$ defined over $A$ and $B$, respectively, $f$ is said to be structure-preserving if the following holds:

$$f(\mu_A(x_1,\dots,x_k)) = \mu_B(f(x_1),\dots,f(x_k))$$

This is very general, and it's worth exploring the dynamics when $f$ is more like a binary relation. In this case, I found myself thinking of addition in modular arithmetic. Here we have an operation that behaves similarly over entire classes of values: rather than observing the "paired" association of structure $(x,\mu(x))$ with $(y,\mu(y))$ through $f$ (paired as in we relate one "bit of structure" with just one other), we now say something like the structure $(x,\mu(x))$ can be associated with many occurrences $(z,\mu(z))$, $z\sim x$, under the relation $\sim$. Nothing is fundamentally different here save for our map being "wider" in a sense: it links together more structure across objects.

Looking at a rough sketch of congruence modulo $k$, we can imagine "folding up" $\Bbb Z$ to give us $k$ new classes of items:

Congruence modulo $k$

It's over these equivalence classes that we observe consistent structure under addition. That is, the relation that induces this partition respects the operation: we can first add values in $\Bbb Z$ and map to their equivalence classes, or first map to equivalence classes and "add" them2.
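Concretely, reduction modulo $k$ preserves addition; a small illustrative check (names mine):

```haskell
-- Reducing then adding (mod k) agrees with adding then reducing.
k :: Int
k = 5

toClass :: Int -> Int
toClass x = x `mod` k

additionRespectedAt :: Int -> Int -> Bool
additionRespectedAt a b = toClass (a + b) == (toClass a + toClass b) `mod` k

-- additionRespectedAt 7 9 == True   (16 mod 5 == (2 + 4) mod 5 == 1)
```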

Structure preservation under a relation-like map

The diagram above attempts to show what's happening as we move from items to their equivalence classes (going down), and the ways we add those objects (moving right). Note the difference in the 2D representation here compared to similar diagrams above: $x_1$ and $x_2$ are both meant to be $k$-dimensional, and we're simply laying out our objects in the grid there. Note also that the notion of addition used over equivalence classes is the Minkowski sum, showing how the operation being respected can show up in slightly different forms between $\mu_A$ and $\mu_B$. In any case, the high-level takeaway here is that our structure-preserving maps can be more than just functions: they can associate many target objects to a single source, and this naturally abides by the same principles we've discussed above. In this particular case, we're saying that whatever the equivalence class "allegiances" of $+$'s operands and output are, they will be respected: addition is consistent up to class assignment. Put another way, $+$ plays along with the new notion of identity induced by the equivalence relation (color in the diagram). If a blue item and green item add up to a red item (following the top bridge), then adding any blue and green items will yield a red item (following the bottom bridge).

In total, the relation-like map explored here helps broaden my intuition, shedding light on how general a structure-preserving map really can be. If we’ve allowed a generalization from pairwise associations through functional maps to more “class-like” associations via relational maps, you might naturally ask whether we can do something similar over the operation. That is, can a map respect a class of operations or transformations? This is pretty much exactly what we do with equivariant maps, and we rely on group theory to formalize valid collections of transformations through the same language of operations (i.e., the group action). We explore this in depth below; that’s what kicked off this aside in the first place.

One final remark to potentially keep in mind: the notion of structure preservation is really more of a loose hierarchy. Questions like which structure or how much of it can be naturally addressed by strengthening or weakening one’s requirements in a given setting. What is meant by “structure-preserving” is generally very flexible and, within the general framework we’ve formulated here, can accommodate any object, transformation, or interpretation of “map.” One might attempt to formulate a map between groups and find that it fails to be a group homomorphism, but it can perhaps still be seen as a map preserving structure of sets or even some other relaxed notion of symmetry. One also needn’t have maps that are particularly well-defined, e.g., analogous to surjectivity/injectivity; even trivial, information-destroying maps can be structure-preserving over the “places” where they’re defined. All in all: structure preservation is very fundamental in abstract algebra, and having a strong intuition for what it means is important (even if I don’t yet have that, but I sure have spent a lot of time toiling). But it is ultimately incredibly flexible and context-dependent, applying across many different layers of abstraction.

Emerging from the aside on structure-preserving maps, the definition of equivariance feels pretty familiar. An equivariant map is one that preserves some notion of symmetry over its source and target objects. We capture this expression of symmetry as a symmetry group, and the preserved operation is the group action on the objects in the domain and co-domain.

In particular, we say that an equivariant map $f$ respects the group action of a group $G$ over its $G$-sets. This is similar in form to that of structure-preserving maps, but with a stronger requirement that structure is held up across all $g\in G$ (rather than just a single operation; instead, we have a whole family of operations). That is, for a group action $\cdot$ and for all $g\in G$, we have

$$g\cdot f(x) = f(g\cdot x)$$

Again, this just says that the movement through $f$ doesn't break the structures in place between objects, which in this case is the symmetry represented by the group's transformations. If we label the output of the group action $x^\prime = g\cdot x$, we can think of this structure as manifesting concretely in the pair $(x, x^\prime)$. If our map is structure preserving, we need to see precisely the same structure on the other side of $f$, i.e., when mapping $x$ and $x^\prime$ we find the results also "meet the criteria" for being bundled up as a tuple. Such a pair would be $(f(x), f(x^\prime))$, which must exhibit the same relationship between the first and second element as we had before $f$, i.e., $f(x^\prime) = g\cdot f(x)$. Expanding $x^\prime$ gives us our form above.3

Note that a $G$-set is just a set acted upon by $G$: it's effectively a formal construct that bundles those two together (the set and the group action) such that we get a standalone object. This is just the group equivalent of the "set plus structure" we saw before, e.g., $(\mathbb{Z}, +)$: we need a way to bring along more than just a set. (Possibly confusingly, $(\mathbb{Z}, +)$ is itself a group. The operation being respected there is what induces the group; with $G$-sets we're working with a group action rather than the group's operation, ultimately more like a functional than a function.)

We can see a quick example with points (or shapes) in $\mathbb{R}^2$. We can apply the rotations in $C_2$ (i.e., just $0^\circ$ and $180^\circ$) and leverage an arbitrary map $f$ between shapes. Here we have a square $\{-1,1\}^2$ and map to a triangle by projecting points with positive $y$ coordinates onto the $y$-axis, i.e., $(1,1) \mapsto (0, 1)$ and $(-1,1) \mapsto (0, 1)$:

Square $\rightarrow$ triangle under $f$

Note that the orientation of the diagram is transposed relative to the form we've been using for the naturality square and/or above diagrams. Here we see that $f$ maps the same orientation of the square to the same orientation of the triangle, but our $180^\circ$ rotation changes only the orientation of the triangle. The map $f$ therefore does not respect the structure of $C_2$'s group action (the "family" of operations it represents). On the other hand, if $f$ maps to another shape with $180^\circ$ rotational symmetry, we would find the group action is respected. Take, for instance, an $f$ that simply projects the points onto the $x$-axis:

Square $\rightarrow$ line under $f$
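A small illustrative check of both cases (my names; I'm also assuming the square-to-triangle map leaves points with non-positive $y$ fixed, which the original diagrams don't spell out):

```haskell
-- The group action is 180-degree rotation (negate both coordinates).
type Pt = (Int, Int)

rot180 :: Pt -> Pt
rot180 (x, y) = (-x, -y)

toTriangle :: Pt -> Pt            -- hypothetical extension of the example's f
toTriangle (x, y)
  | y > 0     = (0, 1)            -- collapse positive-y points onto the y-axis
  | otherwise = (x, y)

projX :: Pt -> Pt                 -- projection onto the x-axis
projX (x, _) = (x, 0)

equivariantAt :: (Pt -> Pt) -> Pt -> Bool
equivariantAt f p = f (rot180 p) == rot180 (f p)

-- equivariantAt projX      (1, 1) == True    ((-1,0) either way)
-- equivariantAt toTriangle (1, 1) == False   ((-1,-1) vs (0,-1))
```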

This is equivariance: a map that changes related objects such that the objects on the other side remain related in the same way. And so we return to natural transformations:

Natural transformation components $\mu_X$ and $\mu_Y$

Here $\mu$ is our structure-preserving map: where there's "structure" in $F(\mathcal{C})$ (which just means related object pairs and how they're connected via morphisms), that same structure analog is present on the other side, i.e., in $G(\mathcal{C})$.

As seen with equivariant maps, $\mu$ respects the structure of not only a single operation, but instead a whole collection of transformations, which in this case are the morphisms in $\mathcal{C}$, or $\text{Hom}_{\mathcal{C}}$. To be precise, $\mu$ respects how structure present in $\text{Hom}_{\mathcal{C}}$ shows up in $\mathcal{D}$ through the functors $F$ and $G$; it doesn't interact with that structure in $\mathcal{C}$ directly.

This lines up nicely with the group theory analog discussed for equivariant maps: we had an abstract (symmetry) group $G$ whose transformations were first made concrete through its group action on specific $G$-sets. $f$ always maps from/to specific objects, and $G$'s structure must first be "realized" on those objects (producing the concrete $G$-sets) in order for us to check how $f$ will behave. That act of "realizing" structure is what matters here, since it's coming from the same base group: however $g\cdot$ presents for both objects $X$ and $Y$, the resulting transformations in both "universes" can be fairly compared as they're based on the same fundamental transformation in $G$.

The exact same thing is happening here with our base category $\mathcal{C}$: it can be seen as a kind of abstract category that is realized through the functors $F$ and $G$ into the category $\mathcal{D}$. Everything from there on out takes place solely in $\mathcal{D}$, just as we didn't strictly need the group $G$ provided we had $G$-sets (i.e., objects that already contain the relevant realizations of $G$) in the case of equivariant maps. Put another way: once we "project" $\mathcal{C}$ onto $\mathcal{D}$ via functors $F$ and $G$, we no longer need $\mathcal{C}$ directly. We only need to know which objects/morphisms correspond to the same structure across $F$ and $G$, i.e., which realized objects/morphisms originate from the same objects/morphisms in $\mathcal{C}$.

In total, we have a $\mu$ that, through its applicable components (i.e., on objects in $\mathcal{C}$), respects the structure induced on its domain/co-domain by $\mathcal{C}$. In the diagram, $X$ and $Y$ are any two objects in $\mathcal{C}$ related by morphism, and for all such pairs of objects together with each of their morphisms $f\in \text{Hom}_{\mathcal{C}}(X, Y)$, $\mu$ must uphold that structure in $\text{Im } F$ and $\text{Im } G$, i.e., all of $\mathcal{C}$'s structure.

Whiskering

Loosely speaking, whiskering is the composition of functors and natural transformations. Suppose we have a natural transformation $\eta:F\implies G$ between two functors $F, G:C\rightarrow D$, along with another functor $H:D\rightarrow E$.

Whiskering setup: $\eta: F\implies G$ and $H: D\rightarrow E$

Whiskering $H$ and $\eta$ yields the natural transformation $H\eta: H\circ F\implies H\circ G$, where $(H\eta)_X = H(\eta_X)$:

Whiskered map

This is simply the natural mapping between the two available composed functor trajectories. The new map $H\eta$ has components (i.e., $(H\eta)_X$) that are the components of $\eta$ mapped through $H$ (i.e., $H(\eta_X)$). More explicitly:

Object-based whiskering setup:  and

This shows $\eta$'s components as they're mapped by $H$, relating objects $H(F(X))$ and $H(G(X))$. $H\eta$ is a natural transformation between functors $H\circ F$ and $H\circ G$; this is a bit more clear when isolating the full passthrough:

Isolating the whiskered map between functors $H\circ F$ and $H\circ G$
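In Haskell terms (an illustrative sketch, with functors as type constructors and functor composition as type nesting), whiskering on this side is just fmap'ing the components:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A natural transformation between functors is a polymorphic function.
type Nat f g = forall a. f a -> g a

-- eta : F => G, with F = [] and G = Maybe.
eta :: Nat [] Maybe
eta []      = Nothing
eta (x : _) = Just x

-- H eta : H . F => H . G for any Functor h; (H eta)_X = H(eta_X) = fmap eta.
whiskered :: Functor h => h [a] -> h (Maybe a)
whiskered = fmap eta

-- e.g. with H = Maybe: whiskered (Just [1,2,3]) == Just (Just 1)
```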

Vertical composition

Suppose we have functors $F, G, H: C \rightarrow D$ and natural transformations $\eta: F\implies G$ and $\epsilon: G\implies H$.

Vertical composition setup: $\eta: F\implies G$ and $\epsilon: G\implies H$

We can compose the transformations $\eta$ and $\epsilon$ to produce the map $\epsilon\circ\eta: F\implies H$. That is, we get a single map between $F$ and $H$ by gluing together intermediate natural transformations component-wise, i.e.,

$$(\epsilon\circ\eta)_X = \epsilon_X\circ\eta_X$$

Object-based vertical composition setup

Here we depict vertical composition as “stacking” functors with the same source and target categories, compacting them to produce direct maps between indirectly connected functors.
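Component-wise, this is ordinary function composition; a quick illustrative sketch:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Maybe (maybeToList)

type Nat f g = forall a. f a -> g a

etaV :: Nat [] Maybe        -- eta : F => G (F = [], G = Maybe)
etaV []      = Nothing
etaV (x : _) = Just x

epsV :: Nat Maybe []        -- eps : G => H (H = [])
epsV = maybeToList

vert :: Nat [] []           -- (eps . eta) : F => H, built component-wise
vert = epsV . etaV

-- vert [7, 8, 9] == [7]; vert [] == []
```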

Horizontal composition

Suppose we have functors $F, G: C \rightarrow D$ and $J, K: D \rightarrow E$, with natural transformations $\eta: F\implies G$ and $\epsilon: J\implies K$.

Horizontal composition setup: $\eta: F\implies G$ and $\epsilon: J\implies K$

The horizontally composed natural transformation $\epsilon * \eta: J\circ F\implies K\circ G$ is defined component-wise:

$$(\epsilon * \eta)_X = \epsilon_{G(X)}\circ J(\eta_X) = K(\eta_X)\circ \epsilon_{F(X)}$$

Object-based horizontal composition setup

Notice how this is (loosely) something like generalized whiskering: rather than a single natural transformation, we connect composed functors through multiple natural transformations across different categories. The same principle applies: the composed map $\epsilon *\eta$ lets you "stop off" and move through $\epsilon$ or $\eta$.

Horizontal composition setup

On the left, we see how $\epsilon_{F(X)}$ connects the same object, $F(X)$, across both functors $J$ and $K$; this is our usual intuition for natural transformation components as morphisms. Once we're at $K$'s object for $F(X)$, we can take $K$'s version of $\eta_X$ to move from $F$ to $G$. The right side takes an alternative route: first moving from $F(X)$ to $G(X)$ via $J$'s version of $\eta_X$, then taking $\epsilon_{G(X)}$ to travel to $K$.

Either way, it's worth recalling that natural transformations are composed of morphisms between associated objects under the source and target functors. For an object $X$, the morphism components link $F(X)$ and $G(X)$. In the case of horizontal composition, we're defining a natural transformation component-wise that links objects $(J\circ F)(X)$ and $(K\circ G)(X)$ in the usual sense, connecting functors $J\circ F$ and $K\circ G$. We simply recognize there are multiple ways to establish this connection provided we have $\eta$ and $\epsilon$: move by $J(\eta)$ then $\epsilon$, or $\epsilon$ then $K(\eta)$. Under the hood, these movements should be exactly the same, and constitute the morphisms in our new natural transformation.
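The two routes in the component formula can be written out directly (an illustrative sketch, names mine):

```haskell
{-# LANGUAGE RankNTypes #-}

type Nat f g = forall a. f a -> g a

etaH :: Nat [] Maybe            -- eta : F => G  (F = [], G = Maybe, on C -> D)
etaH []      = Nothing
etaH (x : _) = Just x

epsH :: Nat Maybe []            -- eps : J => K  (J = Maybe, K = [], on D -> E)
epsH Nothing  = []
epsH (Just x) = [x]

-- Both sides of (eps * eta)_X = eps_{G(X)} . J(eta_X) = K(eta_X) . eps_{F(X)},
-- landing in K(G(X)) = [Maybe a].
viaJ, viaK :: Maybe [a] -> [Maybe a]
viaJ = epsH . fmap etaH         -- eps at G(X) after J's image of eta
viaK = fmap etaH . epsH         -- K's image of eta after eps at F(X)

-- viaJ (Just [1,2]) == [Just 1] == viaK (Just [1,2])
```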

Informal remark: this is clearly a bit more involved than vertical composition. Vertical movement (specifically moving down) in our diagrams represents movement between functors along a natural transformation. Vertical composition, then, in name alone, sounds more natural…and it is. There we simply stack transformations and objects pass straight down; the beginning and end of the route are easy to connect. Horizontal composition, however, requires movement across multiple categories, with routes that move both right and down. Tracking objects is therefore less trivial, with multiple possible paths. A new, abstracted natural transformation is possible to define all the same.

Summary:

  • Whiskering: maps between functors facilitate maps between composed functors; stop off to take the defined natural transformation, then complete the journey.

  • Vertical composition: indirectly connected functors (over the same categories) can be connected directly

  • Horizontal composition: generalized whiskering (roughly speaking); maps between functors can be chained to map between composed functors.

Monad

As the (apparently) infamous definition goes: a monad is a monoid in the category of endofunctors4. We can try to set this up step-by-step:

  • An endofunctor of a category $C$ is a functor from the category to itself.

  • The category of endofunctors treats those functors as its objects, and there are morphisms, as in any other category, between them5. The category of functors between two categories $A$ and $B$ is denoted $[A, B]$; therefore the category of $C$'s endofunctors is written $[C, C]$.

  • Taking that category, we recall what makes a monoid (at least, as we’ve studied it in Set\text{Set}): associativity and identity. Functor composition and the identity functor satisfy these requirements:
    • The identity functor 1C1_C in the category CC maps each object/morphism to itself (and is therefore an endofunctor). For an endofunctor FF in [C,C][C, C], F=F1C=1CFF = F \circ 1_C = 1_C \circ F. Therefore 1C1_C satisfies our requirements for identity on objects in [C,C][C, C].

    • By virtue of the fact that functors serve as morphisms in categories of categories, composition of functors is associative, as is generally required of morphism composition in any category.

    We say that the identity functor and composition induce a monoidal structure on [C,C][C, C]. Note that we generally write monoidal categories as the triple (C,,I)(C, \otimes, I) (where \otimes is a bifunctor), containing monoid objects (T,μ,η)(T, \mu, \eta) where

    • μ:TTT\mu: T \otimes T \rightarrow T is the multiplication morphism

    • η:IT\eta: I \rightarrow T is the unit morphism

    subject to associativity and identity constraints (loosely stated). So our category of endofunctors on CC can be written explicitly as the monoidal category ([C,C],,1C)([C, C], \circ, 1_C), and a monad on CC is a monoid object (T,μ,η)(T, \mu, \eta) in that category.

  • The morphisms between functors are generally referred to as natural transformations. Their components are compatible with how the functors act on morphisms (the naturality condition), so passing between functors respects the internal structure (i.e., composition of morphisms) of the underlying categories.

Let’s look at the monoid object a bit more closely:

(C,,I)(C, \otimes, I) is a monoidal category, i.e., a category CC endowed with a unit object II and an associative (up to natural isomorphism) bifunctor :C×CC\otimes: C\times C\rightarrow C. C×CC\times C is a product category, with objects as pairs of objects from CC and morphisms as pairs of morphisms between the corresponding components of those pairs. The monoidal part of this just follows from our set-based definition, i.e., having an associative binary operation (bifunctor) defined over elements (objects, morphisms) of the set (category).

Our monoidal category of interest is ([C,C],,1C)([C, C], \circ, 1_C), i.e., the endofunctors on CC together with (associative) functor composition. An object TT in this category is an endofunctor T:CCT: C\rightarrow C, and we have some defined morphisms for that object (which are actually natural transformations; recall that natural transformations are morphisms in categories of functors):

  • Unit η:1CT\eta: 1_C\rightarrow T; tells us how every object/morphism in CC is involved or transformed by TT. It serves as an “injection” of values into the monad.

  • Multiplication μ:T2T\mu: T^2\rightarrow T; a reduction/flatten/join-like transformation, telling us how to flatten out doubly nested monadic structures back to “regular” monad objects. (A minimal list-based sketch of both maps follows just below.)
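As a loose sketch of these two transformations, suppose (purely as an assumption for illustration; the text hasn’t fixed a particular T) that T is the list functor on Python values: η wraps a value in a singleton list, and μ flattens one level of nesting.

```python
# A minimal sketch (my own illustration) with T = the list functor:
# eta injects a plain value, mu flattens one level of nested structure.

def fmap(f, xs):
    """T's action on morphisms: apply f to each element."""
    return [f(x) for x in xs]

def eta(x):
    """Unit eta_X : X -> T(X); inject a plain value into the structure."""
    return [x]

def mu(xss):
    """Multiplication mu_X : T(T(X)) -> T(X); flatten one level of nesting."""
    return [x for xs in xss for x in xs]

print(eta(3))               # [3]
print(mu([[1, 2], [3]]))    # [1, 2, 3]
```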

Monads as CC-endofunctors, with natural transformations μ\mu and η\eta
Object-level view of a monad, explicitly showing objects as functors. Put crudely: monads are associative maps between associative maps between associative maps.

Above we visualize these as morphisms between functor objects in the category of endofunctors on CC. These are the canonical “movements” between functors that need to be defined, but we can whisker in TT to move up and down levels of composition:

Whiskering combinations expanded

Nested levels of structure can always be “unwrapped” into their component pieces, until either μ\mu or η\eta can be applied to one of the items. The coherence conditions relate the different possible whiskering orders, requiring that composed maps are equal:

  • For natural transformations T3TT^3\rightarrow T:

    μTμ=μμT\mu\circ T\mu = \mu\circ \mu T

    Coherence condition for T3TT^3\rightarrow T

    This condition is analogous to associativity for monoids. We’re starting from TTTT\circ T\circ T, and can insert parentheses in two ways:

    • T(TT)TμT\circ (T\circ T) \rightarrow T\mu

    • (TT)TμT(T\circ T)\circ T \rightarrow \mu T

    Where we draw parentheses is where we’re allowed to apply μ\mu (this is not suggesting the left and right side are equivalent; just a depiction of how the forms fall out when it comes to associativity). This is initially a bit confusing because μT\mu T and TμT\mu just look like a commutative swap, when in fact they are two distinct maps resulting from application of μ\mu at different places.

    Clarity: this is something like doing

    (a+a)+a=a+(a+a)    (2a)+a=a+(2a)(a+a)+a = a+(a+a) \implies (2a)+a = a+(2a)

    When we look at the right side, the statement appears commutative in nature: we’re just swapping operands. But this is downstream from an associative statement, and the simplification on its own is a bit misleading.

  • For natural transformations TTT\rightarrow T:

    μTη=μηT=1T\mu\circ T\eta = \mu\circ \eta T = 1_T

    where 1T1_T is the identity natural transformation on TT.

    Coherence condition for TTT\rightarrow T

    This condition is analogous to identity for monoids.

So just like we wrote the monoid (M,,e)(M, \cdot, e) for sets, we have the categorical equivalent in (T,μ,η)(T, \mu, \eta). TT is the underlying object (an endofunctor), μ\mu is the binary-operation-like piece (a natural transformation T2TT^2\rightarrow T), and η\eta plays the role of the identity (a natural transformation 1CT1_C\rightarrow T).
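Continuing the list-based sketch from above (again, lists are just a stand-in choice of T), both coherence conditions can be checked on concrete values. Note in particular how Tμ (flatten the inner level) and μT (flatten the outer level) really are distinct intermediate maps, even though composing either with μ lands in the same place.

```python
# Checking the coherence conditions for the list-based sketch of (T, mu, eta).
# T mu = "apply mu inside" (whisker T on the left); mu T = "apply mu at the
# outer level" (whisker T on the right).

def fmap(f, xs):
    return [f(x) for x in xs]

def eta(x):
    return [x]

def mu(xss):
    return [x for xs in xss for x in xs]

xsss = [[[1], [2, 3]], [[4]]]          # a value of T(T(T(X)))

# Associativity: mu ∘ T(mu) = mu ∘ (mu T), both maps T^3 -> T
t_mu = fmap(mu, xsss)                  # flatten inside:  [[1, 2, 3], [4]]
mu_t = mu(xsss)                        # flatten outside: [[1], [2, 3], [4]]
assert t_mu != mu_t                    # distinct intermediate maps...
assert mu(t_mu) == mu(mu_t) == [1, 2, 3, 4]   # ...equal after applying mu

# Identity: mu ∘ T(eta) = mu ∘ (eta T) = 1_T, maps T -> T
xs = [1, 2, 3]                         # a value of T(X)
assert mu(fmap(eta, xs)) == mu(eta(xs)) == xs
```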

Generalizing arity

Small confusing point for me here: the analogy of μ\mu to a binary operation, in the usual sense, was unclear to me. We have μ:T2T\mu: T^2\rightarrow T, but T2T^2 is still just a functor, i.e., TTT\circ T, so it feels just like a unary operation defined over any other functor, e.g., some natural transformation f:TTf: T\rightarrow T. That is, μ\mu isn’t defined over pairs of objects, nor does it map explicitly from two functors; composition can at least be seen to operate on two functors (FA,FB)FC\circ(F_A, F_B) \mapsto F_C. μ\mu, however, doesn’t operate explicitly on two items.

A quick, perhaps obvious thing: maps can always be seen as operating on one object. Even multivariate functions can just be seen to map from a single nn-tuple, rather than being a construct capable of accepting nn separate objects. Of course, this changes nothing about what any map of this kind is capable of, but it’s a simplifying perspective that helps open up new possible interpretations of the term “arity.”

We might now say the following: a map’s arity isn’t simply the number of arguments it can accept. Rather, it’s a quantifiable property of the domain of objects over which it’s defined. Addition over the reals, +:R2R+:\Bbb R^2\rightarrow\Bbb R, remains the same 2-ary operation in the usual sense: it maps from the space of 2-tuples, explicitly pairing up two items. Implicitly here R2\Bbb R^2 refers to the cartesian product R×R\Bbb R\times\Bbb R, and the arity of the resulting map is aligned with the dimensionality of the product space domain. But what if we had an operation other than ×\times, e.g., \otimes?

This is precisely the question that leads us to analogizing μ\mu with a (typical) binary operation. In monoidal categories, we saw above that :C×CC\otimes: C\times C\rightarrow C is a bifunctor. (Note how ×\times is used in the definition of \otimes; this whole “arity generalization” only really applies inside monoidal contexts, so we have to get there first with existing machinery.) This can be seen to operate on pairs of objects (and morphisms) in CC, mapping them to other objects (and morphisms) in CC. OOO\otimes O is simply us applying \otimes to the same object OO, and this is the new form of the domain for the binary operation-like map μ\mu. We’re simply generalizing the way we build the domain from two objects by letting our bifunctor \otimes “put things together” instead.

In total, for our multiplication morphism μ:OOO\mu: O\otimes O\rightarrow O defined for monoid objects in any monoidal category, we have a map that doesn’t strictly look like a binary operation in the usual sense (hence this messy aside). It is nevertheless constructed around a domain that is built from two objects with \otimes, and that’s good enough for a general analogy to binary operations. It’s worth noting that for monads, in particular, \otimes is fixed as functor composition \circ, and μ\mu’s domain T2=TTT^2 = T\circ T is still just a functor. μ\mu’s “arity” is therefore more akin to levels of nested composition than to a number of dimensions or arguments: it flattens.

Whiskering μ\mu and η\eta (detailed)
Whiskering μ\mu and η\eta

When grappling with the intuition behind the coherence conditions, or really just the general positioning of monoid objects, I find the following helpful:

Monoidal categories are associative safe spaces. They allow one to rewrite the rules of association, and with that new mechanism ensure things behave in a familiar, expected way. Very simply put: you start with a category CC, a collection of objects and relations between them. You then define the notion of association \otimes, a bifunctor, which combines pairs of objects in CC into new CC objects. You’re now in a world endowed with a new “product substrate” for defining “binary operations:” you take an object AA from that world, and a binary operation on the object maps from AAA\otimes A to AA.

Recall that typically, i.e., in Set\text{Set}, binary operations are defined on specific sets, having structure f:S×SSf: S\times S\rightarrow S. The movement to the categorical generalization…

  1. loosens the kinds of objects we use, relaxing to categories other than Set\text{Set}, and

  2. generalizes the means of association, relaxing from ×\times to \otimes, allowing for more than just simple object pairing.

In short: monoidal categories are universes with their own notion of “productization” (under the monoidal product \otimes). This establishes the foundation upon which binary operations on objects in that universe are built. Put another way: monoidal categories redefine the notion of association (what a pair of objects is), and binary operations define means of combination under that paradigm. The former describes how we put objects together (ambient productization; a,b(a,b)a,b\mapsto (a,b), f,ggff,g\mapsto g\circ f, whatever), and the latter is an action of combination hip to that notion of togetherness (internal combination; (a,b)ab(a,b)\mapsto a\cdot b, etc). (It’s not so easy to give a simple form for operating on combined map-like objects such as gfg\circ f; such an operation would typically be defined on an element-wise basis, i.e., if (gf)h(g\circ f)\mapsto h, we’d talk about how we map between items g(f(x))g(f(x)) and h(x)h(x) rather than something more abstract.)

Monads are what we get when we want our objects to “understand” a whole type of (other) object (i.e., a category): no choice of object AA in AAA\otimes A is partially defined. In monoids on Set\text{Set}, we must pick a particular set SS over which to define the binary operation (as discussed above), fundamentally allowing the exclusion of some objects that could be included in sets. That is, the set SS doesn’t strictly contain a whole class of object (on purpose): we have the flexibility to pick arbitrary sets over which the operation is defined. If we instead wanted SS to be representative of an entire type, we’d be effectively asking for SS to become a category itself (i.e., to fundamentally represent a certain kind of object if we’re going to define operations over it). So we let AA not be an arbitrary object in the desired type’s category, but we “lock that category in” by operating instead on endofunctors of that category. In this case, our objects are maps defined over the entire type: AAA\otimes A involves a specific functor, but each functor is defined over the entire type/category of interest.

Putting this all together: monads leverage the machinery of monoids to define a universe where “products” are sequences of composition; this is the fundamental thing the monoidal product \otimes allows us to do. That composition takes place over not just any old functions defined between two objects, but instead on a map TT defined over an entire class of object (i.e., a functor). Such a map can be seen as “cladding” objects of the underlying type/category in additional structure: it “reskins” the object class by coating every object and morphism individually.

A movement under that map represents a movement into a new structured world. The type of the underlying objects remains the same (it’s an endofunctor; we don’t leave the category), but the nature of the objects on the other side is different (and different in a consistent way, having the newly added structure). Note that TT indiscriminately adds structure to all objects in the category: it maps an object XX to T(X)T(X), and since T(X)T(X) is again an object of the category, applying TT again produces T(T(X))T(T(X)), and so on. Point is, TT adds structure indiscriminately, so multiple composition steps can add too much structure: “already structured” objects receive an extra layer of structure. μ\mu is a mediator or regulator to address this “bunching up,” flattening doubled-up structure back to just a single layer.

Monads are structured computation environments. η\eta brings you into it (maps “plain” values into the structured landscape), the functor TT keeps you in it (brings all objects and morphisms along for the ride every time), and μ\mu flattens redundant build up (for those objects and morphisms that didn’t need any more of the functor’s effects). In aggregate, you have the necessary pieces for facilitating smooth, structured sequences of computation in TT’s world.
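One hedged way to make “structured sequences of computation” concrete, still using lists as a stand-in for T: chaining a structured value with a structure-producing step is just “apply T to the step, then flatten with μ,” which is the bind/flatMap operation familiar from programming. The helper names below are my own.

```python
# Sequencing in T's world, sketched with lists: given a structured value m in
# T(X) and a step f : X -> T(Y), "fmap then flatten" chains them without the
# structure piling up (this is bind/flatMap, assembled from T, mu, eta).

def fmap(f, xs):
    return [f(x) for x in xs]

def eta(x):
    return [x]

def mu(xss):
    return [x for xs in xss for x in xs]

def bind(m, f):
    """Chain a structured value with a structure-producing step: mu ∘ T(f)."""
    return mu(fmap(f, m))

halve_if_even = lambda n: [n // 2] if n % 2 == 0 else []   # a step X -> T(Y)

print(bind(eta(8), halve_if_even))        # [4]
print(bind([1, 2, 3, 4], halve_if_even))  # [1, 2]
```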

Example: power set monad

On Set\text{Set}, the power set monad is a monad P=(T,μ,η)\mathcal{P} = (T, \mu, \eta), where:

  • T(A)T(A) is the power set of a set AA, and T(f)T(f) takes each subset SAS\subseteq A to its direct image f(S)Bf(S)\subseteq B under a function f:ABf:A\rightarrow B

  • η\eta consists of component morphisms ηA:AT(A)\eta_A:A\rightarrow T(A), taking every aAa\in A to the singleton {a}\{a\}

  • μ\mu consists of component morphisms μA:T(T(A))T(A)\mu_A:T(T(A))\rightarrow T(A), taking a set of sets to its union.

For a concrete A={1,2}A=\{1,2\} (checked in the code sketch below),

  • T(A)={{},{1},{2},{1,2}}T(A) = \{\{\},\{1\},\{2\},\{1,2\}\}

  • T(T(A))={{},{{}},{{1}},{{2}},{{1,2}},,{{},{1},{2},{1,2}}}T(T(A)) = \{\{\},\{\{\}\},\{\{1\}\},\{\{2\}\},\{\{1,2\}\},\dots,\{\{\},\{1\},\{2\},\{1,2\}\} \}

  • ηA(1)={1}\eta_A(1) = \{1\}, ηA(2)={2}\eta_A(2) = \{2\}

  • μA({{1},{2}})={1,2},\mu_A(\{\{1\},\{2\}\}) = \{1,2\}, \dots
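These values can be checked directly. Below is a small sketch; the use of Python frozensets (so that sets can contain sets) is an implementation detail, not part of the categorical definition.

```python
# Checking the power set monad's components on A = {1, 2}.
from itertools import chain, combinations

def T(A):
    """Power set of A, as a frozenset of frozensets."""
    A = list(A)
    return frozenset(
        frozenset(c) for c in chain.from_iterable(
            combinations(A, r) for r in range(len(A) + 1)
        )
    )

def eta(a):
    """eta_A : A -> T(A); a |-> {a}."""
    return frozenset({a})

def mu(S):
    """mu_A : T(T(A)) -> T(A); a set of sets goes to its union."""
    return frozenset().union(*S) if S else frozenset()

A = {1, 2}
print(len(T(A)))        # 4 subsets: {}, {1}, {2}, {1, 2}
print(len(T(T(A))))     # 16 sets of subsets
print(eta(1), eta(2))   # frozenset({1}) frozenset({2})
print(mu({frozenset({1}), frozenset({2})}))   # frozenset({1, 2})
```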

Common examples from functional programming
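The canonical first example here is usually Maybe/Option: T adds the possibility of absence, η wraps a plain value as a present one, and μ collapses a “maybe of a maybe” into a single layer. A rough Python sketch follows; the encoding (tuples tagged "just"/"nothing") and helper names are my own choices, not a standard API.

```python
# A rough sketch of the Maybe/Option monad (illustration only; the tuple
# encoding below is an arbitrary choice).

NOTHING = ("nothing",)

def just(x):
    return ("just", x)

def fmap(f, m):
    """T's action on morphisms: apply f under the structure, if present."""
    return just(f(m[1])) if m[0] == "just" else NOTHING

def eta(x):
    """Unit: wrap a plain value as a present one."""
    return just(x)

def mu(mm):
    """Multiplication: collapse Maybe(Maybe(X)) to Maybe(X)."""
    return mm[1] if mm[0] == "just" else NOTHING

def bind(m, f):
    """Sequence a possibly-absent value with a possibly-failing step: mu ∘ T(f)."""
    return mu(fmap(f, m))

safe_recip = lambda x: NOTHING if x == 0 else just(1 / x)

print(bind(eta(4), safe_recip))    # ('just', 0.25)
print(bind(eta(0), safe_recip))    # ('nothing',)
print(mu(just(just(7))))           # ('just', 7)
print(mu(just(NOTHING)))           # ('nothing',)
```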


  1. Felt the need to spell out more of this here because I find myself worried about the possible flexibility of ff. As in, it’s not exactly clear why things should automatically fall into place to ensure it’s the only option. But the fact you can’t really “fill in” all the gaps (outside of the options that gg is defined over) in arbitrary ways helps me accept this; you must fill them in a way that ensures ff respects the category’s morphisms.
    ↩︎
  2. This particular perspective lacks a little precision since our map is formulated more as a set-valued function [][\cdot] that takes a representative value to its equivalence class, but it’s nevertheless induced by an underlying relation.
    ↩︎
  3. Going to the trouble to formulate things this way is really meant to help me avoid falling into my typical trap: getting caught on the seeming open-endedness of the structure-preserving commutative form. When I “mask away” gxg\cdot x behind xx^\prime as simply our symmetry’s “output” when acting on xx, it becomes much easier to just follow ff’s application on both xx and xx^\prime. Those two have a clear relationship before ff, and that same relationship needs to be there after ff between whatever new items ff produces. But when I leave xx^\prime as gxg\cdot x, I find I get distracted by our f(gx)f(g\cdot x) step and want to deconstruct it, reading more into it than is actually there. Really, gxg\cdot x is a new thing, and we write in terms that communicate precisely how it’s related to xx (which is of course clean and elegant)…it’s just that doing those two together often leads me astray.
    ↩︎
  4. Here I’m actually seeking a setup without some of the usual functional programming weight slapped on. That’s not to say it’s not helpful, but I want to feel the more abstract fundamentals hit home first (or at least now given all of the loose context I’ve absorbed with the 10 different tutorials that start with Maybe).
    ↩︎
  5. Right away this is a little confusing: morphisms in a category of functors are map-like things being applied to map-like things. Functors are the maps applying on the entire category (mapping all objects/morphisms in CC onto themselves, observing totality but not necessarily surjectivity), and the morphisms between them (natural transformations) are in turn assembled from morphisms in the target category, one component per object.
    ↩︎