Sets and functions are two of the most basic ideas in mathematics, and we'll need to know what they are to discuss some things about categories rigorously. Normally you'd learn about sets and functions way before encountering category theory, but in the spirit of assuming as little math as possible, we should write this post. It's also worth addressing a few matters for their conceptual relevance.

Sets are imaginary bags we put things into. For example, you can take a dog, a cat, and a shoe, put them in an imaginary bag, and now you have a set consisting of . The members of the set—dog, cat, and shoe—are called the elements of the set.

A subtle but important aspect of a set is that the imaginary bag has to be defined by a rule. This rule can be pretty much anything, like "put into a bag everything I'm pointing at," but it does have to be a rule. Typically, sets can fit pretty much anything in, and so you can often just say "here is my set" rather than having to be explicit about the rule. We'll get back to why the rule matters at the end. For now, sets are imaginary bags that you can put pretty much anything into.

What are we putting into these bags, exactly? Pretty much anything, yes—but clearly we aren't actually putting dogs, cats, and shoes into bags. Mathematically, what are these things?

That is to say, what's the difference between the set and the set ?

Well, what's the difference between the equations and ? Nothing but the name of the variable—which does not matter at all. We could call anything. We could represent it with a thirty-foot tall watercolor of a fire truck.

So what's the difference between the set {dog} and the set {cat}? Only the name of the element—which does not matter at all. Just like we can take posets like and and represent their common structure abstractly as , we can do the same for and with this set: .

The set is what a one-element set like {dog} or {cat} really is. There's no mathematical meaning to that actually makes the element of this set a four-legged barking creature. It's just an element that happens to be labeled "dog."

So why do we care about sets? Set theory is really important to mathematics, but from a category-theoretic perspective, they actually aren't very important at all. Instead, sets serve one major purpose: sets let us define functions, and functions are really, really important!

Functions are maps between sets that meet a few rules. First of all, let's just talk about the "maps" part of that. Think of sets as countries, and the elements of the sets as cities in that country. A map between the two sets is a map telling you how to go from the cities in one country to the cities in the other country.

But of course, math is a little more specific than that. So let's say you have one set and another set . What does it mean to define a map—a morphism, in fact—from to ?

Well, it's pretty simple in the end. You have to start from a city in , so one of , or . And you have to end up in a city in , so one of or . Let's say you go from to . Then the map is just...the set of where you started from and where you ended up. That is, and , respectively.

That's it! It's a short trip. There's not much to sets in the first place, so there's not much to maps between them. (Technically, sets have no structure—they're imaginary bags filled with black dots—and so the maps between them are as simple as a map across a country with no geography.) But let's point out one thing: we do need this map to tell us where we started and where we ended. In a set, the order of the elements doesn't mean anything. For example, the set and the set are literally identical. The only reason the elements are even written in an order at all is because there's no other way to display text.

To get our elements in order, so that we can show where we started from and where we ended up, we use something called an ordered pair, which you've probably seen from doing Cartesian coordinates. When we have a map telling us to go from to , we represent that as an ordered pair . The ordered pair means "we started at and ended up at ."

Although sets don't have orders, we can have sets of ordered pairs (since we can put pretty much anything into sets), and in that way we can represent order in a set. For example, you can have the set consisting of just the ordered pair . That would be written .

So what does it mean to define a map from to ? It means defining a set of ordered pairs, the first part of the ordered pair coming from the set and the second part of the ordered pair coming from the set .

That is to say, a map is defined as a set whose elements are ordered pairs , where is an element in and is an element in .

So for example, we could have a map with the following directions: This maps says, "If you're in , go to . If you're in , go to . If you're in , go to ." All such maps are called binary relations, because they define relationships between pairs of things. For example, the map just given defines relationships between and , and , and and .

We could define all sorts of maps based on this definition. You could make "partial" maps that tell you where to go if you're in , but not if you're in or . You could make "choose your own adventure" maps that have forking directions, e.g., a map having both and in them.

What's the best map? That is unquestionably a special kind of binary relation known as a function. "Best" might be a strong word, but functions have two special properties that have made it the most important type of map, and indeed morphism, in all of mathematics.

The first property of functions is that they provide you instructions for going from to for every "city" in . Let's move away from the countries-and-cities metaphor and consider the study of science. Think of the elements of as possible states of the world. As scientists, we'd like a scientific rule that gives us predictions for every possible state of the world, i.e., something that provides a map for every element of . That's something a function does—this property is called total.

The second property of functions is something called univalence, which means that you only get one mapping for every city you could be starting from. That is to say, if your function tells you to do and , it must be the case that and are completely identical, i.e., . Having a map to two different cities starting from the same city is strictly disallowed by univalence.

Let's relate univalence to science as well. Basically, it captures the idea of determinism. If a state of the world yields a particular output, and then you reset things to that exact state of the world again, it had better yield the same output again. I.e., if you have a state of the world , and you observe that yields output and also yields output , the outputs and had better be the same output that accidentally got two different labels applied to it.

So between totality and univalence, a function basically captures the idea of "making deterministic predictions for everything." Which is exactly what science is all about.

We can combine totality and univalence into this single rule: A function is a set of ordered pairs where each element of shows up in one and only one ordered pair. That is to say, is definitely in an ordered pair, as guaranteed by totality. But by univalence, it will only show up once: if you have , then you won't also have .

You should know that while a function can't give you and unless , it can totally give you and . That would just be saying that two states of the world end up at the same point, which certain seems like a scientific possibility.

Now we're going to address sets and functions from a category theoretic perspective. In fact, we're going to make a category.

***

Let's build a category whose objects are sets and whose morphisms are functions.

The first step is to make sets into objects. We do this by saying that sets are objects. Seriously, it's that simple—objects don't really have any properties at all aside from their morphisms, so there's nothing more to this step.

The next step is to make our functions into morphisms. We do this by saying that functions are morphisms, and then we check that functions obey the rules of morphisms in a category.

First, domain and codomain. Sets serve as the domain and codomain of functions, and since sets are our objects, the functions will clearly have the objects of this category as their domain and codomain.

So the first real thing to figure out is composition. Let's say we have and as sets, and functions and . Composition would require that we have another function such that .

Let's break this requirement down. The function starts with the elements in and returns some elements in . That is to say, we have a set of ordered pairs of the sort , where comes from and comes from . Say that consists of and B is . The function allows us to specifically assign an element in to each element of . That is to say, we can ask, what is the specific (i.e., only one) element of corresponding to ? It could be , for example. If so, then . And we can ask the question next for and . We might reuse elements of or not. Let's say the full function gives .

Having assigned elements of to elements of in this way, we could think of elements of as "hiding in" elements of . For example, is hiding in . (That is to say, the function f is hiding in —it doesn't get to hide there automatically.)

Next we have , which specifically assigns an element in to each element of . Say that consists of . Let's say gives .

Now let's reveal our hidden soldiers. The elements and were hiding in , and was hiding in . Naturally, they ambush the soldiers of like so: .

(Why is this ambush allowed? Because means , and means . Substituting for , we have .)

Is that "ambush" set a function? Yes, it has each element of in an ordered pair, and each element is in only one pair. Is it a function ? Yes, the "first part" of each ordered pair comes from and the "second part" of each comes from . Can we call this function ? Yes, we just label it that for free. Is this function the same as doing first, and then ? Yes, that's exactly what we just saw.

So functions compose. Check mark.

Now let's prove that these functions compose associatively. Say you have functions . We want to show that .

Let's plug in an arbitrary element of . Our functions carry that element through to some element of , and we want to know if it's the same regardless of how you group the functions. So let's see if

We know that composition is allowed (we're proving that composition is associative, after all). So let's compose. Say that and . Now we can simplify to asking if .

Well, what is ? It's a mapping of an element from through to , and from through to . And is equal to the path that going through from through to , and from there through to . So overall, maps an element from through to , through to , and through to .

And what is ? It's a mapping of an element from through to , and from through to . And since is equal to the path that goes from through to , and from there through to . So overall, maps an element from through to E, through to , and through to .

Which is exactly what we just said about . Because they're both carrying the same element through the same functions, they have to end up at the same element in on pain of violating univalence. So they're equal for the case of , and since is an arbitrary element in an arbitrary set, it works in general. Thus, composition is associative.

Finally we need each set to have an identity morphism. Because our morphisms are functions, this will be the identity function. This is as simple as asking whether, for any set , we can define a function that does nothing.

Here's an example of such a function. Say . Then a function would do nothing if it gave . That is to say, each element just gets mapped back to itself.

Let's show that does nothing. I.e., if you have a function , the composition is just equal to . Well, duh! The function is a mapping of elements from to . So if you keep all the elements where they were in , then is just going to be what was going to be if you didn't do anything in the first place.

Obviously, you can just pair up each element with itself for any set, and that's going to keep not doing anything no matter how big the set gets. So every set has an identity function. (And only one, at that—there's only one way to pair up each element with itself.)

And now we're done. We've just shown how sets and functions can be considered a category: sets are objects by declaration, and we can show that functions between sets compose associatively, and an identity function can be defined for every set.

Neat!

You might be wondering if we could have defined a category with sets as objects and binary relations as morphisms. Indeed we can, in pretty much the same way as we did with sets and functions. In fact, since functions are just particular types of binary relations, proving that binary relations meet the rules of a category in general would have proved it for the specific case as well. In fact, the category of sets and functions is a subcategory of the category of sets and binary relations.

Yet it is the case that the category of sets and functions gets the significant name Set, whereas the category of sets and binary relations gets the much less interesting name Rel. That's because it is the category-theoretic perspective that everything that's interesting about a category is contained in its morphisms. Functions are basically the most important morphism in mathematics, so we give the category for which functions are morphisms the name Set—we care about sets because, more than anything, they let us define functions.

***

One last note on defining the category of sets, Set. You may have heard of Russell's paradox, which says you can't have the set of all sets. That's because sets have to be defined according to a rule, as we said in the beginning. What if you try to define a set of all sets that are not elements of themselves? Then one possibility is that this set is either an element of itself, in which case it is included, in which case we have violated the rule just laid down. But the only other possibility is that it is not an element of itself, in which case it should be an element of itself according to its rule. But we just saw why we can't let it be an element of itself. So we bounce back and forth in eternal paradox.

So we can't have a set of all sets. Then how can we have a category of all sets? We'll discuss that in the next post, which will address this issue and use our new understanding of sets to more properly formalize the definition of a category given a couple of posts ago.

Additionally, we'll look at two other interesting things about sets. We'll see how, although sets are bags of black dots, we can use functions to give those black dots meaning. It's the category-theoretic view that everything you want to know about an object is found in the object's morphisms. Furthermore, we'll also see that there's two special kinds of sets which can be though of as the "minimal" and "maximal" set, respectively. In doing so, we'll get our first tastes of the two main goals of this series, the Yoneda lemma and adjoint functors.

***

Incidentally, writing these posts has been shockingly exhausting and time-absorbing. So after this next post, I don't expect anything further on the weekend, although I may try to answer comments then. Five posts a week is not sustainable. 2-3 a week is probably more reasonable. This experience has given me a lot of respect for people who keep up daily blogs or write books. Thanks very much to people reading and commenting on these so far. It's very useful to have feedback and gratifying to know that people are interested in category theory.

I am going to work as well on lowering the cost to myself of creating drawings to aid explanations. I have always explained this material in the past with easy access to pen and paper, and somehow it never quite occurred to me that even sketching out a few dots and arrows is much more of a pain on a computer. Taking recommendations, if you have any.

New Comment
22 comments, sorted by Click to highlight new comments since:
Incidentally, writing these posts has been shockingly exhausting and time-absorbing.

Of course. Don't feel bad about it; sometimes is takes me 20 minutes to write a comment. For longer texts, the only thing that can significantly increase my speed is alcohol; it turns off the desire to rewrite each sentence dozen times until it's perfect. (Spoiler: After rewriting the sentence dozen times, it usually still feels bad, maybe even worse.)

I was thinking about a different technique, though I never tried it: to simply speak the article to a microphone, and then rewrite my speech. Perhaps you could do it along with drawing the diagrams on paper, and then also scan the paper and add the pictures to the article.

I have always explained this material in the past with easy access to pen and paper, and somehow it never quite occurred to me that even sketching out a few dots and arrows is much more of a pain on a computer.

Just draw it on the paper. Then either scan it, or simply make a photo -- your smartphone almost certainly has much higher resolution than you need.

You're clearly putting a lot of effort into this, but I have doubts about the exposition style. For someone who doesn't know multiple math fields, cat theory is useless. For someone who does, a more compact intro should suffice, like these slides or the SEP entry.

I agree. I'm shifting gears to work on something basically aimed at the idea that the intelligent layperson can grasp Yoneda lemma and adjunction if it's explained.

As someone with half an undergrads worth of math background, I've found these posts useful to grasp the purpose and some of the basics of category theory. It might be true that there's exist some exposition out there which would work better, but I haven't found/read that one, and I'm happy that this one exists (among other things, it has the not-to-be-underestimated virtue of being uneffortful to read). Looking forward to the Yoneda and adjunction posts!

To my unnderstanding sets are defined by their members. The part about there having ot be some rule sounds very starng. Sure you need to somehow express what the members are. Inparticular if you have two rules but you end up picking the same members then you have defined only 1 set not 2. The rule is not part of the make up of the set. It is confusing because it seems a lot of real deduction uses the "intent" of setting up the set to deduce what it does or does not contain. But that kind of deduction could be carried out without reference to memberhips. If sets need to have rules, how do the composition functions obtain their rule?

There is also a big difference between a set containing the word "dog" and a set containing a dog. And while in the start it seemed that "dogs lose all properties" iam guessing that by the end we have things like legs(dog)=4 which does move all the interesting stuff to the morphisms but the labels are not arbitrary. In order to set up the correct morphisms I would need to know a whole lot about dogs.

It was also supposed to be that objects are nouns and morphisms are verbs. It is wierd to think that legs(x) is a verb or that legs turns dogs into numbers (in a sense of we predict this dog will spontaneusly combust into number 4)

In the end, we don't really care about sets at all. They're just bags with stuff in them. Who cares about bags? But we do care about functions—we want those to be rule-based. We need functions to go "from" somewhere and "to" somewhere. Let's call those things sets. Then we need these "sets" to be rule-based.

I'm grateful for your comments. They're very useful, and you raise good points. I've got most of a post already about how functions give meaning to the elements of sets. As for how functions is a verb, think of properties as existing in verbs. So to know something, you need to observe it in some way, which means it has to affect your sensory devices, such as your ears, eyes, thermometers, whatever. You know dogs, for example, by the way they bark, by the way they lick, they way they look, etc. So properties exist in the verbs. "Legs" are a noun, but all of your knowledge about them has to come from verbs. Does that make sense?

You are making partially sense in that you are pointing to a modelling style but it does leave me unsure whether I can correctly fill in the missing bits. My thinking is interferred a lot by "getter functions" that are of the form [code]function get_color(self){ return self.color }. One of the point of such is that attributes tend to be private but methods are public and the programmer should he need to do so could change the implementation details without messing outside customers. So the modeeling style shares a similarity that objects are allowed to secretly have details outside of their interface. Sure if we have verbs and objects mixed up but can express object-like things as verbs by converting objects to verbs we only have to care about one basic ontology type. But I am unsure whether I missed it or is it fortcoming why it is important or valuable to focus on the verbs.

I am unsure what rule-basedness is but if it is different from extensional conception of functions then I would be super intrigued. I can get that sensing should be modelled with functions in that way but it seems contradictory how functions were supposed to be prediction or evolution models. So if I have a(b(c(d))) does it mean that first d goes throught two kinds of evolutions and is then observed or d goes throught one kind of evolution and then observation of that is observed. I am expecting this kind of divison is not an actual problem but I can't effortlessly go from the function formalism to observation/prediction formalisms and would likely make a lot of errors there.

My next series of posts will be directly about the Yoneda lemma, which basically tells us that everything you could want to know about an object is contained in the morphisms going into/out of the object. Moreover, we get this knowledge in a "natural" way that makes life really easy. It's a pretty cool theorem.

The analogy to geographic maps might confuse someone who knows geographic maps but not mathematical maps since "a map is a thing connecting cities in one country with cities in another country" has nothing to do with how you use a geographic map.

Throws me off a bit early on, you go directly from dog being an actual dog to dog being an arbitrary variable name. At this point I think of dog as "some x such that x is a dog", so {dog} and {cat} are different because one has an x that is a dog and another has a y that is a cat which is not a dog.

First, domain and codomain. Sets serve as the domain and codomain of functions, and since sets are our objects, the functions will clearly have the objects of this category as their domain and codomain.

I don't think this gives you the normal category of sets? I believe that's usually defined taking the codomain of a morphism to be the range of a function, which is a superset of its codomain. That is, the function "plus one" on the natural numbers has a codomain of the positive integers; but its range may be the natural numbers, or the reals, or indeed any superset of the positive integers. And these functions are indistinguishable as sets, but in the category of sets, there's a morphism for every one of them.

The terminology is the other way round. The range (also called the image) of a function is the set of values it actually takes. The codomain is whichever superset of the range you are considering as the set the function maps to, the "result type" of the function. So the range of the +1 function on the domain ℕ is the positive integers, but the codomain is any superset of that, and gives a different morphism in the category Set for each one.

Whelp, I've been getting that wrong for some time now, thanks.

Ah, apparently "range" is ambiguously used as both "codomain" and "image". I think I was taught it as codomain (before I was taught that word), and then at some point started to think of codomain as "the one that isn't the range".

https://en.wikipedia.org/wiki/Range_(mathematics)

It's older terminology. Everyone says image now.

I thought I had it right, and then mixed it up in my head myself.

That was my intention. Thanks for pointing it out. One of the mistakes of this series was the naive belief that simplicity comes from vagueness, when it actually comes from precision. Dumb of me.

[This comment is no longer endorsed by its author]Reply

Then the map is just...the set of where you started from and where you ended up. That is, a and x, respectively.

This sounds like the map is {a,x}.

If you run out of steam before reaching adjunctions, I hope you can manage a post about adjunctions that assumes that you had finished all the previous posts.

You say that functions are the best maps because of those two properties, but they are simply the defining properties of a function. What makes these properties the best properties for a definition of maps to have?

Steam is run out of. This was poorly conceived to begin with, arrogant in its inherent design, and even I don't have the patience for it anyway. I'll do a series about adjunction directly and Yoneda as well.

I'm commenting on this post probably two years too late, but I wanted to express my enthusiasm for this series you started on Category Theory! CT seems really cool and I recently started browsing Category Theory in Context by Emily Riehl, but paused because I felt like I haven't explored enough different branches of math deeply to see the beauty in what Riehl was sharing. The few posts you wrote here on LW sparked my interest again. I'm writing mostly for myself now, but also as a clue for others that come next. My plan from here is to explore:

johnswentworth's Category Theory Without The Baggage

johnswentworth's CTWTB: Paths of Computation State

Richard Southwell's Category Theory For Beginners: Introduction [58 min]

David Spivak/Brendan Fong/Topos Institute's Applied Category Theory Video Series (@ MIT 2019) [15 lectures, about 50 min each]

Wikiversity Introduction to Category Theory

Here is what confuses me: from before, I thought morphisms were "just" arrows between objects, with a specific identity.

But in the case of functions, we have to smuggle in the set of ordered pairs that define them. Do you simply equate the identity of a function with this set definition?

That might be fine, but it means there needs to be some kind of ... semantics? that gives us the "meaning" (~ implementation) of composition based on the "meaning" (the set of ordered pairs) of the composed morphisms.

Am I right here?

You raise a good point. Think of category theory as a language for expressing, in this case, the logic of sets and functions. You still need to know what that logic is. Then you can use category theory to work efficiently with that logic owing to its general-abstract nature.

That would the ambush part?