- Why did someone bother coming up with it?
- Why did they think it might be able to give us useful insights?
- And what interesting insights into reality does it give us?
Algebraic topology is the discipline that studies geometries by associating them with algebraic objects (usually, groups or vector spaces) and observing how changing the underlying space affects the related algebras. In 1941, two mathematicians working in that field sought to generalize a theorem that they discovered, and needed to show that their solution was still valid for a larger class of spaces, obtained by "natural" transformations. Natural, at that point, was a term lacking a precise definition, and only meant something like "avoiding arbitrary choices", in the same way a vector space is naturally isomorphic to its double dual, while it's isomorphic to its dual only through the choice of a basis.
The need to make precise the notion of naturality for algebraic topology led them to the definition of natural transformation, which in turn required the notion of functor, which in turn required the notion of category.
This answers questions 1 and 2: category theory was born to give a precise definition of naturality, which was needed in order to generalize the "universal coefficient theorem" to a larger class of spaces.
This story is told in a lot of detail in the first paragraphs of Riehl's wonderful "Category Theory in Context".
To answer question 3, though: even if category theory was rapidly expanding during the '50s and the '60s, it was only with the work of Lawvere (whom I consider a genius on par with Gödel) in the '70s that it became a foundational discipline. Guided by his intuitions, category theory became the unifying language for every branch of mathematics, from geometry to computation to logic to algebra. Basically, it showed how the various mathematical disciplines are just different ways of saying the same thing.
To give an answer on a fairly concrete level: I don't know a lot of category theory, but one interesting insight I've gained from my limited study of it, which has convinced me that it's worth learning, is about the notion of products of structures.
Most mathematicians probably acquire an informal understanding of the notion of products of structures before they study category theory. In the simplest cases we can say that taking the product of some structures is just the act of going from looking at individual elements to looking at composite elements that are made up of parts, and applying operations to these composites by distributing those operations over those parts. For example, the product of the real line with itself can be thought of as the set of ordered pairs of real numbers, and addition of such pairs can be carried out by making the ordered pair whose first projection is the sum of the first projections and whose second projection is the sum of the second projections.
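The componentwise recipe described above can be sketched in a few lines of Python. This is only an illustration: the names `product_op` and `add_pairs` are made up here, and tuples of floats stand in for points of the real plane.

```python
import operator
from typing import Callable, Tuple

Pair = Tuple[float, float]

def product_op(op: Callable[[float, float], float]) -> Callable[[Pair, Pair], Pair]:
    """Lift a binary operation on numbers to ordered pairs, acting on each component separately."""
    def lifted(p: Pair, q: Pair) -> Pair:
        return (op(p[0], q[0]), op(p[1], q[1]))
    return lifted

# Addition on the product of the real line with itself.
add_pairs = product_op(operator.add)

p, q = (1.0, 2.0), (3.0, 4.5)
total = add_pairs(p, q)  # (4.0, 6.5)
```

Note that taking a projection commutes with the operation: the first projection of `add_pairs(p, q)` is the sum of the first projections of `p` and `q`. That is the sense in which projections are structure-preserving.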
This sort of understanding of the notion of product works for algebraic structures, and is formalized in the part of mathematics known as universal algebra (which is basically category theory applied to specifically algebraic structures). But when it's naively applied to other structures it doesn't always work so well. The example that I'm familiar with is topologies. The analogous way to form a product of some topologies is to say that the open sets in the product are generated by Cartesian products of open sets in the component topologies. This defines what's known as the box topology. But for products of infinitely many components this notion isn't all that useful: it lacks many "nice" properties. For example, a box topology on infinitely many components is not guaranteed to be compact even if all of its components are compact.
(Mathematicians talk a lot about "nice"-ness, but I've never seen that much discussion about what it actually means. So I don't know how far mathematicians will agree with the description of what "nice"-ness is that I'm about to give. But as far as I understand, an operation is "nice" to the extent to which interesting properties of the object formed as a result of applying the operation can be determined from the properties of the operands. The "nice"-er an operation is, the more it pays off to think of an object as made up of simpler operands that will yield the object upon applying the operation, because conclusions that can be drawn about the operands can be transformed into conclusions about the result. This is why it matters that the box topology is not "nice".)
It turns out that a much "nice"-er notion of product results if you take as basic open sets the Cartesian products of open sets in the component topologies, with the extra proviso that all but finitely many of the factors are equal to the whole underlying set of the corresponding component topology (the open sets of the product are then the unions of these basic sets). For finitely many components this is the same as the box topology; the two only differ for infinite products. With this notion of product topology, we have that products of compact topologies are compact, sequences of points in product topologies converge iff the corresponding sequences of projections all converge, etc.
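It may help to see the two definitions side by side. The index set $I$, the spaces $X_i$, and the names $\mathcal{B}_{\mathrm{box}}$ and $\mathcal{B}_{\mathrm{prod}}$ are just notation I'm introducing for this comparison; in both cases the open sets are the unions of the listed basic sets.

```latex
\[
\mathcal{B}_{\mathrm{box}}
  = \Bigl\{ \prod_{i \in I} U_i \;:\; U_i \text{ open in } X_i \text{ for every } i \in I \Bigr\}
\]
\[
\mathcal{B}_{\mathrm{prod}}
  = \Bigl\{ \prod_{i \in I} U_i \;:\; U_i \text{ open in } X_i \text{ for every } i \in I,
      \ \text{and } U_i = X_i \text{ for all but finitely many } i \Bigr\}
\]
```

When $I$ is finite the extra condition is vacuous, so the two bases coincide. The difference only shows up for infinite products.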
From this, the question naturally arises: is there a general notion of product that will reduce to the proper notion of the product topology (instead of the box topology) for topologies, while reducing to the universal-algebraic notion of product for algebraic structures? And the answer is: yes, the category-theoretic notion of product. Basically, products in category theory are characterized by two properties: the projections from the product to its components are morphisms, and any family of morphisms from some other object into the components factors in exactly one way through the product and its projections.
A morphism is just a member of an arbitrary collection of maps between structures that we regard as "structure-preserving" for the category of structures in question. For algebraic structures homomorphisms are normally regarded as morphisms; for topologies continuous maps are normally regarded as morphisms. But the choice of which maps to regard as morphisms is ultimately up to the mathematician; it depends on which properties they are interested in preserving.
This is the big conceptual innovation of category theory: attention shifts from the structures themselves to the structures together with the structure-preserving maps between them. A composite object made up of a class of structures together with a class of structure-preserving maps between them is called a category. Hence the name category theory.
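To make the category-theoretic product a bit more concrete, here is a small sketch in the simplest setting I know: sets with ordinary functions as the morphisms, modelled by Python values and functions. The names `fst`, `snd` and `pair` are my own choices for illustration. The point is that any two maps `f` and `g` out of a common source determine a single map `pair(f, g)` into the product, and composing that map with the projections gives back `f` and `g`.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
X = TypeVar("X")

def fst(p: Tuple[A, B]) -> A:
    """First projection out of the product."""
    return p[0]

def snd(p: Tuple[A, B]) -> B:
    """Second projection out of the product."""
    return p[1]

def pair(f: Callable[[X], A], g: Callable[[X], B]) -> Callable[[X], Tuple[A, B]]:
    """The mediating map: the one map h with fst(h(x)) == f(x) and snd(h(x)) == g(x)."""
    def h(x: X) -> Tuple[A, B]:
        return (f(x), g(x))
    return h

# Two arbitrary maps out of the integers, chosen only as an example.
def f(n: int) -> int:
    return n % 3

def g(n: int) -> int:
    return n * n

h = pair(f, g)
```

Checking `fst(h(n)) == f(n)` and `snd(h(n)) == g(n)` on a few inputs only spot-checks the factorization; the uniqueness half of the property (no other map into the product works) is a statement about all maps and isn't something this code can verify.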
So this is where category theory comes from: it's a perfectly natural idea once one starts thinking about the relationships between different mathematical structures and abstracting out general notions of things like products and quotients. We get all the usual benefits of abstraction for insight-production: we can see the whole range of stuff to which our insight applies, rather than just having the insight for one particular thing within that range; and by thinking about things at the appropriate abstraction level we ignore extraneous specificities that would lead us down fruitless paths.
If you don't regard category theory in its full generality as an aspect of reality that's interesting in its own right, you can probably almost always replace any reasoning about specific structures that makes use of general category-theoretic principles with reasoning that stays specific to the structure. However, the reasoning might be a lot more difficult, complex and cognitively inaccessible that way. Again, this is a general observation about the way in which abstractions provide insight into the specifics they abstract over.
This seems rather intuitive to me, and maybe that's because I trained as a mathematician, which I did because I just happen to have the sort of mind that finds additional layers of abstraction useful on their own. But for the sake of being explicit, I'll tell you why I think category theory exists and why we would invent it now if it hadn't already been invented.
Mathematics is (maybe) fundamentally about finding patterns in the world and reasoning about those patterns. When we see enough patterns, we can find patterns in those patterns, and when we see enough meta-patterns we can find patterns in those too, and so on and on until we can't find any more patterns. For example, geometry is, at least originally, about the patterns we find when drawing stuff on a flat surface; arithmetic is about the patterns we find when we combine countable things; regular algebra is about the patterns we find in arithmetic; abstract algebra is about the patterns we find in regular algebra, geometry, and some other fields of mathematics with particular structures. Category theory is an extension of this pattern finding to the level of finding patterns across broad swaths of otherwise disconnected parts of mathematics.
It's useful for several reasons, some internal to the theory itself, but I think largely because it gives us more general ways to reason about more concrete things. In addition to mathematics I also trained and work as a programmer, so my view is perhaps a bit biased here, but I find general abstractions useful because they let us deal with many concrete things that we would otherwise have to handle as special cases. With category theory we no longer have a bunch of mathematical silos that each require the redevelopment of various concepts. Instead we have a general field that can give us theorems, structures and relationships for free for any part of mathematics that it adequately covers. I can take a result in category theory and use it to find similar results in the various fields it applies to.
Category theory also helps, much as abstract algebra did before it, to identify shared patterns across different fields of mathematics to set up correspondences that allow the transmutation of, say, a problem about graphs into a problem about complex variables without relying on a bunch of one-off proofs of shared structure because you can appeal to the categories to show how they relate. Yes, there is always stuff that doesn't translate between fields because the fields have their own unique parts that are different because they are trying to model different things, but category theory at least lets us abstract away what we can from the noise and notice what's going on in common.
It's been a while since I did much academic math so I'm a bit fuzzy on specific results to point to, but I hope that gives a general sense of why category theory seems valuable and important to me.
In the original series of articles by Eilenberg and Mac Lane, they wrote something like:
"Category" has been defined in order to be able to define "functor" and "functor" has been defined in order to be able to define "natural transformation."
The word "natural" has a long history in mathematics. Category theory is a rigorous interpretation of what it means (neither stronger nor weaker than the more obvious notion of canonical). A first example of a natural transformation is the determinant. What does it mean for it to be "natural", that is, compatible across rings of coefficients?
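One way to spell out what "compatible across rings of coefficients" means for the determinant is the following. The notation is mine: $f \colon R \to S$ is any homomorphism of commutative rings, $\mathrm{GL}_n(f)$ applies $f$ to each entry of an invertible $n \times n$ matrix, and $f^{\times}$ is the restriction of $f$ to the groups of units.

```latex
\[
\det\nolimits_S \circ \, \mathrm{GL}_n(f) \;=\; f^{\times} \circ \det\nolimits_R
\qquad \text{as maps } \mathrm{GL}_n(R) \to S^{\times},
\]
\[
\text{that is,} \quad
\det\nolimits_S\bigl( f(m_{ij}) \bigr) \;=\; f\bigl( \det\nolimits_R (m_{ij}) \bigr)
\quad \text{for every } (m_{ij}) \in \mathrm{GL}_n(R).
\]
```

In words: changing coefficients and then taking the determinant gives the same answer as taking the determinant and then changing coefficients. This holds because the determinant is a polynomial in the matrix entries with integer coefficients, and ring homomorphisms preserve sums and products.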
One possible answer is that any given mathematical field doesn't have to provide any useful insights into reality, and a lot of them have only provided insight after they already existed. This happens because theoretical mathematicians usually are into mathematics because they really like mathematics, not because they want to provide useful insights into reality, although when it happens it is a nice side benefit.
I am not a mathematician but I've been studying category theory for about a year now. From what I've learned so far it seems that its main benefit within pure mathematics is that it gives a way of translating between different domains of mathematical discourse. On the face of it, even if you've provided a common set-theoretic foundation for all areas of math, it isn't obvious how higher-level constructions in, say, geometry can be translated into the language of algebra or topology, or vice versa. So category theory was invented to facilitate this process of sharing mathematical insights across mathematical sub-disciplines. (I think specifically the context in which it originated was algebraic topology, which as the name implies uses techniques from abstract algebra to study topology.)
Later, computer scientists realized that category theory was useful for thinking about the structure of programs (e.g., data types and functions). For example, the concept of a monad in functional programming, which allows the simulation of side effects in a pure functional programming language, comes directly from category theory. Bartosz Milewski is the person to look to if you are interested in learning about this aspect of things.
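To give a rough flavour of the monad idea, here is a minimal sketch of a logging ("writer"-style) monad. Python is not a pure functional language, so this only imitates the pattern, and the names `unit`, `bind`, `double` and `increment` are made up for the example. Instead of printing as a side effect, each step returns its result together with the log lines it would have written, and `bind` threads the log through.

```python
from typing import Callable, List, Tuple

Logged = Tuple[int, List[str]]  # a value paired with the log accumulated so far

def unit(x: int) -> Logged:
    """Wrap a plain value with an empty log."""
    return (x, [])

def bind(m: Logged, step: Callable[[int], Logged]) -> Logged:
    """Run the next step on the value and append its log to the existing one."""
    value, log = m
    new_value, new_log = step(value)
    return (new_value, log + new_log)

def double(x: int) -> Logged:
    return (2 * x, [f"doubled {x}"])

def increment(x: int) -> Logged:
    return (x + 1, [f"incremented {x}"])

result = bind(bind(unit(5), double), increment)
# result == (11, ["doubled 5", "incremented 10"])
```

No function here mutates anything or performs output; the "effect" of logging lives entirely in the returned values, which is the trick pure languages rely on.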
Even more recently (the last 10 years or so) people have started applying category theory to science more generally. Two books by David Spivak explore this. I think much of this work in applied category theory is too recent to expect to see much in the way of big practical discoveries or breakthroughs. It remains to be seen if it will produce major innovations, but I think it is very promising. The hope is that category theory will provide scientists a way to model more of the structure of both their research domain and the research process itself in a unified formalism. It also shows promise for modelling natural language concepts and argumentation, which could lead to better methods of computer knowledge representation.
On a more philosophical level, some have argued that category theory provides support for structuralism in the philosophy of mathematics. This view argues that mathematical entities are essentially structures, which is to say patterns of relationship. In category theory, what an object is is entirely determined by the pattern of relationships (morphisms) with other objects, within a given context (category). This contrasts with set theory, where sets are described in terms of their internal structure of elements and subsets. In practice, this means that set theory starts from the bottom (the empty set) and builds up to the whole mathematical universe, while category theory starts from the top (the category of categories) and then defines everything else in terms of universal properties.
Essentially, category theory validates the intuition that the number 5 isn't some specific object floating out in Platonic heaven, nor is it just a made-up meaningless symbol. It is a structure that is defined by its properties, and those properties are all determined by its relations to everything else. Without actually studying category theory it is difficult to see how this idea could be cashed out in a rigorous, non-hand-wavy way.
And what interesting insights into reality does it give us?
Risky question to ask when dealing with pure mathematics.
Category theory, with which I'm acquainted at a basic level, seems to formalize a lot of regularities I already knew about as a programmer and a student of <those mathematics topics that were taught to me as part of my CS master's degree>.
I found it mathematically neat, but I have never derived any useful insights from it. Said otherwise, nothing would have changed if I had never been introduced to it. This seems quite wrong to me, so I was quite interested in reading the answers here. Unfortunately, there is not much in the way of insight.
I would ask similar questions about any area of math generalizing other areas of math. Why does the Stokes-Cartan theorem exist? Why did people bother coming up with differential forms? Etc.