Whence the determinant?

Ege Erdil

The determinant is a rather bizarre concept when one first encounters it. Given a square matrix $A$ , it's defined as

$det A = \sum σ \in S_{n} (- 1)^{σ} n \prod i = 1 A_{i σ (i)}$

where $S_{n}$ is the symmetric group on $n$ elements - the collection of all permutations of a set of $n$ distinct objects - and $(- 1)^{σ}$ is the sign of the permutation $σ$ . $A_{i j}$ represents the entry of the matrix $A$ on the $i$ th row and $j$ th column.

This definition means that the determinant is a sum that has one term for each permutation $σ$ of the columns of the matrix $A$ , and these terms are the product of $n$ entries of the matrix up to a funny sign $(- 1)^{σ}$ depending on the specific permutation $σ$ .

To understand this definition, it's essential that we first understand what the sign of a permtuation is.

Signs

The sign corresponds to the parity of the number of pairs of elements whose order is reversed by a permutation: if this number is even then we say the permutation is even and of sign $1$ , and if it's odd we say the permutation is odd and of sign $- 1$ .

Let's do some simple examples to see how this works, letting $n = 3$ :

The identity permutation reverses the order of exactly zero pairs, and zero is even. Therefore the identity is even, or its sign is positive.
The permutation $(12)$ , which swaps $1$ and $2$ but leaves $3$ invariant, changes the ordering of only one pair: the pair $(1, 2)$ . So this permutation is odd.
The permutation $(123)$ sends $1 \to 2, 2 \to 3, 3 \to 1$ . This permutation changes the ordering of $(2, 3)$ and $(1, 3)$ but leaves $(1, 2)$ invariant, so it reorders exactly two pairs. In other words, it's even.

Another way to think about the sign is as follows: any permutation of a finite set can be obtained by repeatedly swapping two elements of the set with each other. For example, we can get the permutation $(123)$ defined above by first swapping $2, 3$ and then swapping $1, 2$ , i.e. $(123) = (12) (23)$ . This expression is of course not unique, since we also have $(123) = (13) (13) (12) (23)$ , for example: swapping $1, 3$ two times in a row just gets us back to where we started. However, it turns out given a specific permutation, the parity of the number of pair swaps (or transpositions) we need to perform to obtain it is well defined. In our case, for instance, while we can produce $(123)$ out of two or four transpositions, we can't produce it using exactly three.

So that's where the sign means. Knowing this, we can work out the determinant in some explicit cases. For instance, the determinant of a two-by-two matrix is $A_{11} A_{22} - A_{12} A_{21}$ .

This expression still doesn't look like it should mean anything, so we need to further justify it. The key property of the determinant is that it commutes with matrix multiplication: for two matrices $A, B$ , we have $det (A B) = det (A) det (B)$ . If we want to multiply two square matrices and then take their determinant, it doesn't matter which order we do the operations in: we can take the determinants first and then multiply or the other way around. We'll get the same answer in the end.

This is a desirable property because matrices are big and complicated objects while numbers are comparatively simpler. The determinant gives us a way to reduce some questions about matrices to questions about numbers, which can be much easier to answer. It does this by throwing away a lot of information about the matrix, but that's not necessarily a problem depending on what we want to do.

So we want to find a map $det$ which has the property we just mentioned: it commutes with matrix multiplication. Furthermore, we should ask it to not be a trivial map: it shouldn't just send all matrices to $0$ or $1$ , since a constant function doesn't give us any information about anything. How might we go about finding such a map?

Finding the determinant

Let's first look at a special class of matrices. We know that a vector space $R^{n}$ , for example, has an obvious basis consisting of the vectors

$e_{1} = (1, 0, 0, \dots, 0) e_{2} = (0, 1, 0, \dots, 0) \dots e_{n} = (0, 0, \dots, 0, 1)$

We can form a subgroup of $n \times n$ matrices closed under multiplication by just looking at matrices where the columns are a permutation of these vectors. Now, we see a connection with the sign of a permutation: it's the only nontrivial way we know (and in fact it's the only way to do it at all!) to assign a scalar value to a permutation in a way that commutes with composition, which in this special case we know the determinant must do. Therefore, we can tentatively define

$det (e_{σ (1)}, e_{σ (2)}, \dots, e_{σ (n)}) = (- 1)^{σ}$

where the notation represents that this is the determinant of a matrix with columns $e_{σ (1)}, e_{σ (2)}, \dots$ respectively. This tells us the determinant has something to do with signs of permutations. In fact, combined with the fact that the determinant commutes with matrix multiplication, by multiplying any matrix $A$ with rows $A_{1}, A_{2}, \dots, A_{n}$ by a permutation matrix $(e_{σ (1)}, e_{σ (2)}, \dots, e_{σ (n)})$ , we can deduce

$det (A_{σ (1)}, A_{σ (2)}, \dots, A_{σ (n)}) = (- 1)^{σ} det (A)$

In other words, if seen as a map defined on the rows (or columns) of a matrix, the determinant is alternating: swapping two rows or columns changes the sign of its value. We can infer from this that if a matrix has two identical rows or columns its determinant will be zero.

Here we seem to be stuck again: the problem is the assumptions we've made so far are too weak. We can look to make them stronger while making the determinant have more and more nice properties in the process.

The nicest property we could ask for is that $det$ is linear as a function of matrices, that is, $det (A + B) = det (A) + det (B)$ . However, this condition is much too strong. For example, if we take the $3 \times 3$ identity matrix and apply the permutation $(123)$ to its columns twice to get a total of three matrices, it's easy to see that all of them are of determinant $1$ but their sum is a matrix with all entries equal to $1$ and so must have determinant $0$ . In other words, if we ask $det$ to commute with addition as well as multiplication, we can't continue our construction.

What is weaker than this that we could hope to ask for? Well, we've already seen that the determinant can be seen as a function of the columns of a matrix, so perhaps instead of being linear in the whole matrix it's linear just in the columns and rows. In other words,

$det (r_{1} A_{1} + r_{2} B_{1}, A_{2}, \dots, A_{n}) = r_{1} det (A_{1}, A_{2}, \dots, A_{n}) + r_{2} det (B_{1}, A_{2}, \dots, A_{n})$

and since the determinant is alternating this will generalize to all other columns as well.

It may not seem like it, but we already have enough information now to determine $det$ uniquely. Indeed, this is because any $n \times n$ matrix $A$ can be expressed as

$(n \sum i = 1 A_{i 1} e_{i}, n \sum i = 1 A_{i 2} e_{i}, \dots, n \sum i = 1 A_{i n} e_{i})$

If this is not clear to you, simply imagine we decompose each column into a linear combination of the elementary basis vectors $e_{i}$ .

We know the determinant is linear, and we know what values the determinant takes on permutation matrices, so we can actually evaluate the determinant of this directly using the properties we've postulated so far. We simply "expand out" the map using linearity to get

$det (n \sum i = 1 A_{i 1} e_{i}, n \sum i = 1 A_{i 2} e_{i}, \dots, n \sum i = 1 A_{i n} e_{i}) = \sum f : n \to n det (e_{f (1)}, e_{f (2)}, \dots, e_{f (n)}) n \prod i = 1 A_{i f (i)}$

where the outside sum runs over all functions from ${1, 2, \dots, n}$ to itself. Now all we have to do is to plug in the values of determinants we've already figured out. If $f$ is not a permutation, in other words if it takes two identical values, then the determinant will be zero. So we can assume $f$ is actually a permutation, in which case we already know that the determinant must equal the sign of $f$ . In other words,

$det A = \sum σ \in S_{n} (- 1)^{σ} n \prod i = 1 A_{i σ (i)}$

This is obviously alternating and multilinear, and we can show that any such map must actually commute with matrix multiplication, so we've found the map we were looking for.

Takeaways

The way I went about finding the determinant above might not be intuitive. At first glance, it looks counterproductive to add further assumptions about the map when we only want it to commute with matrix multiplication. However, there are two reasons why this is a good strategy:

If it works, the map we end up with is much more regular and well behaved than what we would've obtained if we picked an arbitrary map just satisfiyng our original condition. In our context, we clearly want $det$ to be as well behaved as possible, so this is only beneficial to us.
Narrowing the search space down to maps we understand better is a good start in any search process. Either it succeeds, in which case the additional assumptions will have only made things easier for us; or it fails, in which case we learn a useful fact that a map having some collection of properties is actually impossible. Failure can give us some insight into how we might want to weaken our conditions in order to be successful the next time around.

I always liked the interpretation of the determinant as measuring the expansion/contraction of n-dimensional volumes induced by a linear map, with the sign being negative if the orientation of space is flipped. This makes various properties intuitively clear such as non-zero determinant being equivalent to invertibility.

Yup, determinant is how much the volume stretches. And trace is how much the vectors stay pointing in the same direction (average dot product of v and Av). This explains why trace of 90 degree rotation in 2D space is zero, why trace of projection onto a subspace is the dimension of that subspace, and so on.

Thank you for that intuition into the trace! That also helps make sense of .

Interesting, can you give a simple geometric explanation?

My intuition for is that it tells you how an infinitesimal change accumulates over finite time (think compound interest). So the above expression is equivalent to $det (I + ε A) = 1 + ε t r (A) + O (ε^{2})$ . Thus we should think 'If I perturb the identity matrix, then the amount by which the unit cube grows is proportional to the extent to which each vector is being stretched in the direction it was already pointing'.

Hmm, this seems wrong but fixable. Namely, exp(A) is close to (I+A/n)^n, so raising both sides of det(exp(A))=exp(tr(A)) to the power of 1/n gives something like what we want. Still a bit too algebraic though, I wonder if we can do better.

Another thing to say is if then

$\frac{d}{d x} det (A (x)) {∣ ∣}_{x = 0} = t r (A^{'} (0))$ .

I think the determinant is more mathematically fundamental than the concept of volume. It just seems the other way around because we use volumes in every day life.

I think the good abstract way to think about the determinant is in terms of induced maps on the top exterior power. If you have an dimensional vector space $V$ and an endomorphism $L : V \to V$ , this induces a map $\land^{n} V \to \land^{n} V$ , and since $\land^{n} V$ is always one-dimensional this map must be of the form $v \to k v$ for some scalar $k$ in the ground field. It's this $k$ that is the determinant of $L$ .

This is indeed more fundamental than the concept of volume. We can interpret exterior powers as corresponding to volume if we're working over a local field, for example, but actually the concept of exterior power generalizes far beyond this special case. This is why the determinant still preserves its nice properties even if we work over an arbitrary commtuative ring; since such rings still have exterior powers behaving in the usual way.

I didn't present it like this in this post because it's actually not too easy to introduce the concept of "exterior power" without the post becoming too abstract.

This is close to one thing I've been thinking about myself. The determinant is well defined for endomorphisms on finitely-generated projective modules over any ring. But the 'top exterior power' definition doesn't work there because such things do not have a dimension. There are two ways I've seen for nevertheless defining the determinant.

View the module as a sheaf of modules over the spectrum of the ring. Then the dimension is constant on each connected component, so you can take the top exterior power on each and then glue them back together.
Use the fact that finitely-generated projective modules are precisely those which are the direct summands of a free module. So given an endomorphism you can write $V \oplus V^{'} = R^{n}$ and then define $det (A) = det (A \oplus {i d}_{V^{'}})$ .

These both give the same answer. However, I don't like the first definition because it feels very piecemeal and nonuniform, and I don't like the second because it is effectively picking a basis. So I've been working on by own definition where instead of defining $Λ^{n} (V)$ for natural numbers $n$ we instead define $Λ^{W} (V)$ for finitely-generated projective modules $W$ . Then the determinant is defined via $Λ^{V} (V)$ .

I'm curious about this. I can see a reasonable way to define in terms of sheaves of modules over $S p e c (R)$ : Over each connected component, $W$ has some constant dimension $n$ , so we just let $Λ^{W} (V)$ be $Λ^{n} (V)$ over that component. But it sounds like you might not like this definition, and I'd be interested to know if you had a better way of defining $Λ^{W} (V)$ (which will probably end up being equivalent to this). [Edit: Perhaps something in terms of generators and relations, with the generators being linear maps $W \to V$ ?]

I'm curious about this. I can see a reasonable way to define in terms of sheaves of modules over $S p e c (R)$ : Over each connected component, $W$ has some constant dimension $n$ , so we just let $Λ^{W} (V)$ be $Λ^{n} (V)$ over that component.

If we call this construction $F (W, V)$ then the construction I'm thinking of is $F (W, V) \otimes F (W, W)^{*}$ . Note that $F (W, W)$ is locally $1$ -dimensional, so my construction is locally isomorphic to yours but globally twisted. It depends on $W$ via more than just its local dimension. Also note that with this definition we will get that $Λ^{V} (V)$ is always isomorphic to $R$ .

But it sounds like you might not like this definition,

Right. I'm hoping for a simple definition that captures what the determinant 'really means' in the most general case. So it would be nice if it could be defined with just linear algebra without having to bring in the machinery of the spectrum.

and I'd be interested to know if you had a better way of defining $Λ^{W} (V)$ (which will probably end up being equivalent to this).

I'm still looking for a nice definition. Here's what I've got so far.

If we pick a basis of $R^{n}$ then it induces a bijection between $H o m (R^{n}, V)$ and $V \times \dots \times V$ . So we could define a map $H o m (R^{n}, V) \to U$ to be 'alternating' if and only if the corresponding map $V \times \dots \times V \to U$ is alternating. The interesting thing I noticed about this definition is that it doesn't depend on which basis you pick for $R^{n}$ . So I have some hope that since this construction isn't basis dependent, I might be able to write down a basis-independent definition of it. Then it would apply equally well with $R^{n}$ replaced with $W$ , whereupon we can define $Λ^{W} (V)$ as the universal alternating map out of $H o m (W, V)$ .

[Edit: Perhaps something in terms of generators and relations, with the generators being linear maps $W \to V$ ?]

Yeah exactly. That's probably a simpler way to say what I was describing above. One embarrassing thing is that I don't even know how to describe the simplest relations, i.e. what the $0$ s should be.

If we call this construction then the construction I'm thinking of is $F (W, V) \otimes F (W, W)^{*}$ . Note that $F (W, W)$ is locally $1$ -dimensional, so my construction is locally isomorphic to yours but globally twisted. It depends on $W$ via more than just its local dimension. Also note that with this definition we will get that $Λ^{V} (V)$ is always isomorphic to $R$

Oh right, I was picturing $W$ being free on connected components when I suggested that. Silly me.

If we pick a basis of $R^{n}$ then it induces a bijection between $H o m (R^{n}, V)$ and $V \times \dots \times V$ . So we could define a map $H o m (R^{n}, V) \to U$ to be 'alternating' if and only if the corresponding map $V \times \dots \times V \to U$ is alternating. The interesting thing I noticed about this definition is that it doesn't depend on which basis you pick for $R^{n}$ . So I have some hope that since this construction isn't basis dependent, I might be able to write down a basis-independent definition of it.

$F$ is alternating if $F (f \circ g) = det (g) F (f)$ , right? So if we're willing to accept kludgy definitions of determinant in the process of defining $Λ^{W} (V)$ , then we're all set, and if not, then we'll essentially need another way to define determinant for projective modules because that's equivalent to defining an alternating map?

if not, then we'll essentially need another way to define determinant for projective modules because that's equivalent to defining an alternating map?

There's a lot of cases in mathematics where two notions can be stated in terms of each other, but it doesn't tell us which order to define things in.

The only other thought I have is that I have to use the fact that is projective and finitely generated. This is equivalent to $W$ being dualisable. So the definition is likely to use $W^{*}$ somewhere.

For what it's worth, I strongly agree (1) that exterior algebra is the Right Way to think about determinants, conditional on there being other reasons in your life for knowing exterior algebra, and (2) that doing it that way in this post would have had too much overhead.

(+1, came here to say this. Seems deficient to think of determinants without including this interpretation.)

I felt dumb recently when I noticed that the determinant is sort of "the absolute value for matrices", considering that it's literally written using the same signs as the absolute value. Although I guess the determinant of the representation of a complex number as a matrix is , not $| a + b i |$ . The "signed volume" idea seems related to this, insofar as multiplying a complex number $z_{1}$ by another $z_{2}$ will stretch / smush $z_{1}$ by $| z_{2} |$ (in addition to rotating it).

This is actually (hopefully) the first post in a series, and I'll talk about this way of looking at the determinant in a subsequent post. It actually generalizes past vector spaces over if you do it in the appropriate way.

The problem is that the equivalence of this to the usual determinant is not easy to prove unless you have some machinery at your disposal already. It's "obvious" that volume should be invariant under skew translations, for example, but the easiest proof I know of this simply goes through the determinant of a skew translation matrix and shows it's equal to $1$ .

If you have the time, try thinking of how you would prove that the characterization you give here is actually equivalent to the characterization of the determinant in the post - alternating and multilinear map that's equal to $1$ on the identity matrix. The "multilinear" part turns out to be rather tricky to establish properly.

Thinking about how to prove the multilinearity of the volume of a parallelepiped definition I like this sketched approach:

The two dimensional case is a “cute” problem involving rearranging triangles and ordinary areas (or you solve this case in any other way you want). The general case then follows from linearity of integrals (you get the higher dimensional cases by integrating the two dimensional case appropriately).

So, this is not exactly a rigorous proof, but off the top of my head, I would justify/remember the properties like: identity has determinant 1 because it doesn't change the size of any volumes, determinant is alternating because swapping two axes of a parallelpiped is like reflecting those axes in a mirror, changing the orientation. Multilinearity is equivalent to showing that the volume of a parallelpiped is linear in all the $a_{i}$ s. But this follows since the volume of $A$ is equal to the volume of $a_{2} \land (. . .) \land a_{n}$ multiplied by the component of $a_{1}$ projected onto the axis orthogonal to $a_{2}, . . ., a_{n}$ , which is clearly linear in $a_{1}$ . This last fact is a bit weird, to justify intuitively I imagine having a straight tower of blocks which you then 'slant' by pulling the top in a given direction without changing the volume, this corresponding to the components of $a_{1}$ not orthogonal to $a_{2}, . . ., a_{n}$ .

Yeah, that's what I mean by saying it's "obvious". Similar to the change of variables theorem in that way.

I like to think it in this way: the determinant is the product of the eigenvalues of a matrix, which you can conveniently compute without reducing the matrix to diagonal form. All interesting properties of the determinant are very easy (and often trivial!) to show for the product of the eigenvalues.

More in the spirit of your post, I don't remember how hard it is to show that the determinant is invariant under unitary transformation, but not too hard I think. It's not the only invariant of course (the trace is as well, I don't remember if there are others). But you could definitely start from the product of eigenvalues idea and make it invariant to get the formula for det.

det(AB) = det(A)det(B), so the determinant is invariant to any change of basis, not merely unitary ones:

$= det (A) det (B) det (A^{- 1})$ $= det (A A^{- 1}) det (B)$ $= det (B)$

Now, we see a connection with the sign of a permutation: it's the only nontrivial way we know (and in fact it's the only way to do it at all!) to assign a scalar value to a permutation, which in this special case we know the determinant must do.

Huh? Off the top of my head, here's another way to assign a scalar value to a permutation: multiply together the lengths of all the cycles it contains. (No idea whether this is useful for anything. Taking the least common multiple of the lengths of all the cycles tells you the order of the permutation, i.e. how many times you have to apply it before you get the identity, though.)

The assignment has to commute with multiplication, and your proposed assignment would not. Just consider, say, .

I've edited the post to make this clearer, thanks for the comment.

Thanks. Yeah, I knew there was some qualifier missing that would make it true, I just couldn't intuit exactly what it was.

Edited to add: Actually I would say that the determinant distributes through multiplication. Commutativity: . Distributivity: $a ⨀ (b ⨁ c) = (a ⨀ b) ⨁ (a ⨀ c)$ . Neither is a perfect analog, because the determinant is a unary operation, but distributivity at least captures that there are two operations involved. But unlike my other comment, this one doesn't actually impair comprehension, as there's not really a different thing you could be trying to say here using the word "commutes".

The theorem proved here is that if d : square matrices -> numbers does what the determinant does to permutation matrices and is "linear on rows", then d is precisely the determinant. (Or, equivalently: if d is alternating and linear on rows, then it is precisely the determinant.)

Since the motivation for the first condition is that you want d(AB) = d(A) d(B), it may be worth pointing out that it's also true that if d(AB) = d(A) d(B) and d is "linear on rows" then d is either identically 0 or precisely the determinant.

You can't do this by saying that if d(AB) = d(A) d(B) for permutation matrices then it has to be the same as the determinant for those, because that isn't true: you could have d(A)=1 whenever A is a permutation matrix. Also, you'd need a proof of something stated but not proved in the OP, namely that the only ways to map permutations to numbers multiplicatively are always-1 and what the determinant does. (Though that's pretty easy.)

Anyway, here's a sketch of how to prove that multiplicativity + row-linearity => being the determinant.

First, consider the matrix C that you get by starting with the identity and then moving one of the 1s on its diagonal to a different place on that row. Premultiplication by this matrix is the operation of copying one row on top of another. We must have d(C)=0, because for any matrix A we have CA = CB where B is what you get by replacing the "copied-onto" row of A with all-zeros, and d(B)=0 by row-linearity, so d(C)d(A)=0 whatever A is, so either d maps everything to 0 or d(C)=0. In either case d(C) = 0.

Any matrix with two equal rows is CA for some A, so any matrix with two equal rows maps to 0.

So now take any matrix A, pick two rows, and write f(u,v) for what you get when you overwrite those two rows of A with u and v respectively and apply d to the result. By row-linearity we have f(u+v,u+v) = f(u,u) + f(u,v) + f(v,u) + f(v,v). But f(w,w)=0 for any w, so this says f(u,v) + f(v,u) = 0: swapping two rows changes the sign of d.

Now, d(identity) is its own square, so is either 0 or 1. If it's 0 then d is identically zero. Otherwise, d(identity)=1; then d(T)=-1 where T is the permutation matrix for any transposition; any permutation is a product of transpositions, so if P is any permutation matrix that's a product of k transposition, d(P) = (-1)^k. In other words, d must do to permutation matrices the same thing that the determinant does. And now we can apply the argument in the OP.

LESSWRONG
LW

LESSWRONG
LW

28

Whence the determinant?

28

Signs

Finding the determinant

Takeaways

28

28