Then the eigenvectors of f consist precisely of the entries on the diagonal of that upper-triangular matrix
I think this is a typo and should be "eigenvalues" instead of "eigenvectors"?
The determinant is negative when the operator flips all the vectors it works on.
This could be misleading. E.g. the operator f(v) := -v that literally just flips all vectors has determinant (-1)^n, where n is the dimension of the space it's working on. The sign of the determinant tells you whether an operator flips the orientation of volumes, it can't tell you anything about what it does to individual vectors.
(Regarding "orientation of volumes": in the 2D case, think of R^2 as a sheet of paper, then f(v) := -v is just a 180 degree rotation, so the same side stays up, and the determinant is positive. In contrast, flipping along an axis requires turning over the paper, so negative determinant. Unfortunately this can't really be visualized the same way in 3D, so then you have to think about ordered bases.)
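A quick check of the orientation point in plain Python, as a sketch: the helper names `det` and `negate_all` and the example matrices are made up for illustration and do not come from the post. It compares f(v) := -v in two and three dimensions with a reflection across one axis in 2D.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def negate_all(n):
    """Matrix of f(v) := -v on an n-dimensional space."""
    return [[-1 if i == j else 0 for j in range(n)] for i in range(n)]


flip_2d = negate_all(2)        # 180 degree rotation of the plane
flip_3d = negate_all(3)        # point reflection in 3D
mirror_2d = [[1, 0], [0, -1]]  # reflection across the horizontal axis

print(det(flip_2d), det(flip_3d), det(mirror_2d))
```

Negating every vector gives a positive determinant in 2D and a negative one in 3D, while the 2D mirror (which leaves some vectors untouched) has a negative determinant.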
Let S now specifically be a one-dimensional subspace of V such that, for all →v∈V,
I think such S cannot exist in most cases, and it should instead read '... for some ...'
The expression for S is describing the span of the vector →v, so certainly if V is more than one-dimensional, if some subspace has this property for all →v then it has this property for linearly independent vectors in V, which is a contradiction.
The definition of matrix ("the basis maps to:") ought to come after the "uniquely determines the linear map" that justifies it.
For interpreting v as a slim matrix, I would use bra-ket notation: |v> for the function of type V <- R, <v| for the function whose type is the dual R <- V. Then <v|v> has type R <- R (and corresponds to multiplication by a scalar) and |v><v| has type V <- V.
An inner product just maps |v> to <v|. (Though I don't quite see what the symmetry is for.)
Mapping a point cloud through a linear map thins it by a factor of the determinant; this generalizes to smooth maps, since they are locally linear.
Epistemic status: A brisk walkthrough of (what I take to be) the highlights of this book's contents.
The big one for mathematically understanding ML!
The idea responsible for getting me excited about linear algebra is:
Linear algebra is about the tripartite relationship between (1) homomorphisms[1] between vector spaces, (2) sets of equations, and (3) grids of numbers.
However, grids of numbers ('matrices'), the usual star of the show in a presentation of linear algebra, aren't foregrounded in this book. Instead, this is a book chiefly treating the homomorphisms ('linear maps') themselves, directly.
Contents and Notes
1. Vector Spaces
Vector spaces are fairly substantial mathematical structures, if you're pivoting out of thinking about set theory! Intuitively, a vector space is a space R^n for which (1) ray addition and (2) scaling rays (emanating from the origin out to points)[2] are both nicely defined.
Precisely, a vector space is a set V defined over a field F[3] in which
- V is closed under vector addition, and vector addition is commutative, associative, there is an additive identity →0, and there is an additive inverse for every vector →v∈V;
- V is closed under scalar multiplication, scalar multiplication is associative, and there is a multiplicative identity 1;
- and vector addition and scalar multiplication are connected by distribution such that, for all a,b∈F and →v,→x∈V,[4]
$$a(\vec{v}+\vec{x}) = a\vec{v}+a\vec{x}$$
$$(a+b)\vec{v} = a\vec{v}+b\vec{v}$$
A subspace S of a vector space V is any subset S⊂V that is still itself a vector space, under the same two operations of V. Vector spaces can be decomposed into their subspaces, where you think of adding vectors drawn from the different subspaces via their common addition operation.
2. Finite-Dimensional Vector Spaces
You live at the origin of R^3, and your tools are the vectors that emanate out from your home. Because we have both vector addition and scalar multiplication, we have two ways of extending (or shortening) any single vector out from the origin arbitrarily far. If we're interested in reaching points in R^3, one immediate way to get to points we didn't have a vector directly to... is by extending a too-short vector pointed in the right direction! Furthermore, because we can always multiply a vector by −1 to reverse its direction, both the exactly right and exactly wrong directions will suffice to reach out and touch a point in R^3.
We can also use vector addition to add two vectors pointing off in differing directions (directions which aren't exact opposites). If we have vectors →v=[0.5,0,0]^T, →x=[0,45,0]^T, and →q=[0,0,0.11]^T,[5] we have all the tools we need to produce any vector in R^3! The awkward lengths of all the vectors are irrelevant, because we can scale all of them arbitrarily. We use some amount of vertical, horizontal, and z-dimensional[6] displacement to get to anywhere via addition and multiplication! More formally, we say that the set {→v,→x,→q} spans R^3.
Intuitively, a minimal spanning set is called a basis for a vector space. {→v,→x,→q} is a basis for the vector space R^3, because none of the vectors are "redundant": you could not produce every vector in R^3 without all three elements in {→v,→x,→q}. If you added any further vector to that spanning set, though, the set would now have a redundant vector, as R^3 is already spanned. The set would no longer be a minimal spanning set in this sense, and so would cease to be a basis for R^3.
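To see the spanning claim in action, here is a minimal sketch in plain Python. The function names `coordinates` and `combine` and the target point are assumptions made for illustration. It finds the scalars that reach an arbitrary point using the three awkwardly sized vectors from the text, then rebuilds the point from them.

```python
# The basis from the text: each vector lies along one axis with an awkward length.
v = (0.5, 0.0, 0.0)
x = (0.0, 45.0, 0.0)
q = (0.0, 0.0, 0.11)


def coordinates(target):
    """Scalars (a, b, c) with a*v + b*x + c*q equal to target.

    This shortcut only works because each basis vector lies along its own axis.
    """
    return (target[0] / v[0], target[1] / x[1], target[2] / q[2])


def combine(a, b, c):
    """Form the linear combination a*v + b*x + c*q."""
    return tuple(a * vi + b * xi + c * qi for vi, xi, qi in zip(v, x, q))


target = (3.0, -9.0, 2.2)  # hypothetical point to reach
a, b, c = coordinates(target)
rebuilt = combine(a, b, c)
print((a, b, c), rebuilt)
```

Dropping any one of the three vectors leaves an axis that no combination of the other two can move along, which is the sense in which none of them is redundant.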
Every finite-dimensional, nonzero[7] vector space containing infinitely many vectors has infinitely many bases (pp. 29-32). Each basis for an n-dimensional vector space is a set containing n vectors, where each vector is an ordered set containing n numbers drawn from F (p. 32).
3. Linear Maps
Intuitively, a linear map is a function that translates addition and multiplication between two vector spaces.
Formally, a linear map f:V→W is a function from a vector space V to a vector space W (taking vectors and returning vectors) such that
$$f(\vec{v}+\vec{x}) = f(\vec{v})+f(\vec{x})$$
$$f(a\vec{v}) = a\,f(\vec{v})$$
for all →v,→x∈V; all f(→v),f(→x)∈W; and all a∈F. Note that both are homomorphism properties: one for addition across vector spaces and one for multiplication across vector spaces! We'll call the former relationship additivity, the latter, homogeneity.
The symbol L(V,W) stands for the set of all the linear maps from V to W.[8]
Some example linear maps (pp. 38-9) include:
$$f_1(\vec{v}) = 0\vec{v}$$
$$f_2(\vec{v}) = \vec{v}$$
When the vector spaces are specifically the set of all real-valued polynomials p(x):[9]
$$f_3(\vec{p}) = \frac{dp(x)}{dx}$$
$$f_4(\vec{p}) = \int p(x)\,dx$$
translating between →p and p(x).
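As a rough illustration of why differentiation counts as a linear map, the sketch below stores a polynomial as its list of coefficients (lowest degree first) and checks additivity and homogeneity on two sample polynomials. The helper names and the sample polynomials are assumptions for illustration, and a check on two examples is of course not a proof.

```python
def differentiate(p):
    """Derivative of a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]


def add(p, r):
    """Sum of two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(r))
    p = p + [0] * (n - len(p))
    r = r + [0] * (n - len(r))
    return [a + b for a, b in zip(p, r)]


def scale(a, p):
    """Scalar multiple a*p."""
    return [a * c for c in p]


p = [1, 2, 3]     # 1 + 2x + 3x^2
r = [0, 5, 0, 4]  # 5x + 4x^3

additive = differentiate(add(p, r)) == add(differentiate(p), differentiate(r))
homogeneous = differentiate(scale(7, p)) == scale(7, differentiate(p))
print(additive, homogeneous)
```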
As linear maps are functions, they can be composed when they have matching domains and co-domains, giving us our notion of products between linear maps.
The kernel of a linear map f∈L(V,W) is the subset ker(f)⊂V containing all and only the vectors →v∈V that f maps to →0∈W. Note that linear maps can only "get rid" of vectors by shrinking them down all the way, i.e., by sending them to →0. If a function between vector spaces simply sent everything to a nonzero vector, it would violate the linear map axioms! All kernels are subspaces of V (p. 42). A linear map is injective whenever ker(f)={→0} (p. 43).
The image im(f) of f is the subset of W consisting of every vector f(→v) for some →v∈V. All images are subspaces of W (p. 44). A linear map is surjective, by definition, whenever im(f)=W.
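A small worked example may help here. The map below is chosen purely for illustration and is not taken from the book: it projects the plane onto its first axis.

```latex
% Hypothetical example: projection of R^2 onto the first coordinate axis.
\[
  f : \mathbb{R}^2 \to \mathbb{R}^2, \qquad f\big([x, y]^T\big) = [x, 0]^T
\]
\[
  \ker(f) = \{\, [0, y]^T : y \in \mathbb{R} \,\}, \qquad
  \operatorname{im}(f) = \{\, [x, 0]^T : x \in \mathbb{R} \,\}
\]
% ker(f) contains nonzero vectors, so f is not injective;
% im(f) is a proper subspace of R^2, so f is not surjective.
```

Both sets are lines through the origin, so each is a subspace, as the general results claim.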
The Matrix of a Linear Map
A matrix M is an array of numbers, with m rows and n columns:
$$M = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix}$$
(Matrices are a generalization of vectors into the horizontal dimension, and vectors can be thought of as skinny m-by-1 matrices.)
Writing M(f) for the matrix of the linear map f, the vector f(→v) is given by f(→v)=M(f)→v, with matrix multiplication on the right side of the equation (pp. 53-4).
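The sketch below shows the matrix-vector product doing the work of a linear map, using standard bases. The matrix `M`, the vector `v`, and the helper `matvec` are hypothetical choices for illustration. Column j of `M` is the image of the j-th standard basis vector, so applying `M` to `v` agrees with combining those images by linearity.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by column vector v (a list of entries)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical map f: R^2 -> R^3.
# Column j of M is f applied to the j-th standard basis vector.
M = [[1, 2],
     [0, 1],
     [3, 0]]

e1, e2 = [1, 0], [0, 1]
v = [4, 5]  # v = 4*e1 + 5*e2

image = matvec(M, v)
by_linearity = [4 * a + 5 * b for a, b in zip(matvec(M, e1), matvec(M, e2))]
print(image, by_linearity)
```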
4. Polynomials
5. Eigenvalues and Eigenvectors
We now begin our study of operator theory!
Operators are linear maps from V to itself. Notationally, L(V):=L(V,V).
We call a subspace S⊂V invariant under f∈L(V) if, for all →s∈S,
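$$f(\vec{s}) \in S$$
Let S now specifically be a one-dimensional subspace of V, so that, for some fixed nonzero →v∈V,
$$S = \{a\vec{v} : a \in F\}$$
Then S is invariant under f exactly when f(→v)∈S, that is, when
$$f(\vec{v}) = \lambda\vec{v}$$
for some λ∈F. In the above equation f(→v)=λ→v, the scalar λ is called an eigenvalue of f, and the corresponding vector →v is called an eigenvector of f.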
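Here is a minimal check of the eigenvector condition, assuming a hypothetical operator on R^2 given by the matrix `M` below. The helpers `matvec` and `is_eigenpair` are made up for illustration. A vector is an eigenvector exactly when the operator sends it to a scalar multiple of itself, so its span is a one-dimensional invariant subspace.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def is_eigenpair(m, lam, v):
    """True when v is nonzero and m*v equals lam*v."""
    return any(c != 0 for c in v) and matvec(m, v) == [lam * c for c in v]


# Hypothetical operator on R^2, written as an upper-triangular matrix.
M = [[2, 1],
     [0, 3]]

print(is_eigenpair(M, 2, [1, 0]))  # span of [1, 0] is invariant
print(is_eigenpair(M, 3, [1, 1]))  # span of [1, 1] is invariant
print(matvec(M, [0, 1]))           # [0, 1] is not mapped to a multiple of itself
```

For this matrix the two eigenvalues, 2 and 3, are the entries on the diagonal.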
Polynomials Applied to Operators
An operator raised to a power m is just that operator composed with itself m times.
Because we have a notion of functional products, functional sums, and now operators raised to powers, we can now construct arbitrary polynomials with operators as the variables!
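As a sketch of what "a polynomial with an operator as the variable" means computationally, the code below evaluates p(x) = 6 - 5x + x^2 at a small matrix. The matrix `M`, the polynomial, and the helper names are assumptions for illustration. Powers are repeated composition (matrix products), and the constant term multiplies the identity operator.

```python
def matmul(a, b):
    """Matrix product, i.e. composition of the corresponding operators."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poly_of_operator(coeffs, m):
    """Evaluate c0*I + c1*m + c2*m^2 + ... for a square matrix m."""
    n = len(m)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # m^0 = I
    total = [[0] * n for _ in range(n)]
    for c in coeffs:
        total = [[total[i][j] + c * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, m)  # move on to the next power of m
    return total


M = [[2, 1],
     [0, 3]]
p = [6, -5, 1]  # 6 - 5x + x^2 = (x - 2)(x - 3)

print(poly_of_operator(p, M))
```

This particular polynomial has the diagonal entries of `M` as its roots, and it sends `M` to the zero matrix.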
Upper-Triangular Matrices
A square matrix is an m×m matrix.
An upper-triangular matrix is a square matrix for which all entries under the principal diagonal equal 0.
Diagonal Matrices
A diagonal matrix is a square matrix for which all entries off the principal diagonal equal 0.
6. Inner-Product Spaces
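On R^n, the dot product of two vectors →x and →y is
$$\vec{x}\cdot\vec{y} = x_1y_1 + \cdots + x_ny_n$$
where x_n is the nth entry in →x, and similarly for y_n and →y (p. 98; notation converted).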
Inner products are just a generalization of dot products to arbitrary vector spaces V. (With some finagling, both dot products and inner products generally can be interpreted as linear maps.) An inner-product space is an ordered set containing a vector space V and an inner product on it.
Intuitively, the norm of a vector is the length of that vector, interpreted as a ray, from the origin to its tip. More formally, the norm of a vector →v in an inner-product space is defined to be the square root of the inner product of that vector →v with itself:
$$\|\vec{v}\| := \sqrt{\vec{v}\cdot\vec{v}}$$
Note that this looks just like c=√(a^2+b^2), the Pythagorean theorem for the sides a,b,c of a right triangle in Euclidean space. That's because other inner products on other vector spaces are meant to allow for a generalization of the Pythagorean theorem in those vector spaces!
Intuitively, two vectors are orthogonal when they're perpendicular. Formally, two vectors are called orthogonal when their inner product is 0. With the opposite and adjacent sides →a,→b of the right unit triangle in the vector space R^2,
$$\vec{a}\cdot\vec{b} = a_1b_1+a_2b_2 = (0)(1)+(1)(0) = 0$$
"It's all just right triangles, dude."
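The norm and orthogonality definitions can be tried out directly with the ordinary dot product on R^2. This is a minimal sketch; the helper names `dot` and `norm` and the 3-4-5 example are illustrative choices rather than anything from the book.

```python
import math


def dot(x, y):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def norm(v):
    """Length of v: the square root of v dotted with itself."""
    return math.sqrt(dot(v, v))


a = [0, 1]
b = [1, 0]
hypotenuse = [a[0] + b[0], a[1] + b[1]]

print(dot(a, b))      # zero inner product: a and b are orthogonal
print(norm([3, 4]))   # the 3-4-5 right triangle
# Pythagorean theorem for orthogonal vectors, up to floating-point error.
print(math.isclose(norm(hypotenuse) ** 2, norm(a) ** 2 + norm(b) ** 2))
```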
7. Operators on Inner-Product Spaces
The singular values of f are the eigenvalues of √(f∗f), where each eigenvalue λ is repeated dim ker(√(f∗f)−λI) times (p. 155).
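A tiny worked example, chosen for illustration and not taken from the book, shows how singular values can differ from eigenvalues. Take the operator on R^2 (with the usual dot product) whose matrix is diagonal with entries 3 and −2; being real and symmetric, it is its own adjoint.

```latex
% Hypothetical example: a diagonal operator on R^2 with the usual dot product.
\[
  f = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}, \qquad
  f^* f = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}, \qquad
  \sqrt{f^* f} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}
\]
% Eigenvalues of f: 3 and -2.  Singular values of f: 3 and 2.
```

The singular values come out nonnegative even though one eigenvalue of f is negative.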
8. Operators on Complex Vector Spaces
9. Operators on Real Vector Spaces
The Cayley-Hamilton theorem also holds on complex vector spaces generally (p. 173).
10. Trace and Determinant
Intuitively, the determinant of an operator f is the change in volume f effects. The determinant is negative when the operator "inverts the volume" it works on.
Intuitively, a homomorphism is a function showing how the operation of vector addition can be translated from one vector space into another and back.
More precisely, a homomorphism is a function (here, from a vector space V to a vector space W) such that
$$f(\vec{v}+\vec{x}) = f(\vec{v})+f(\vec{x})$$
with →v,→x∈V and f(→v),f(→x)∈W.
The vector addition symbol + on the left side of the equality, inside the function, is defined in V, and the addition symbol + on the right side of the equality, between the function values, is defined in W.
Vectors can be interpreted geometrically as rays from the origin out to points in a space. Vectors can also be understood algebraically as ordered sets of numbers (with each number representing a coordinate over in the ray interpretation).
As far as notation goes, we'll use variables with arrows →v for vectors, lowercase variables x for numbers, and capital variables V for other larger mathematical structures, such as vector spaces.
In this book, that field F will be either the reals R or the complexes C.
Take note of how homomorphism-ish the below distributive relationships are!
Vectors are conventionally written vertically. But each vector $\vec{v}=\begin{bmatrix}1\\0\end{bmatrix}$ can be expressed as a transpose, [1,0]^T=→v, where the entries are written out horizontally instead.
So we'll use vector transposes to stay in line with conventional notation while not writing out those giant vertical vectors everywhere.
One deep idea out of mathematics is that the dimensionality of a system is just the number of variables in that system that can vary independently of every other variable. You live in 3-dimensional space because you can vary your horizontal, vertical, and z-dimensional position without necessarily changing your position in the other two spatial dimensions by doing so.
Note that the set {→0}, where →0 is a vector containing only 0 any number n∈N of times, satisfies the vector space axioms!
$$\vec{0}+\vec{0} = \vec{0} = (\vec{0}+\vec{0})+\vec{0} = \vec{0}+(\vec{0}+\vec{0})$$
establishes closure under addition, existence of an additive identity, existence of an additive inverse for all vectors, additive commutativity, and additive associativity. Letting the field be the reals with n,m∈R
$$n\vec{0} = \vec{0} = m(n\vec{0}) = (mn)\vec{0} = 1(\vec{0})$$
establishes closure under multiplication, multiplicative associativity, and the existence of a multiplicative identity. Finally,
$$n(\vec{0}+\vec{0}) = n\vec{0}+n\vec{0} = \vec{0} = (n+m)\vec{0} = n\vec{0}+m\vec{0}$$
establishes distributivity.
Any such vector space {→0} has just one basis, ∅. Intuitively, since you live at the origin, the origin is already spanned by no vectors at all -- i.e., the empty set of vectors. Any additional vector would be redundant, so no other sets constitute bases for {→0}.
In math, the bigger and/or fancier the symbol, the bigger the set or class that symbol usually stands for.
A vector →p can stand for a polynomial by containing all the coefficients in the polynomial, coefficients ordered by the degree of each coefficient's monomial.
This is addition of functions, (f+g)(x)=f(x)+g(x), on the left side of the equation. I is the identity function.
dimV is the dimension of V, formalized as the number of vectors in any basis of V.
Intuitively, orthonormal sets are nice sets of vectors like {[1,0,0]^T,[0,1,0]^T,[0,0,1]^T}, where each vector has length one and is pointing out in a separate dimension.
More precisely, a set of vectors is called orthonormal when its elements are pairwise orthogonal and each vector has a norm of 1. We will especially care about orthonormal bases, like the set above with respect to R^3.
The adjoint of a linear map f:V→W is a linear map f∗:W→V such that the inner product of f(→v) and →w equals the inner product of →v and f∗(→w) for all →v∈V and →w∈W.
Remember that inner products aren't generally commutative, so the order of arguments matters. Adjoints feel very anticommutative.
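The defining property of the adjoint can be checked numerically in a small case. This sketch assumes the standard inner product on C^2 (linear in the first slot, conjugate-linear in the second) and uses the standard fact that, with respect to an orthonormal basis, the matrix of the adjoint is the conjugate transpose. The matrix `M`, the vectors, and the helper names are hypothetical.

```python
def inner(x, y):
    """Standard inner product on C^n, conjugating the second argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conjugate_transpose(m):
    """Matrix of the adjoint with respect to the standard (orthonormal) basis."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


M = [[1 + 2j, 3 + 0j],
     [0j, 1j]]
v = [1 + 0j, 1j]
w = [2 - 1j, 1 + 0j]

lhs = inner(matvec(M, v), w)                       # <f(v), w>
rhs = inner(v, matvec(conjugate_transpose(M), w))  # <v, f*(w)>
print(lhs, rhs)
print(inner(v, w), inner(w, v))  # swapping the arguments conjugates the result
```

The last line illustrates the remark about argument order: over the complexes, swapping the two arguments of the inner product conjugates the value.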
An operator f∈L(V) on an inner-product space V is called normal when
$$ff^* = f^*f$$
An operator f is self-adjoint when f=f∗.
Characteristic polynomials can also be defined for real vector spaces, though the reals are a little less well behaved as vector spaces than the complexes.