Foreword
What is functional analysis? A satisfactory answer requires going back to where it all started.
A Friendly Approach to Functional Analysis
I didn't actually find the book overly hard (it took me seven days to complete, which is how long my first book, Naïve Set Theory, took), although there were some parts I skipped due to unclear exposition. It's actually one of my favorite books I've read in a while – it's for sure my favorite since the last one. That said, I'm very glad I didn't attempt this early in my book-reading journey.
My brain won't stop lying to me
Some part of me insisted that the left-shift mapping
(x₁, x₂, …) ↦ (x₂, x₃, …) : ℓ∞ → ℓ∞
is "non-linear" because it incinerates x₁! But wait, brain, this totally is linear, and it's also continuous with respect to the ambient supremum norm!
Formally, a map T is linear when T(αx+βy) = αT(x) + βT(y) for all vectors x, y and all scalars α, β.
Informally, linearity is about being able to split a problem into small parts which can be solved individually. It doesn't have to "look like a line", or something. In fact, lines[1] y=mx are linear because putting in Δx more x gets you m⋅Δx more y!
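Here's a quick numerical sanity check of both claims (my own sketch, not from the book), truncating sequences to finite numpy arrays as stand-ins for elements of ℓ∞:

```python
import numpy as np

def left_shift(x):
    """The left-shift map: (x1, x2, x3, ...) -> (x2, x3, ...)."""
    return x[1:]

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
alpha, beta = 2.0, -3.0

# Linearity: T(ax + by) = a T(x) + b T(y).
assert np.allclose(left_shift(alpha * x + beta * y),
                   alpha * left_shift(x) + beta * left_shift(y))

# Boundedness (hence continuity): dropping a coordinate can only
# shrink the sup norm, so ||T(x)|| <= ||x|| (i.e. M = 1 works).
assert np.abs(left_shift(x)).max() <= np.abs(x).max()
print("left shift is linear and bounded on these samples")
```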
Linearity and continuity
Two things surprised me.
First, a(n infinite-dimensional) linear function can be discontinuous. (?!)
Second, a linear function T is continuous if and only if it is bounded; that is, there is an M > 0 such that ∀x, x₀: ‖T(x − x₀)‖ ≤ M‖x − x₀‖.
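To see the first surprise concretely, here's a standard textbook example (not Sasane's notation): differentiation on polynomials over [0,1] under the sup norm. The sequence fₙ(t) = tⁿ/n converges to the zero function, but its image under differentiation does not converge to zero:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10_001)  # grid for estimating sup norms on [0, 1]

def sup_norm(values):
    return np.abs(values).max()

# f_n(t) = t^n / n goes to 0 in sup norm, yet its derivative
# f_n'(t) = t^(n-1) keeps sup norm 1 -- so the (perfectly linear)
# differentiation map is not continuous at the zero function.
for n in [1, 10, 100, 1000]:
    f_n = t**n / n
    df_n = t**(n - 1)
    print(f"n={n}: ||f_n|| = {sup_norm(f_n):.4f}, ||f_n'|| = {sup_norm(df_n):.4f}")
```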
What the hell are functional derivatives?
Derivatives tell you how quickly a function is changing in each input dimension. In single-variable calculus, the derivative of a function f: ℝ → ℝ is a function f′: ℝ → ℝ.
In multi-variable calculus, the derivative of a function g: ℝⁿ → ℝ is a function g′: ℝⁿ → ℝⁿ – for a given n-dimensional input vector, the real-valued output of g can change differently depending on which input dimension the change occurs in.
You can go even further and consider the derivative of h: ℝⁿ → ℝᵐ, which is the function h′: ℝⁿ → ℝ^{m×n} – for a given n-dimensional input vector, h again can change its vector-valued output differently depending on which input dimension the change occurs in.
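A quick finite-difference sketch of that last type signature (the toy function h is my own): at each point, the derivative of a map from ℝ³ to ℝ² is a 2×3 matrix – one column per input dimension.

```python
import numpy as np

def h(x):
    """A toy map from R^3 to R^2."""
    return np.array([x[0] * x[1], np.sin(x[2])])

def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x: one column per input dimension."""
    fx = f(x)
    columns = []
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        columns.append((f(x + step) - fx) / eps)
    return np.stack(columns, axis=1)

J = jacobian(h, np.array([1.0, 2.0, 3.0]))
print(J.shape)  # (2, 3) -- an m-by-n matrix for h: R^n -> R^m
```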
But what if we want to differentiate the following function, with domain C[0,1] and range ℝ:
L(f) := ∫₀¹ (f(t))² dt.
How do you differentiate with respect to a function? I'm going to claim that
L′_f(g) = ∫₀¹ 2f(t)g(t) dt.
It's not clear why this is true, or what it even means. Here's an intuition: at any given point, there are uncountably many partial derivatives in the function space C[0,1] – there are many, many "directions" in which we could "push" a function f around. L′_f(g) gives us the partial derivative of L at f in the direction g.
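Here's how I'd check that claim numerically (my own sketch; the choices f = sin and g(t) = t² are arbitrary): nudge f by a small multiple of g, and compare the observed change in L against the claimed formula.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)
dt = t[1] - t[0]

def integrate(values):
    """Trapezoid rule on the fixed grid over [0, 1]."""
    return ((values[:-1] + values[1:]) / 2).sum() * dt

def L(f_values):
    """L(f) = integral of f(t)^2 over [0, 1]."""
    return integrate(f_values**2)

f = np.sin(t)   # the point in function space where we differentiate
g = t**2        # the "direction" in which we push f

eps = 1e-6
numeric = (L(f + eps * g) - L(f)) / eps  # directional difference quotient
formula = integrate(2 * f * g)           # the claimed L'_f(g)
print(numeric, formula)                  # agree to ~6 decimal places
```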
This concept is important because it's what you use to prove, e.g., that a straight line is the shortest continuous path between two points.
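Here's a sketch of that argument (my gloss, assuming the standard calculus-of-variations setup, not necessarily the book's derivation): the length of a path y = f(t) is a functional, and setting its functional derivative to zero yields the Euler-Lagrange equation.

```latex
% Arc length of the graph of f between two fixed endpoints:
\[
  L(f) = \int_a^b \sqrt{1 + f'(t)^2}\,dt .
\]
% The Euler-Lagrange equation for an integrand \mathcal{L}(t, f, f'):
\[
  \frac{\partial \mathcal{L}}{\partial f}
  - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial f'} = 0 .
\]
% Here \mathcal{L} = \sqrt{1 + f'^2} has no direct f-dependence, so
\[
  \frac{d}{dt}\,\frac{f'(t)}{\sqrt{1 + f'(t)^2}} = 0
  \quad\Longrightarrow\quad f'(t) \equiv \text{const}
  \quad\Longrightarrow\quad f \text{ is a straight line.}
\]
```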
Below is an exchange between me (in plain text) and TheMajor (quoted text), reproduced and slightly edited with permission.
I'm having trouble understanding functional derivatives. I'm used to thinking about derivatives as with respect to time, or with respect to variations along the input dimensions. But when I think about a derivative on function space, I'm not sure what the "time" is, even though I can think about the topology and the neighborhoods around a given function.
And I know the answer is that there isn't "time", but I'm not sure what there is.
An interesting concept that comes to mind is thinking about a functional derivative with respect to e.g. a straight-line homotopy, where you really could say how a function is changing at every point with respect to time. But I don't think that's the same concept.
By normal map, is that something like a normal operator?
Wouldn't it still output a function, g′ maybe? Wait – would the derivative with respect to λ just be g?
Ah, yeah. Duh. (ETA: my brain was still acting as if differentiation had to be from the real numbers to the real numbers, so it searched for a real/complex number in the problem formalization and found λ.)
Unfortunately, I don't think it's clear yet. So I see how this is a one-dimensional subspace,[2] because it's generated by one basis function (g).
But I don't see how this translates to a normal complex derivative; in particular, I don't quite understand what the range of this function is.
I guess I'm confused why we're using that type signature if we're taking a derivative on the whole function – but maybe that'll be clear after I get the rest.
Okay, that makes sense so far.
So, given some arbitrary function L:X→C which is "differentiable" at f, we define a function L′f:g↦ (derivative of L at f with respect to g)?
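For reference, here's the definition I believe we were circling – the standard Gateaux (directional) derivative, which is where the λ from earlier lives:

```latex
% Directional (Gateaux) derivative of L at f, in the direction g:
\[
  L'_f(g)
  \;=\; \lim_{\lambda \to 0} \frac{L(f + \lambda g) - L(f)}{\lambda}
  \;=\; \left.\frac{d}{d\lambda}\right|_{\lambda = 0} L(f + \lambda g) .
\]
```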
You could even maybe think of each input g as projecting the derivative of L at f? Or specifying one of many possible directions.
This sounds pretty computationally easy? Or are you calculating L′ for a general test function g – in which case, how do you get any nontrivial information out of that?
ETA: Back in my Topology review, I discussed a similar phenomenon: continuity in multiple input dimensions requires not just continuity in each input variable, but continuity along all sequences converging to the point in question:
"Continuity in the variables says that paths along the axes converge in the right way. But for continuity overall, we need all paths to converge in the right way. Directional continuity when the domain is R is a special case of this: continuity from below and from above if and only if continuity for all sequences converging topologically to x."
Similarly, for a function to be differentiable, the existence of all of its partial derivatives isn't enough – you need derivatives for every possible approach to the point in question. Here, the existence of all of the partials automatically guarantees the derivatives for every possible approach, because there's a partial for every function.
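The standard finite-dimensional counterexample (not from the book) makes the continuity half of this vivid – a function continuous along each axis at the origin, yet discontinuous overall:

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), patched to 0 at the origin."""
    return 0.0 if x == 0.0 and y == 0.0 else x * y / (x**2 + y**2)

for s in [1e-1, 1e-3, 1e-6]:
    # Along either axis the values match f(0, 0) = 0 ...
    # ... but along the diagonal x = y, f(s, s) = 1/2 for every s != 0.
    print(f(s, 0.0), f(0.0, s), f(s, s))
```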
Yeah, because L′ has to exist for… all g? That seems a little tough.
Hm. That's because of the definition of linearity, right? It's a homomorphism for both addition and scalar multiplication... Wait, I intuitively understand why linearity means it's the same everywhere, but I'm having trouble coming up with the formal justification…
Ah, got it!
I'm ready to be reconfused.
Other notes
Final thoughts
The book is pretty nice overall, with some glaring road bumps – apparently, the Euler-Lagrange equation is one of the most important equations of all time, and Sasane barely spends any effort explaining it to the reader!
And if I didn't have the help of TheMajor, I wouldn't have understood the functional derivative, which, in my opinion, was the most profoundly important insight I got from this book. My models of function space structure feel qualitatively improved. I can look at a Fourier transform and see what it's doing – I can feel it, to an extent. Without a doubt, that single insight makes it all worth it.
Forward
I'm probably going to finish up an epidemiology textbook, before moving on to complex analysis, microeconomics, or... something else – who knows! If you're interested in taking advantage of quarantine to do some reading, feel free to reach out and maybe we can work through something together. 🙂
Lines y=mx+b (b≠0) aren't actually linear functions, because they don't go through the origin. Instead, they're affine. ↩︎
To be more specific, f + ℂg := {f + λg : λ ∈ ℂ} is often an affine subspace, because the zero function is not necessarily a member. ↩︎