Introduction
A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:
This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?
Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use $\frac{1}{1-x} = 1 + x + x^2 + \cdots$ when $x = \int$, why not use it when $x = \frac{d}{dx}$?
Well, let's try it, writing $D$ for the derivative:
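A sketch of that attempt, assuming the equation being solved is $f' = f$ and treating $D$ as if it were a number:

$$
\begin{aligned}
f' &= f \\
(1 - D)f &= 0 \\
f &= \frac{1}{1-D}\,0 = (1 + D + D^2 + \cdots)\,0 = 0.
\end{aligned}
$$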
So now you may be disappointed, but relieved: yes, this version fails, but at least it fails safe, giving you the trivial solution, right?
But no, actually $\frac{1}{1-D} = 1 + D + D^2 + \cdots$ can fail catastrophically, which we can see if we try a nonhomogeneous equation like $f' = f + e^x$ (which you may recall has solution $xe^x$):
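A sketch of how this goes wrong, assuming the example is $f' = f + e^x$ and again treating $D$ as a number:

$$
\begin{aligned}
f' &= f + e^x \\
(1 - D)f &= -e^x \\
f &= -(1 + D + D^2 + \cdots)\,e^x = -(e^x + e^x + e^x + \cdots),
\end{aligned}
$$

which diverges for every $x$ instead of producing $xe^x$.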
However, the integral version still works. To formalize the original approach: we define the function $I$ (for integral) to take in a function $f$ and produce the function $If$ defined by $(If)(x) = \int_0^x f(t)\,dt$. This rigorizes the original trick, elegantly incorporates the initial conditions of the differential equation, and fully generalizes to solving nonhomogeneous versions like $f' = f + e^x$ (left as an exercise to the reader, of course).
So why does $\frac{1}{1-D} = 1 + D + D^2 + \cdots$ fail, while $\frac{1}{1-I} = 1 + I + I^2 + \cdots$ works robustly? The answer is functional analysis!
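To see the integral version at work, here is a minimal sketch. It assumes the problem is $f' = f$ with $f(0) = 1$, which integrates to $(1 - I)f = 1$, so the series $1 + I + I^2 + \cdots$ is applied to the constant function $1$. Polynomials are stored as coefficient lists, and the helper names (`integrate`, `neumann_partial_sum`, `evaluate`) are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
import math

def integrate(coeffs):
    """Apply I: coefficients of f -> coefficients of x -> integral_0^x f(t) dt."""
    return [Fraction(0)] + [c / (n + 1) for n, c in enumerate(coeffs)]

def add(p, q):
    """Add two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def neumann_partial_sum(g, terms):
    """Return (1 + I + I^2 + ... + I^(terms-1)) applied to g."""
    total, term = [], list(g)
    for _ in range(terms):
        total = add(total, term)
        term = integrate(term)
    return total

def evaluate(coeffs, x):
    """Evaluate the polynomial at a float x."""
    return sum(float(c) * x**n for n, c in enumerate(coeffs))

one = [Fraction(1)]                     # the constant function 1
approx = neumann_partial_sum(one, 15)   # 1 + x + x^2/2! + ... + x^14/14!
print(approx[:4])
print(evaluate(approx, 1.0), math.e)
```

Each application of `integrate` turns $x^k/k!$ into $x^{k+1}/(k+1)!$, so the partial sums are exactly the Taylor polynomials of $e^x$.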
Functional Analysis
Savvy readers may already be screaming that the trick $\frac{1}{1-x} = 1 + x + x^2 + \cdots$ for numbers only holds true for $|x| < 1$, and this is indeed the key to explaining what happens with $D$ and $I$! But how can we define the "absolute value" of "the derivative function" or "the integral function"?
What we're looking for is a norm, a function that generalizes absolute values. A norm is a function $\|\cdot\| : V \to \mathbb{R}$ on a vector space $V$ satisfying these properties:
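- $\|x\| \ge 0$ for all $x$ (positivity), and $\|x\| = 0$ if and only if $x = 0$ (positive-definite)
- $\|x + y\| \le \|x\| + \|y\|$ for all $x$ and $y$ (triangle inequality)
- $\|cx\| = |c|\,\|x\|$ for all $x$ and real numbers $c$, where $|c|$ denotes the usual absolute value (absolute homogeneity)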
Here's an important example of a norm: fix some compact subset $X$ of $\mathbb{R}$, say an interval $X = [0, a]$, and for a continuous function $f : X \to \mathbb{R}$ define $\|f\|_\infty = \max_{x \in X} |f(x)|$, which would commonly be called the $\infty$-norm of $f$. (We may use a maximum here due to the Extreme Value Theorem. In general you would use a supremum instead.) Again I shall leave it to the reader to check that this is a norm.
This example takes us halfway to our goal: we can now talk about the "absolute value" of a continuous function that takes in a real number and spits out a real number, but $D$ and $I$ take in functions and spit out functions. Such a map is usually called an operator, so what we need is an operator norm.
Put another way, the $\infty$-norm is "the largest output of the function", and this will serve as the inspiration for our operator norm. Making the minimal changes possible, we might try to define $\|I\| = \max_f \|If\|_\infty$. There are two problems with this:
- First, since $I$ is linear, you can make $\|If\|_\infty$ arbitrarily large by scaling $f$ by 10x, or 100x, etc. We can fix this by restricting the set of valid $f$ for these purposes, just like how for the $\infty$-norm example we restricted the inputs of $f$ to the compact set $X$. Unsurprisingly, a nice choice of set to restrict to is the "unit ball" of functions, the set of functions with $\|f\|_\infty \le 1$.
- Second, we must bid a tearful farewell to the innocent childhood of maxima, and enter the liberating adulthood of suprema. This is necessary since $f$ ranges over the infinite-dimensional vector space of continuous functions, so the Heine-Borel theorem no longer guarantees the unit ball is compact, and therefore the extreme value theorem no longer guarantees that we will attain a maximum.
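So the proper definitions of the norms of $I$ and $D$ are:

$$\|I\| = \sup_{\|f\|_\infty \le 1} \|If\|_\infty, \qquad \|D\| = \sup_{\|f\|_\infty \le 1} \|Df\|_\infty$$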
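(and you can define similar norms for any linear operator, including $I^2$, etc.) A good exercise is to show these equivalent definitions of the operator norm for any linear function $L$:

$$\|L\| = \sup_{\|f\| \le 1} \|Lf\| = \sup_{f \ne 0} \frac{\|Lf\|}{\|f\|} = \inf\{\, c \ge 0 : \|Lf\| \le c\,\|f\| \text{ for all } f \,\}$$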
So another way of thinking of the operator norm is as the maximum stretching factor of the linear operator. The third definition also motivates the terminology of bounded linear operators: each such $c$ is a bound on the operator $L$, and the least such bound is the norm. Fun exercise: show that a linear operator is bounded if and only if it is continuous (with respect to the correct topologies). Hint: you'll need to work in infinite-dimensional spaces here, because any finite-dimensional linear operator must be bounded.
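Now let's actually compute these norms! For $I$, remember that our $\infty$-norm is defined over the interval $[0, a]$. First observe that for the constant function $1$, $(I1)(x) = x$, so $\|I1\|_\infty = a$. Thus $\|I\| \ge a$. To show that this is indeed the maximum we use the triangle inequality for integrals: for any $f$ with $\|f\|_\infty \le 1$ and any $x \in [0, a]$,

$$|(If)(x)| = \left|\int_0^x f(t)\,dt\right| \le \int_0^x |f(t)|\,dt \le x\,\|f\|_\infty \le a.$$

So we have shown $\|I\| = a$! Put a pin in that while we check $\|D\|$.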
For $D$, we have a problem: for any positive number $n$, $\|De^{nx}\|_\infty = n\,\|e^{nx}\|_\infty$. In other words, $D$ can stretch functions by any amount, so it has no norm, or we'd write $\|D\| = \infty$ (and I promise this is a failure of $D$, not of our definitions). Put another way, $D$ is not bounded as a linear operator.
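The contrast between the two operators can be checked numerically. The sketch below approximates the $\infty$-norm by sampling on a grid; the interval endpoint `A`, the grid size `N`, the test function $\cos(7x)$, and the family $e^{nx}$ are all choices made for this illustration, not values taken from the text.

```python
import math

A = 2.0      # hypothetical right endpoint of the interval [0, A]
N = 20001    # number of grid points
XS = [A * i / (N - 1) for i in range(N)]

def sup_norm(values):
    """Grid approximation of the infinity-norm: largest sampled absolute value."""
    return max(abs(v) for v in values)

def integral_from_zero(values):
    """Trapezoid-rule approximation of (If)(x) = integral_0^x f(t) dt on the grid."""
    h = A / (N - 1)
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (left + right))
    return out

# I stretches the constant function 1 by A ...
ones = [1.0] * N
stretch_I_const = sup_norm(integral_from_zero(ones)) / sup_norm(ones)

# ... and stretches this other function by less than A.
wiggly = [math.cos(7 * x) for x in XS]
stretch_I_wiggly = sup_norm(integral_from_zero(wiggly)) / sup_norm(wiggly)

def stretch_D_exp(n):
    """Stretch factor of D on e^{nx}, using the exact derivative n * e^{nx}."""
    f = [math.exp(n * x) for x in XS]
    df = [n * math.exp(n * x) for x in XS]
    return sup_norm(df) / sup_norm(f)

print(stretch_I_const, stretch_I_wiggly)
print([stretch_D_exp(n) for n in (1, 10, 100)])
```

The stretch factors for $I$ stay at or below `A`, while those for $D$ grow in step with $n$, which is the numerical face of "bounded" versus "unbounded".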
But now let's return to $I$. We said that $\|I\| = a$ (if we're defining it relative to the $\infty$-norm on $[0, a]$), but isn't $\frac{1}{1-x} = 1 + x + x^2 + \cdots$ only true when $|x| < 1$? For real numbers, yes, but for operators, something magical happens: $\|I^k\| \ne \|I\|^k$! (It's like there's a whole algebra of these operators...)
In fact, you can show that $I^k$ assumes its maximum value when applied to the constant function $1$, for which $(I^k 1)(x) = x^k/k!$, and hence we have $\|I^k\| = a^k/k!$. Since $k!$ grows faster than exponential functions, $\|I^k\|$ converges to 0 quickly, so the partial sums of $\sum_{k} I^k$ form a Cauchy sequence, and it is then straightforward to show that the limit is the multiplicative inverse of $1 - I$. Thus, $\frac{1}{1-I} = 1 + I + I^2 + \cdots$ is a valid expression that you can apply to any continuous (or bounded) function on any compact set $X$. This convergence happens regardless of the choice of the compact set, though it will happen at different rates, analogous to uniform convergence on compact sets.
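A quick numeric look at why the factorial wins. This sketch assumes the interval is $[0, a]$, so that the largest value of $(I^k 1)(x) = x^k/k!$ is $a^k/k!$; the value `a = 10.0` and the function name `norm_I_power` are hypothetical, picked so that the norm of $I$ itself is well above 1.

```python
import math

def norm_I_power(a, k):
    """Largest value of (I^k 1)(x) = x^k / k! on [0, a], namely a^k / k!."""
    return a**k / math.factorial(k)

a = 10.0  # hypothetical interval length, deliberately larger than 1
terms = [norm_I_power(a, k) for k in range(60)]

# The terms grow at first (a^k wins early) and then collapse (k! wins).
peak = max(range(60), key=lambda k: terms[k])
print(peak, terms[peak], terms[-1])

# The norms are summable: their partial sums approach e^a.
print(sum(terms), math.exp(a))
```

Even though the early terms are in the thousands, the tail is negligible, so the operator series converges despite the first-power norm being far from less than 1.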
Summary
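- Writing $D$ for derivative and $I$ for integral, we showed that $\frac{1}{1-D} = 1 + D + D^2 + \cdots$ can fail, even though $\frac{1}{1-I} = 1 + I + I^2 + \cdots$ is always true.
- To explain this, we have to show that $I$ is fundamentally better behaved than $D$, in a way analogous to $|x| < 1$.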
- We built this up in two steps. First, we defined the $\infty$-norm for real-valued functions, which lets you say how "large" those functions are. Then, we extended this to function-valued functions (operators), having to make two slight modifications along the way.
- With this machinery in place, we could show that $\|I\| < \infty$; in other words, $I$ is bounded. The resulting norm depends on the domain of the functions under consideration, but any compact domain is allowable. Also, since $\|I^k\| = \|I\|^k/k!$, the exact value of $\|I\|$ doesn't matter: the norm of each term still goes to 0.
- Since $\|I^k\| \to 0$ sufficiently quickly, we can say that the partial sums $\sum_{k=0}^{n} I^k$ are Cauchy as a sequence of operators. In other words, if you apply the partial sums as operators to any function $f$, the functions $\sum_{k=0}^{n} I^k f$ will converge with respect to the $\infty$-norm. Writing $g$ for the function they converge to, it follows that $(1 - I)g = f$, so we may write $\frac{1}{1-I} = \sum_{k=0}^{\infty} I^k$ as a statement about linear operators.
- In contrast, $D$ is unbounded as an operator, meaning $\|D\| = \infty$. Thus algebra tricks like $\frac{1}{1-D} = 1 + D + D^2 + \cdots$ will break down if you put in the wrong function $f$.
Heh, sure.
Promote $f$ from a function to a linear operator $F$ on the space of functions. The action of this operator is just "multiply by $f$". We'll similarly define $\tilde F, \tilde F_2$, meaning to multiply by the first, second integral of $f$, etc.
Observe:
$$IF = \tilde F - I\tilde F D$$
$$IF = \tilde F - \tilde F_2 D + \tilde F_3 D^2 - \cdots$$
Now we can calculate what we get when applying $I$ a total of $k$ times. The calculation simplifies when we note that all terms are of the form $\tilde F_a(-D)^{a-k}$. Result:
$$I^k F = \sum_{j=k}^{\infty} \binom{j-1}{k-1} \tilde F_j (-D)^{j-k}$$
Now we apply the above operator to p:
$$I^k F p = \sum_{j=k}^{\infty} \binom{j-1}{k-1} \tilde F_j (-D)^{j-k} p$$
$$I^k (fp) = \sum_{j=k}^{\infty} \binom{j-1}{k-1} (I^j f)(-D)^{j-k} p$$
The sum terminates because a polynomial has only finitely many nonzero derivatives.
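The formula can be sanity-checked with exact arithmetic. In this sketch both $f$ and $p$ are taken to be polynomials (an assumption made here so that everything stays exact; the formula itself only needs $p$ to be a polynomial), $I$ integrates from 0, and all helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1) if p and q else []
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def integrate(p):
    """I: p -> integral from 0 to x of p."""
    return trim([Fraction(0)] + [Fraction(c) / (n + 1) for n, c in enumerate(p)])

def neg_derivative(p):
    """-D: p -> -p'."""
    return trim([-n * c for n, c in enumerate(p)][1:])

def power(op, p, times):
    """Apply the operator op to p the given number of times."""
    for _ in range(times):
        p = op(p)
    return p

def lhs(f, p, k):
    """I^k (f p), computed directly."""
    return power(integrate, mul(f, p), k)

def rhs(f, p, k):
    """Sum over j >= k of C(j-1, k-1) (I^j f) (-D)^(j-k) p."""
    total = []
    # (-D)^(j-k) p vanishes once j - k exceeds deg p, so the sum is finite.
    for j in range(k, k + len(p)):
        term = mul(power(integrate, f, j), power(neg_derivative, p, j - k))
        total = add(total, [comb(j - 1, k - 1) * c for c in term])
    return total

f = [Fraction(1), Fraction(2), Fraction(0), Fraction(-3)]  # 1 + 2x - 3x^3
p = [Fraction(4), Fraction(0), Fraction(5)]                # 4 + 5x^2
print(all(lhs(f, p, k) == rhs(f, p, k) for k in range(1, 5)))
```

The two sides are compared as exact coefficient lists, so agreement here is not a floating-point coincidence.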