This doesn't completely explain the trick, though. In the step where you write f=(1-I)^{-1} 0, if you interpret I as an operator then you get f=0 as the result. To get f=Ce^x you need to have f=(1-I)^{-1} C in that step instead. You can get this by replacing \int f by If+C at the beginning.
Ah sorry, I skipped over that derivation! Here's how we'd approach this from first principles: to solve f=Df, we know we want to use the (1-x)^{-1}=1+x+x^2+... trick, but now we know that we need x=I instead of x=D. So that's why we want to switch to an integral equation, and we get
f=Df
If=IDf = f-f(0)
where the final equality is the fundamental theorem of calculus. Then we rearrange:
f-If=f(0)
(1-I)f=f(0)
and solve from there using the (1-I)^{-1}=1+I+I^2+... trick! What's nice about this is it shows exactly how the initial condition of the DE shows up.
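If you want to see this convergence concretely, here's a quick numerical sketch of my own: it applies partial sums of 1+I+I^2+... to the constant function f(0) on a grid, using trapezoid-rule integration, and compares against f(0)e^x. The interval [0, 2], the grid size, and the choice f(0)=3 are arbitrary choices for the demo.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 2001)   # grid on [0, 2]
f0 = 3.0                          # the initial condition f(0)

def I(g):
    """(Ig)(x) = integral of g from 0 to x, via the trapezoid rule."""
    dx = x[1] - x[0]
    out = np.zeros_like(g)
    out[1:] = np.cumsum((g[1:] + g[:-1]) * dx / 2.0)
    return out

# Partial sums of (1 + I + I^2 + ...) applied to the constant function f(0)
term = np.full_like(x, f0)
f = term.copy()
for _ in range(25):
    term = I(term)
    f += term

print(np.max(np.abs(f - f0 * np.exp(x))))  # tiny: the partial sums approach f(0)*e^x
```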
Here's a puzzle I came up with in undergrad, based on this idea:
Let f be a function with nice derivatives and anti-derivatives (like exponentials, sine, or cosine) and let p be a polynomial. Express the k-th anti-derivative of the product f·p in terms of derivatives and anti-derivatives of f and p.
Can provide link to a post on r/mathriddles with the answer in the comments upon request
Use integration by parts:

∫ f·p dx = f_1·p − ∫ f_1·p′ dx

Then p′ is another polynomial (of smaller degree), and f_1 (an anti-derivative of f) is another "nice" function, so we recurse.
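For what it's worth, here's a tiny sympy sketch of one step of that recursion (my own example; f = sin(x) and p = x^3 are arbitrary choices):

```python
import sympy as sp

x = sp.symbols('x')
f, p = sp.sin(x), x**3            # a "nice" function and a polynomial
F = sp.integrate(f, x)            # an anti-derivative of f

# One integration-by-parts step: integral of p*f = p*F - integral of p'*F
step = p * F - sp.integrate(sp.diff(p, x) * F, x)

# Differentiating confirms that `step` really is an anti-derivative of p*f;
# the leftover integrand p'*F has a lower-degree polynomial factor, so we recurse.
print(sp.simplify(sp.diff(step, x) - p * f))  # 0
```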
This is true, but I'm looking for an explicit, non-recursive formula, one that handles the general case of the k-th anti-derivative (instead of just the first).
The solution involves doing something funny with formal power series, like in this post.
Heh, sure.
Promote f from a function to a linear operator on the space of functions: the action of this operator is just "multiply by f". We'll similarly write f_1, f_2, ... for the operators "multiply by the first, second, ... integral of f" (taking f_n = I^n f, so that f_n(0)=0).

Observe (this is just integration by parts, written as an identity between operators, with I = "integrate from 0 to x" and D = "differentiate"):

I∘f = f_1 − I∘f_1∘D

Now we can calculate what we get when applying I k times. The calculation simplifies when we note that all terms are of the form ±(coefficient)·I^a∘f_b∘D^c. Result:

I^k∘f = Σ_{n≥0} (−1)^n (k−1+n choose n) f_{k+n}∘D^n

Now we apply the above operator to p:

I^k(f·p) = Σ_{n≥0} (−1)^n (k−1+n choose n) f_{k+n}·p^{(n)}

where f_m = I^m f is the m-th anti-derivative of f and p^{(n)} is the n-th derivative of p.
The sum terminates because a polynomial can only have finitely many derivatives.
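Here's a quick sympy check of the formula as written above (my own verification sketch; f = sin(x), p = x^3 + 2x, and k = 2 are arbitrary choices, and every integration constant is taken to be zero):

```python
import sympy as sp

x = sp.symbols('x')
f, p, k = sp.sin(x), x**3 + 2*x, 2

def antideriv(g, m):
    """m-th repeated anti-derivative, with every integration constant set to 0."""
    for _ in range(m):
        g = sp.integrate(g, x)
    return g

deg = int(sp.degree(p, x))
formula = sum((-1)**n * sp.binomial(k - 1 + n, n) * antideriv(f, k + n) * sp.diff(p, x, n)
              for n in range(deg + 1))

# Differentiating k times should recover f*p exactly (the sum telescopes).
print(sp.simplify(sp.diff(formula, x, k) - f * p))  # 0
```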
Very nice! Notice that if you write (−1)^n (k−1+n choose n) as (−k choose n), and play around with binomial coefficients a bit, we can rewrite this as:

I^k(f·p) = Σ_{n≥0} (−k choose n)·(I^{k+n}f)·(D^n p)

which holds for negative k as well (reading I^{−m} as D^m), in which case it becomes the derivative product rule. This also matches the formal power series expansion of (x+y)^{−k}, which one can motivate directly
(By the way, how do you spoiler tag?)
You can make f=(1+D+D^2+...)0 work out, if you are prepared to make your mathematics even more deranged.

So let's look at

f=(1+D+D^2+...)0

Think of the 0 not as 0 but as some infinitesimal ε times some unknown function g.

If that function is something like a polynomial, then we get (1+D+D^2+...)g, which is finite, so multiplied by ε it becomes infinitesimal.

If g=e^{kx} with k<1, then we get (1+k+k^2+...)e^{kx}=e^{kx}/(1−k), and as we know ε·e^{kx}/(1−k) is still infinitesimal, because 1/(1−k) is finite.

So this case is the same as before.

But for g=e^x we get (1+1+1+...)e^x, which doesn't converge. The infinite largeness of this sum cancels with the infinitesimally small size of ε (up to an arbitrary finite constant).

So

f=(1+D+D^2+...)(ε·e^x)=Ce^x
Great. Now let's apply the same reasoning to f′=f+e^x, i.e.

f=−(1+D+D^2+...)e^x

First note that this is infinite, it's −(e^x+e^x+e^x+...), so undefined. Can we make this finite? Well, think of the e^x as actually being e^{(1−ε)x}, and in this case, take

f=−(1+(1−ε)+(1−ε)^2+...)e^{(1−ε)x}=−(1/ε)·e^{(1−ε)x}

Here the smallness of epsilon counteracts having to sum to infinity: the geometric series now converges, to 1/ε. Next, write e^{(1−ε)x}=e^x·e^{−εx} and expand the second factor, so the sum is

−(1/ε)·e^x·(1−εx+ε^2x^2/2−...)

Which is

−e^x/ε+x·e^x−(εx^2/2)·e^x+...

Now every term after the first two carries a positive power of ε.

So we have

f=x·e^x−e^x/ε+(infinitesimal stuff)

The last batch of terms is negligible. So f=x·e^x−e^x/ε, and x·e^x is exactly the particular solution we wanted.

Note that the −e^x/ε can be ignored, because we have +Ce^x for arbitrary (finite) C as before.

Now 1/ε is big, but it's probably less infinite than the 1+1+1+... from before somehow. Let's just group it into the C and hope for the best.
Edit: looks like this was already raised by Dacyn and answered to my satisfaction by Robert_AIZI. Correctly applying the fundamental theorem of calculus will indeed prevent that troublesome zero from appearing in the RHS in the first place, which seems much preferable to dealing with it later.
My real analysis might be a bit rusty, but I think defining I as the definite integral breaks the magic trick.
I mean, in the last line of the 'proof', (1+I+I^2+...) gets applied to the zero function.
Any definite integral of the zero function is zero, so you end up with f(x)=0, which is much less impressive.
More generally, asking the question Op(f)=0 for any invertible linear operator Op is likely to set you up for disappointment. Since the trick relies on inverting an operator, we might want to use a non-linear operator.
Something like If(x)=∫_0^x f(t)dt+C, where C is some global constant, might be better. (This might affect the radius of convergence of that Taylor series, do not use for production yet!)
This should result in... uhm...

f=(1+I+I^2+...)0 = 0+C+(Cx+C)+(Cx^2/2+Cx+C)+... ?
Which is a lot more work to reorder than the original convention used in the 'proof' where all the indefinite integrals of the zero function are conveniently assumed to be the same constant, and all other indefinite integrals conveniently have integration constants of zero.
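To make that concrete, here's a small sympy sketch of my own showing the first few terms when every application of I tacks on the same global constant C:

```python
import sympy as sp

x, C = sp.symbols('x C')

def I(g):
    """Integrate from 0 to x and add the same global constant C each time."""
    return sp.integrate(g, (x, 0, x)) + C

term = sp.Integer(0)
for n in range(5):
    print(f"I^{n} applied to 0:", sp.expand(term))
    term = I(term)
# Every power of x shows up in infinitely many terms of the series,
# each multiplied by C, which is what makes the reordering painful.
```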
Even if we sed s/C/ε/ and proclaim that ε should be small (e.g. compared to x) and we are only interested in the leading order terms, this would not work. What one would have to motivate is throwing everything but the leading power of x out for every evaluation, then later meticulously tracking these lower order terms in the sum to arrive at the Taylor series of the exponential.
Introduction
A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:
This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?
Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use (1−x)^{−1}=1+x+x^2+... when x=∫, why not use it when x=d/dx?
Well, let's try it, writing D for the derivative:
f′=f
(1−D)f=0
f=(1+D+D^2+...)0
f=0+0+0+...
f=0
So now you may be disappointed, but relieved: yes, this version fails, but at least it fails safe, giving you the trivial solution, right?
But no, actually (1−D)^{−1}=1+D+D^2+... can fail catastrophically, which we can see if we try a nonhomogeneous equation like f′=f+e^x (which you may recall has solution xe^x):
f′=f+e^x
(1−D)f=−e^x
f=−(1+D+D^2+...)e^x
f=−(e^x+e^x+e^x+...)
f=−∞?
However, the integral version still works. To formalize the original approach: we define the function I (for integral) to take in a function f(x) and produce the function If defined by If(x)=∫_0^x f(t)dt. This rigorizes the original trick, elegantly incorporates the initial conditions of the differential equation, and fully generalizes to solving nonhomogeneous versions like f′=f+e^x (left as an exercise to the reader, of course).
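If you'd like that exercise spelled out numerically, here's a minimal sketch of my own for f′=f+e^x with the initial condition f(0)=0, whose solution is xe^x. Integrating both sides gives (1−I)f = I(e^x) = e^x−1, so f = (1+I+I^2+...)(e^x−1); the interval and grid are arbitrary choices.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 2001)   # grid on [0, 2]

def I(g):
    """(Ig)(x) = integral of g from 0 to x, via the trapezoid rule."""
    dx = x[1] - x[0]
    out = np.zeros_like(g)
    out[1:] = np.cumsum((g[1:] + g[:-1]) * dx / 2.0)
    return out

# f' = f + e^x with f(0) = 0  =>  (1 - I)f = I(e^x) = e^x - 1
term = np.exp(x) - 1.0
f = term.copy()
for _ in range(25):
    term = I(term)
    f += term

print(np.max(np.abs(f - x * np.exp(x))))  # tiny: the series converges to x*e^x
```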
So why does (1−D)^{−1}=1+D+D^2+... fail, but (1−I)^{−1}=1+I+I^2+... work robustly? The answer is functional analysis!
Functional Analysis
Savvy readers may already be screaming that the trick (1−x)^{−1}=1+x+x^2+... for numbers only holds true for |x|<1, and this is indeed the key to explaining what happens with D and I! But how can we define the "absolute value" of "the derivative function" or "the integral function"?
What we're looking for is a norm, a function that generalizes absolute values. A norm is a function x↦||x|| satisfying these properties: ||x||≥0, with ||x||=0 only for x=0; ||cx||=|c|·||x|| for any scalar c; and the triangle inequality ||x+y||≤||x||+||y||.
Here's an important example of a norm: fix some compact subset of R, say X=[−10,10], and for a continuous function f:X→R define ||f||_∞ = max_{x∈X} |f(x)|, which would commonly be called the L^∞-norm of f. (We may use a maximum here due to the Extreme Value Theorem. In general you would use a supremum instead.) Again I shall leave it to the reader to check that this is a norm.
This example takes us halfway to our goal: we can now talk about the "absolute value" of a continuous function that takes in a real number and spits out a real number, but D and I take in functions and spit out functions (what we usually call an operator, so what we need is an operator norm).
Put another way, the L^∞-norm is "the largest output of the function", and this will serve as the inspiration for our operator norm. Doing the minimal changes possible, we might try to define ||I|| = max_{f continuous} ||If||_∞. There are two problems with this: first, the maximum is infinite, since scaling f scales If, so we should only compete over functions with ||f||_∞≤1; and second, the maximum might not be attained, so we should take a supremum.
So the proper definitions of the norms of I and D are:
||I|| = sup_{f continuous, ||f||_∞ ≤ 1} ||If||_∞
||D|| = sup_{f continuous, ||f||_∞ ≤ 1} ||Df||_∞
(and you can define similar norms for any linear operator, including I^2, etc.) A good exercise is to show these equivalent definitions of the operator norm for any linear function L:
||L|| = sup_{f continuous, ||f||_∞ ≤ 1} ||Lf||_∞ = sup_{f continuous, ||f||_∞ = 1} ||Lf||_∞ = inf{b ≥ 0 : ||Lf||_∞ ≤ b·||f||_∞ for all f}
So another way of thinking of the operator norm is the maximum stretching factor of the linear operator. The third definition also motivates the terminology of bounded linear operators: each such b is a bound on the operator L, and the least such bound is the norm. Fun exercise: show that a linear operator is bounded if and only if it is continuous (with respect to the correct topologies). Hint: you'll need to work in infinite dimensional spaces here, because any finite-dimensional linear operator must be bounded.
Now let's actually compute these norms! For I, remember that our L^∞-norm is defined over the interval X=[−10,10]. First observe that for the constant function f(x)=1, If(x)=x, so ||If||_∞=10. Thus ||I||≥10. To show that this is indeed the maximum we use the triangle inequality for integrals:
||If||_∞ = max_{x∈X} |∫_0^x f(t)dt| ≤ max_{x∈X} |∫_0^x |f(t)|dt| ≤ max_{x∈X} |∫_0^x 1 dt| = max_{x∈X} |x| = 10

(using ||f||_∞ ≤ 1 in the middle step).
So we have shown ||I||=10! Put a pin in that while we check ||D||.
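(As a quick numerical sanity check of that value, here's a sketch of my own: discretize I on [−10,10] as a matrix using the trapezoid rule, and compute the matrix norm induced by the sup norm, which is the maximum absolute row sum.)

```python
import numpy as np

N = 2001
x = np.linspace(-10.0, 10.0, N)
dx = x[1] - x[0]
mid = N // 2                       # index of the grid point x = 0

# A[i] holds trapezoid-rule weights approximating the integral from 0 to x[i].
A = np.zeros((N, N))
for i in range(N):
    lo, hi = sorted((mid, i))
    w = np.zeros(N)
    w[lo:hi + 1] = dx
    w[lo] = w[hi] = dx / 2.0
    A[i] = w if i >= mid else -w   # integrating from 0 down to a negative x[i] flips the sign

# Norm induced by the sup norm = maximum absolute row sum.
print(np.abs(A).sum(axis=1).max())  # ~10, matching ||I|| = 10
```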
For D, we have a problem: for any positive number k, D(e^{kx})=k·e^{kx}. In other words, D can stretch functions by any amount, so it has no norm, or we'd write ||D||=∞ (and I promise this is a failure of D, not of our definitions). Put another way, D is not bounded as a linear operator, since it can stretch functions by an arbitrary amount.
But now let's return to I. We said that ||I||=10 (if we're defining it relative to the L^∞-norm on X=[−10,10]), but isn't (1−x)^{−1}=1+x+x^2+... only true when |x|<1? For real numbers, yes, but for operators, something magical happens: ||I^n|| ≠ ||I||^n! (It's like there's a whole algebra of these operators...)
In fact, you can show that ||I^n|| is attained at the constant function f(x)=1, and hence ||I^n|| = ||I||^n/n!. Since n! grows faster than exponential functions, ||I^n|| converges to 0 quickly, so the partial sums of 1+I+I^2+I^3+... form a Cauchy sequence in operator norm, and it is then straightforward to show that the limit is the multiplicative inverse of 1−I. Thus, (1−I)^{−1}=1+I+I^2+... is a valid expression that you can apply to any continuous (or bounded) function f on any compact set X. This convergence happens regardless of the choice of the compact set, though it will happen at different rates, analogous to uniform convergence on compact sets.
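As a last numerical aside of my own (using the value ||I^n|| = ||I||^n/n! = 10^n/n! from above): the operator norms grow for a while before the factorial takes over, but their sum is finite, which is exactly what makes 1+I+I^2+... converge.

```python
from math import factorial

norms = [10.0**n / factorial(n) for n in range(40)]
print(max(norms))   # ~2755.7, peaking around n = 9 and 10
print(norms[30])    # ~3.8e-3, already tiny
print(sum(norms))   # ~22026, i.e. about e^10: the sum of ||I^n|| converges
```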
Summary