Introduction

A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:

Image

This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?

Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use  when , why not use it when 

Well, lets try it, writing  for the derivative:

So now you may be disappointed, but relieved: yes, this version fails, but at least it fails-safe, giving you the trivial solution, right?

But no, actually  can fail catastrophically, which we can see if we try a nonhomogeneous equation like  (which you may recall has solution ):

However, the integral version still works. To formalize the original approach: we define the function  (for integral) to take in a function  and produce the function  defined by . This rigorizes the original trick, elegantly incorporates the initial conditions of the differential equation, and fully generalizes to solving nonhomogeneous versions like  (left as an exercise to the reader, of course).

So why does  fail, but  works robustly? The answer is functional analysis!

Functional Analysis

Savvy readers may already be screaming that the trick  for numbers only holds true for , and this is indeed the key to explaining what happens with  and ! But how can we define the "absolute value" of "the derivative function" or "the integral function"?

What we're looking for is a norm, a function that generalizes absolute values. A norm is a function  satisfying these properties:

  1.  for all  (positivity), and  if and only if  (positive-definite)
  2.  for all  and  (triangle inequality)
  3.  for all  and real numbers , where  denotes the usual absolute value (absolute homogeneity)

Here's an important example of a norm: fix some compact subset of , say , and for a continuous function  define , which would commonly be called the -norm of . (We may use a maximum here due to the Extreme Value Theorem. In general you would use a supremum instead.) Again I shall leave it to the reader to check that this is a norm.

This example takes us halfway to our goal: we can now talk about the "absolute value" of a continuous function that takes in a real number and spits out a real number, but  and  take in functions and spit out functions (what we usually call an operator, so what we need is an operator norm).

Put another way, the -norm is "the largest output of the function", and this will serve as the inspiration for our operator norm. Doing the minimal changes possible, we might try to define . There are two problems with this:

  1. First, since  is linear, you can make  arbitrarily large by scaling  by 10x, or 100x, etc. We can fix this by restricting the set of valid f for these purposes, just like how for the  example restricted the inputs of  to the compact set . Unsurprisingly nice choice of set to restrict to is the "unit ball" of functions, the set of functions with .   
  2. Second, we must bid tearful farewell to the innocent childhood of maxima, and enter the liberating adulthood of suprema. This is necessary since  ranges over the infinite-dimensional vector space of continuous functions, so the Heine-Borel theorem no longer guarantees the unit ball is compact, and therefore the extreme value theorem no longer guarantees that we will attain a maximum.

So the proper definition of the norm of  and  are: 

(and you can define similar norms for any linear operator, including , etc.) A good exercise is to show these equivalent definitions of the operator norm for any linear function L:

So another way of thinking of the operator norm is the maximum stretching factor of the linear operator. The third definition also motivates the terminology of bounded linear operators: each such  is a bound on the operator , and the least such bound is the norm. Fun exercise: show that a linear operator is bounded if and only if it is continuous (with respect to the correct topologies). Hint: you'll need to work in infinite dimensional spaces here, because any finite-dimensional linear operator must be bounded.

Now let's actually compute these norms! For , remember that our -norm is defined over the interval . First observe that for the constant function , so . Thus . To show that this is indeed the maximum we use the triangle inequality for integrals:

So we have shown ! Put a pin in that while we check .

For , we have a problem: for any positive number . In other words,  can stretch functions by any amount, so it has no norm, or we'd write  (and I promise this is a failure of , not of our definitions). Put another way,  is not bounded as a linear operator, since it can stretch functions by an arbitrary amount. 

But now let's return to . We said that  (if we're defining it relative to the -norm on ), but isn't  only true when ? For real numbers, yes, but for operators, something magical happens: ! (Its like there's a whole algebra of these operators...) 

In fact, you can show that  assumes its maximum value when applied to the constant function , and hence have . Since  grows faster than exponential functions,  is converging to 0 quickly, so  is a Cauchy sum, and it is then straightforward to show that the limit is the multiplicative inverse of . Thus,  is a valid expression that you can apply to any continuous (or bounded) function  on any compact set . This convergence happens regardless of the choice of the compact set, though it will happen at different rates, analogous to uniform convergence on compact sets.

Summary 

  • Writing  for derivative and  for integral, we showed that  can fail, even though  is always true.
  • To explain this, we have to show that  is fundamentally better behaved than , in a way analogous to .
  • We built this up in two steps. First, we defined the -norm for real-valued functions, which lets you say how "large" those functions are. Then, we extended this to function-valued functions (operators), having to make two slight modifications along the way.
  • With this machinery in place, we could show that , or we can say that  is bounded. The resulting norm depends on the domain of the functions  under consideration, but any compact domain is allowable. Also, since , the exact value doesn't matter since the norm of each term goes to 0.
  • Since  sufficiently quickly, we can say that  is Cauchy as a sequence of operators. In other words, if you apply the partial sums  as operators to any function , the functions  will converge with respect to the -norm. Writing  for the function they converge to, it follows that , so we may write  as a statement about linear operators.
  • In contrast,  is unbounded as an operator, meaning . Thus algebra tricks like  will break down if you put in the wrong function .

New to LessWrong?

New Comment


10 comments, sorted by Click to highlight new comments since:

This doesn't completely explain the trick, though. In the step where you write f=(1-I)^{-1} 0, if you interpret I as an operator then you get f=0 as the result. To get f=Ce^x you need to have f=(1-I)^{-1} C in that step instead. You can get this by replacing \int f by If+C at the beginning.

Ah sorry, I skipped over that derivation! Here's how we'd approach this from first principals: to solve f=Df, we know we want to use the (1-x)=1+x+x^2+... trick, but now know that we need x=I instead of x=D. So that's why we want to switch to an integral equation, and we get
f=Df
If=IDf = f-f(0)
where the final equality is the fundamental theorem of calculus. Then we rearrange:
f-If=f(0)
(1-I)f=f(0)
and solve from there using the (1-I)=1+I+I^2+... trick! What's nice about this is it shows exactly how the initial condition of the DE shows up.

Here's a puzzle I came up with in undergrad, based on this idea:

Let  be a function with nice derivatives and anti-derivatives (like exponentials, sine, or cosine) and  be a polynomial. Express the th anti-derivative of  in terms of derivatives and anti-derivatives of  and .

Can provide link to a post on r/mathriddles with the answer in the comments upon request

Use integration by parts:

Then  is another polynomial (of smaller degree), and  is another "nice" function, so we recurse.

This is true, but I'm looking for an explicit, non-recursive formula that needs to handle the general case of the kth anti-derivative (instead of just the first).

The solution involves doing something funny with formal power series, like in this post.

Heh, sure.

Promote from a function to a linear operator on the space of functions, . The action of this operator is just "multiply by ". We'll similarly define meaning to multiply by the first, second integral of , etc.

Observe:

Now we can calculate what we get when applying times. The calculation simplifies when we note that all terms are of the form . Result:

Now we apply the above operator to :

The sum terminates because a polynomial can only have finitely many derivatives.

Very nice! Notice that if you write   as , and play around with binomial coefficients a bit, we can rewrite this as:
 

which holds for  as well, in which case it becomes the derivative product rule. This also matches the formal power series expansion of , which one can motivate directly

(By the way, how do you spoiler tag?)

Oh, very cool, thanks! Spoiler tag in markdown is:

:::spoiler
text here
:::

You can make  work out, if you are prepared to make your mathematics even more deranged. 

So lets look at  

Think of the  not as  but as some infinitesimal  times some unknown function .

If that function is  then we get  which is finite, so multiplied by  it becomes infinitesimal. 

If  then we get  and as we know  because 

So this case is the same as before. 

But for  we get  which doesn't converge. The infinite largeness of this sum cancels with the infinitesimally small size of  (Up to an arbitrary finite constant). 

So 

Great. Now lets apply the same reasoning to 

. First note that this is infinite, it's , so undefined. Can we make this finite. Well think of  as actually being  and in this case, take 

For the final term, the smallness of epsilon counteracts having to sum to infinity. For the first and middle term, the sum is 

Which is  

Now 

So we have 

The first term is negligible. So 

Note that the  can be ignored, because we have  for arbitrary (finite) C as before. 

Now  is big, but it's probably less infinite than  somehow. Let's just group it into the  and hope for the best. 

Edit: looks like was already raised by Dacyn and answered to my satisfaction by Robert_AIZI. Correctly applying the fundamental theorem of calculus will indeed prevent that troublesome zero from appearing in the RHS in the first place, which seems much preferable to dealing with it later. 

My real analysis might be a bit rusty, but I think defining I as the definite integral breaks the magic trick. 

I mean, in the last line of the 'proof',  gets applied to the zero function. 

Any definitive integral of the zero function is zero, so you end up with f(x)=0, which is much less impressive. 

More generally, asking the question Op(f)=0 for any invertable linear operator Op is likely to set yourself up for disappointment. Since the trick relies on inverting an operator, we might want to use a non-linear operator. 

 where C is some global constant might be better. (This might affect the radius of convergence of that Taylor series, do not use for production yet!)

This should result in... uhm... ?

Which is a lot more work to reorder than the original convention used in the 'proof' where all the indefinite integrals of the zero function are conveniently assumed to be the same constant, and all other indefinite integrals conveniently have integration constants of zero. 

Even if we sed s/C// and proclaim that  should be small (e.g. compared to x) and we are only interested in the leading order terms, this would not work. What one would have to motivate is throwing everything but the leading power of x out for every  evaluation, then later meticulously track these lower order terms in the sum to arrive at the Taylor series of the exponential.