But this is impossible by definition

The title may seem like a contradiction. How can you differentiate something that’s not even continuous?

The usual definition of the derivative of a function $f$ at a point $a$ is given by the limit:

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

If $f$ is differentiable at $a$, then it is continuous at $a$. But what if it’s NOT even continuous? Then how the hell can it be differentiable?

Consider the step function, also known as the Heaviside step function. It is defined as:

$$H(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{2} & x = 0 \\ 1 & x > 0 \end{cases}$$

What’s its derivative? Forget the formal definition of the derivative for a moment, and just consider the step function and the intuitive idea of a derivative as a slope.

Outside of 0, the step function is flat. So the derivative should be 0 everywhere besides 0.

Look at the step function, left to right. At 0, supposing for a moment there is a derivative, then it can’t be any standard number. It’s not 0 since it’s certainly not flat. It’s not negative since it’s increasing to the right. It’s bigger than any positive standard number.

Since it goes from 0 to 1 in the space of a single point, the derivative would have to be infinitely big because its difference quotient is $\frac{1 - 0}{0}$: a rise of 1 over a run of zero width. The Problem is that no real number is infinitely big.
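To watch the blow-up happen numerically, here is a minimal Python sketch. It assumes the $H(0) = \frac{1}{2}$ convention and uses a symmetric difference quotient, which is just one convenient choice; `heaviside` and `difference_quotient` are hypothetical helper names, not part of any library.

```python
def heaviside(x):
    """Step function with the H(0) = 1/2 convention (an assumption here)."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def difference_quotient(f, a, h):
    """Symmetric difference quotient (f(a + h) - f(a - h)) / (2h)."""
    return (f(a + h) - f(a - h)) / (2 * h)


# Shrink the run h toward 0: the slope estimate at 0 grows like 1 / (2h).
quotients = [difference_quotient(heaviside, 0.0, 10.0 ** -k) for k in range(1, 7)]
for k, q in enumerate(quotients, start=1):
    print(f"h = 1e-{k}: slope estimate = {q:.1f}")

# Away from 0 the function is flat, so the estimate is exactly 0.
flat = difference_quotient(heaviside, 1.0, 1e-3)
print("slope estimate at x = 1:", flat)
```

The rise stays fixed at 1 while the run shrinks, so no real number can serve as the limit.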

The usual way mathematicians deal with the Problem is to introduce generalized functions. They are also called distributions, which is confusing since probability distributions are completely different.

The formal definition of a generalized function is: an element of the continuous dual space of a space of smooth functions.

Well, what the fuck does that mean?

Luckily, you don’t have to care. The Problem was that no real number is infinitely big, so let’s just deal with that directly. Instead of introducing some abstract dual space of smooth functions, we can just enlarge the space of numbers. You’re used to this since childhood, ever since you first learned about negative numbers, irrational numbers, and imaginary numbers. This is one more enlargement.

The reason for the Problem is that we are really thinking about a line but identifying it with the real numbers. Identifying a line with the real numbers is just that – an identification – and it’s not even the best one.

We’ll use the hyperreal numbers from the unsexily named field of nonstandard analysis to offer a radically elementary way of thinking about the problem. We will think of the line as the hyperreals rather than the reals. Consider this picture, where $N$ is some infinite number, and $\epsilon$ is some infinitesimal number.

the hyperreals

Now rather than a dual space, you may think of a fundamentally nonstandard function. Its domain or range (or both) are nonstandard. And unlike a generalized “function”, it’s a bona fide function, just not over the reals.[1] For the rest of this post, I will call the usual generalized functions “generalized functions” and this concept “nonstandard functions”. This class of nonstandard functions includes the generalized functions, but is bigger. Rather than cutting down the space of functions to a smaller space, we’re enlarging the space of numbers and functions between them.

Take the Dirac delta function. It’s infinitely tall, infinitely skinny, and has area 1 under it. So its nonzero domain is infinitesimal, and its range is infinitely big. We can formalize “infinitely tall, infinitely skinny, and has area 1 under it” very literally, as you’ll see. We will also free it from living only under an integral sign, like Cauchy did when he first defined it in 1827.[2]

the dirac delta as a nonstandard function

What does it mean to be infinitely big?

A number $N$ is infinitely big if $N > n$ for every standard natural number $n$. In other words, $N$ is bigger than any standard number. This does not imply it’s the biggest number. There are (uncountably) many infinite numbers bigger than $N$ and uncountably many infinite numbers smaller than $N$. There are also countably many finite natural numbers smaller than $N$. If $N$ is a whole number, it’s called a hypernatural number, or sometimes hyperfinite.

Here are the hyperintegers ${}^*\mathbb{Z}$, which are only the whole numbers, finite and infinite. As in, 0 fractional part.

the hyperintegers

A world in every grain of sand

A number $\epsilon$ is infinitesimal if it can be written as $\frac{1}{N}$ for some infinitely big $N$ (not necessarily a whole number), or if it’s $0$.

$0$ is the only number which is both standard and infinitesimal, and it’s the smallest infinitesimal in absolute value. There is no second smallest infinitesimal, just like how there’s no smallest positive real number.

We say 2 numbers $x$ and $y$ are infinitely close (in notation: $x \approx y$) if $x - y$ is infinitesimal. This is an equivalence relation on the hyperreals, which divides them into infinitesimal neighborhoods. Because it’s an equivalence relation, the infinitesimal neighborhoods of different standard numbers are disjoint.

There is a standard part function, which takes a hyperreal number and gives you the standard real number that it’s infinitely close to, if it exists. Infinitely big numbers have no standard part, since they’re not infinitely close to any real number. The standard part of an infinitesimal is 0. This is the "rounding off higher order terms" so common in calculus.
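These definitions are concrete enough to play with in code. Below is a toy Python sketch, assuming we only care about finite sums of rational multiples of powers of one fixed infinite number $N$. That is a tiny slice of the hyperreals, and the `Hyper` class and `infinitely_close` helper are hypothetical names invented for this illustration.

```python
from fractions import Fraction


class Hyper:
    """Toy hyperreal: a finite sum of c * N**k for one fixed infinite N > 0."""

    def __init__(self, terms):
        # terms maps exponent k -> coefficient c; zero coefficients are dropped.
        self.terms = {k: Fraction(c) for k, c in terms.items() if c != 0}

    def __add__(self, other):
        out = dict(self.terms)
        for k, c in other.terms.items():
            out[k] = out.get(k, Fraction(0)) + c
        return Hyper(out)

    def __sub__(self, other):
        return self + Hyper({k: -c for k, c in other.terms.items()})

    def __mul__(self, other):
        out = {}
        for k1, c1 in self.terms.items():
            for k2, c2 in other.terms.items():
                out[k1 + k2] = out.get(k1 + k2, Fraction(0)) + c1 * c2
        return Hyper(out)

    def is_infinite(self):
        # Any surviving positive power of N dominates every standard number.
        return any(k > 0 for k in self.terms)

    def is_infinitesimal(self):
        # Only negative powers of N remain; zero (no terms) also counts.
        return all(k < 0 for k in self.terms)

    def standard_part(self):
        """Standard number infinitely close to self, or None if infinite."""
        if self.is_infinite():
            return None
        return self.terms.get(0, Fraction(0))


def infinitely_close(x, y):
    return (x - y).is_infinitesimal()


N = Hyper({1: 1})       # an infinitely big number
eps = Hyper({-1: 1})    # the infinitesimal 1/N
one = Hyper({0: 1})
three = Hyper({0: 3})

print((three + eps).standard_part())           # standard part of 3 + 1/N
print(N.standard_part())                       # infinite: no standard part
print(infinitely_close(three + eps, three))    # 3 + 1/N is infinitely close to 3
print(((N + one) - N).standard_part())         # N + 1 is exactly 1 bigger than N
```

Exact `Fraction` coefficients keep the arithmetic free of rounding, so “infinitely close” here is decided symbolically rather than by a floating-point tolerance.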

Infinitesimal neighborhoods are disjoint

What’s wrong with $\infty$?

The problem with $\infty$ is that it’s not a number. It’s a concept used to describe the idea of something that’s unbounded or limitless. You can’t do full-on algebra with it.

Unlike the usual conception of infinity, $N + 1$ is exactly $1$ bigger than $N$, whereas $\infty + 1 = \infty$. This throws away valuable information. $N - 1$ is smaller by exactly $1$. And so on. You can form expressions in $N$ that were gibberish when written with $\infty$, but are now as meaningful as the same expressions in a finite number. Not as easy to compute though.

Examples of hyperfinite quantities

Here's a video of an example since my friend Ben asked. It features some leaves since I was taking a walk.

The Number of Pieces an Integral is Cut Into

You’re probably familiar with the idea that each piece has infinitesimal width, but what about the question of ‘how MANY pieces are there?’. The answer to that is a hypernatural number. Let’s call it $N$ again.

Hyperfinitely many pieces
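A computer can’t cut anything into hyperfinitely many pieces, but a large finite count gives the flavor. The Python sketch below assumes a left-endpoint Riemann sum of $x^2$ on $[0, 1]$ as the example (my pick for illustration, not something fixed by the text above); `riemann_sum` and `square` are hypothetical helper names.

```python
def riemann_sum(f, a, b, pieces):
    """Left-endpoint Riemann sum of f on [a, b] using `pieces` equal pieces."""
    width = (b - a) / pieces
    return sum(f(a + i * width) for i in range(pieces)) * width


def square(x):
    return x * x


exact = 1.0 / 3.0  # the integral of x^2 from 0 to 1
errors = {}
for pieces in (10, 1_000, 100_000):
    errors[pieces] = abs(riemann_sum(square, 0.0, 1.0, pieces) - exact)
    print(f"{pieces:>7} pieces: error = {errors[pieces]:.2e}")
```

Every finite count leaves a small but appreciable error. With a hypernatural number of pieces the error becomes infinitesimal, and taking the standard part rounds it away.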

The Number of Sides of a Circle

Consider a circle as a regular polygon with infinitely many sides.

In the plot below, even 100 sides is barely discernible from a true circle.

A 100 sided regular polygon
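How close does a polygon get? Here is a rough Python sketch, assuming a regular polygon inscribed in a unit circle, whose perimeter is $2n \sin(\pi / n)$. The function name `polygon_perimeter` is hypothetical.

```python
import math


def polygon_perimeter(sides):
    """Perimeter of a regular polygon inscribed in a circle of radius 1."""
    # Each side is a chord subtending an angle of 2*pi/sides.
    return 2 * sides * math.sin(math.pi / sides)


circumference = 2 * math.pi
gaps = {}
for sides in (6, 100, 10_000):
    gaps[sides] = circumference - polygon_perimeter(sides)
    print(f"{sides:>6} sides: circumference - perimeter = {gaps[sides]:.2e}")
```

The gap shrinks as the number of sides grows but stays positive for every finite count, which is the sense in which a circle behaves like a polygon with infinitely many sides.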

The Number of Colors in the Spectrum

In our everyday experience, we perceive colors as a continuous spectrum, seamlessly transitioning from one hue to another. However, when we apply nonstandard analysis, we can think of the color spectrum as being divided into $N$ distinct colors, where $N$ is a hypernatural number.

Imagine splitting the visible spectrum, say from 400 nm (violet) to 700 nm (red), into $N$ equally spaced colors. Each color occupies an infinitesimal range of wavelengths, or $\frac{300}{N}$ nm. This captures one of the great uses of infinitesimals: 2 things that can be considered different or the same as needed. It also captures the idea of shades of a single color. Imagine 2 shades of red, one a bit darker than the other. We can idealize this by saying they have infinitely close wavelengths, say 700 nm and ($700 - \epsilon$) nm for some positive infinitesimal $\epsilon$. In reality, shades are a non-infinitesimal wavelength apart, but because they can be arbitrarily close, this idealization is legit.

Visible spectrum split into 10,000 colors

Germ of Generality: The Step Function

Now we’ll elaborate our running example: the step function H(x). We will “approximate” it by a nonstandard function. Keep in mind that this “approximation” is an approximation in the same way that a Riemann sum is an approximation to an integral. It’s infinitely close at every point.

We can model the step function dynamically or statically.

Dynamically

A sequence of logistic functions that approach the step function.

But we can do it statically as well. Instead of making a sequence, why not just use ONE number? We can skip to the end of the process and just let $N$ be infinitely big.[3]

KEY POINT: Our nonstandard logistic function, the point of this whole post, is:

$$L(x) = \frac{1}{1 + e^{-Nx}}$$

where $N$ is infinitely big.

This nonstandard logistic function is appreciably indiscernible from the step function. The difference between them is “one (standard) point thick”.

Nonstandard Logistic Function
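For a feel of the “dynamic” picture, here is a small Python sketch that uses large finite values of the steepness as stand-ins for an infinite one. It assumes the logistic form $\frac{1}{1 + e^{-nx}}$ and the $H(0) = \frac{1}{2}$ convention; `logistic`, `heaviside`, and `worst_gap` are hypothetical helper names.

```python
import math


def logistic(x, n):
    """1 / (1 + exp(-n x)), arranged so math.exp never overflows."""
    z = n * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def heaviside(x):
    """Step function with the H(0) = 1/2 convention (an assumption here)."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def worst_gap(n, points):
    """Largest distance between the logistic and the step on the sample points."""
    return max(abs(logistic(x, n) - heaviside(x)) for x in points)


points = [-1.0, -0.1, -0.01, 0.0, 0.01, 0.1, 1.0]
gaps = {n: worst_gap(n, points) for n in (10, 1_000, 100_000)}
for n, gap in gaps.items():
    print(f"n = {n:>7}: worst gap on the sample points = {gap:.2e}")
```

The sample points are all standard numbers. For any finite steepness there are still points close enough to 0 where the two functions differ appreciably; only an infinite $N$ pushes all of that into the infinitesimal neighborhood of 0.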

The Derivative of the Nonstandard Logistic Function

N may be infinite, but it’s still a FIXED number, AKA a constant.

KEY(ER) POINT: To take the derivative, you just, uh, take the derivative. Treat N as a constant and differentiate with respect to x.

If you want a formal definition of this “new” derivative:

The derivative of a function $f$ at a point $x$ is:

$$f'(x) = \frac{f(x + \epsilon) - f(x)}{\epsilon}$$

where $\epsilon$ is a fixed, nonzero infinitesimal.

This definition captures the essence of the derivative without relying on limits. It’s worth noting that for standard differentiable functions, this definition agrees with the classical one (up to an infinitesimal difference that can be rounded away). So this is actually easier in a way, since you do the same thing with 1 less step (no limit).
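As a quick sanity check on a familiar function, take $f(x) = x^2$ (chosen here purely for illustration) and any fixed nonzero infinitesimal $\epsilon$:

$$\frac{(x + \epsilon)^2 - x^2}{\epsilon} = \frac{2x\epsilon + \epsilon^2}{\epsilon} = 2x + \epsilon \approx 2x$$

For standard $x$, the result differs from the classical derivative $2x$ only by the infinitesimal $\epsilon$, which is exactly the part the standard part function rounds away.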

If you did this with the usual definition of generalized functions, you wouldn’t figure out how to compute anything with them until about halfway through math grad school. Or never, since I just asked my friend Elliot Glazer and he said they never got to the actual computation, just the definition. But a motivated AP calculus high school student can do this. Helluva simplification.

DERIVATIVE:

$$L'(x) = \frac{d}{dx}\left[\frac{1}{1 + e^{-Nx}}\right] = \frac{N e^{-Nx}}{(1 + e^{-Nx})^2}$$

Here’s a graph of the derivative:

Derivative of the nonstandard logistic function is the Dirac delta

Spoiler alert: it’s the Dirac delta. Which makes sense, since the delta is (nearly) 0 everywhere except at the origin, where it’s infinitely big. Which is what we expected from the intuitive analysis.
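Two delta-like properties can be checked numerically with a large finite steepness standing in for the infinite one: the area under the curve should be (nearly) 1, and the peak should sit at a quarter of the steepness. This is a Python sketch under those assumptions; it uses the identity $L' = n L (1 - L)$ for the logistic function, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """1 / (1 + exp(-z)), arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_derivative(x, n):
    """d/dx of 1 / (1 + exp(-n x)), via the identity L' = n * L * (1 - L)."""
    s = sigmoid(n * x)
    return n * s * (1.0 - s)


def midpoint_area(f, a, b, pieces):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / pieces
    return sum(f(a + (i + 0.5) * width) for i in range(pieces)) * width


n = 50  # a finite stand-in for the infinite steepness
area = midpoint_area(lambda x: logistic_derivative(x, n), -1.0, 1.0, 20_000)
peak = logistic_derivative(0.0, n)
print(f"area under the spike on [-1, 1]: {area:.6f}")
print(f"height at x = 0: {peak} (n / 4 = {n / 4})")
```

Raising `n` makes the spike taller and skinnier while the area stays pinned near 1.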

Case Analysis: Exploring Different Regimes

Let’s analyze how the derivative behaves across different scales of x: Appreciable, infinitesimal, and 0 (actual zero, not just infinitesimal).

Positive appreciable $x$ (not infinitesimal and not $0$)

A number is appreciable if it’s not infinitesimal. Y’know, big enough that you could appreciate (see) it.

Let’s plug in a (standard) positive rational number for $x$ and see what we get.[4] Since it’s rational, $x = \frac{p}{q}$ for some standard finite coprime integers $p$ and $q$. Neither $p$ nor $q$ is $0$.

Consider the subexpression $Nx = \frac{Np}{q}$. If $N$ is infinitely big, then the numerator $Np$ is infinitely big too. If it wasn’t, then $N$ wouldn’t be infinitely big. Same for $\frac{Np}{q}$. So $Nx$ is infinitely big. Let’s call it $M$. $M$ and $N$ are of the same order because $\frac{M}{N} = \frac{p}{q}$ is a standard number. Something that’s NOT of the same order is $N$ and $N^2$, because $\frac{N^2}{N} = N$, which is not standard. If $N$ is of order 1 (think of $N$ like a meter stick, setting a scale for the infinite), then $N^2$ is of order 2. Infinitesimals are of negative order, like $\frac{1}{N}$ is of order $-1$. All standard numbers are order 0.

So we can replace $Nx$ with $M$.

Now our expression is:

$$\frac{N e^{-M}}{(1 + e^{-M})^2}$$

Intuitively, $e^{-M}$ is like $e^{-\infty}$, which is $0$. So $e^{-M}$ should be infinitesimal. Using the Taylor series of $e^{M}$, we can see it’s an infinitesimal of (much) smaller order than $\frac{1}{N}$. To make this a bit easier, first rewrite $e^{-M}$ as $\frac{1}{e^{M}}$.

Let’s examine the Taylor series of $e^{M}$ around $0$:

$$\frac{1}{e^{M}} = \frac{1}{1 + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \cdots}$$

Since $M$ is infinitely big and positive, the denominator ($1 + M + \frac{M^2}{2!} + \cdots$) is infinitely big. The series has a larger sum than any standard polynomial in $M$ (or $N$, since $M$ and $N$ have the same order), because it contains the term $\frac{M^n}{n!}$ for every standard natural number $n$. Because the later terms have higher order than the previous ones, they are strictly larger, regardless of the division by $n!$, since $n!$ is only finite but $M^n$ is infinite and of a strictly larger order than all the previous terms.

This shows that $e^{-M}$ is infinitesimal since it’s 1 over something that’s infinitely big.

Then the numerator $N e^{-M}$ is infinitesimal since it’s an infinite number of order 1 times something that’s infinitesimal of (much) lower order.

The bottom term $1 + e^{-M}$ is infinitely close to 1 by similar reasoning. So the whole denominator $(1 + e^{-M})^2$ is also infinitely close to 1 since it’s a term that’s nearly 1 times itself.

Putting it all together, the whole fraction is an infinitesimal number divided by a number that’s nearly 1. AKA it’s infinitesimal. So outside of 0 it looks flat.
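If the order bookkeeping feels slippery, one explicit bound does the same job. Writing $M = Nx$ for the infinite exponent and keeping only the quadratic term of the series (a shortcut chosen here for illustration, valid because every term is positive when $M > 0$):

$$e^{M} = 1 + M + \frac{M^2}{2!} + \cdots > \frac{M^2}{2} \quad\Longrightarrow\quad 0 < N e^{-M} < \frac{2N}{M^2} = \frac{2}{N x^2}$$

Since $x$ is standard and appreciable, $\frac{2}{x^2}$ is a standard number, and a standard number divided by the infinite $N$ is infinitesimal. So the numerator is squeezed between $0$ and an infinitesimal.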

Ditto for negative appreciable x.

Infinitesimal nonzero x

This is a bit more complicated, since it depends on the order of $x$. The short of it is that the derivative of the logistic function is continuous, so by the Intermediate Value Theorem, it will take on all values from infinitesimal to infinite in the infinitesimal neighborhood of 0. All this “weirdness” is crammed into an infinitesimal slice of space, invisible if you took the standard part.

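Let’s just take 1 specific value to illustrate. Consider $x = \frac{1}{N}$.

The original formula is:

$$L'(x) = \frac{N e^{-Nx}}{(1 + e^{-Nx})^2}$$

Plugging in $x = \frac{1}{N}$, we get:

$$L'\left(\frac{1}{N}\right) = \frac{N e^{-1}}{(1 + e^{-1})^2} \approx 0.197\,N$$

So this particular value $x = \frac{1}{N}$ has infinite derivative.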

At $x = 0$

Remember that I said that 0 is the smallest infinitesimal? That means that any number (even an infinite one) times 0 is still 0. EXACTLY 0.

Exactly at $x = 0$, the derivative becomes:

$$L'(0) = \frac{N e^{0}}{(1 + e^{0})^2} = \frac{N}{4}$$

Since $N$ is an infinite number, $\frac{N}{4}$ is also infinite. This captures the essence of the Dirac delta function’s “infinite spike.” But now while the delta may be infinite, it has an end, a specific height. Which is $\frac{N}{4}$.
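The three regimes can be lined up side by side with a large finite steepness standing in for the infinite one. In the Python sketch below, the sample points $x = \frac{1}{2}$ (appreciable), $x = \frac{1}{n}$ (the stand-in for an infinitesimal), and $x = 0$ are illustrative choices, and `logistic_derivative` is a hypothetical helper name.

```python
import math


def logistic_derivative(x, n):
    """n * exp(-n x) / (1 + exp(-n x))**2, evaluated without overflow.

    The derivative is an even function of x, so exp(-|n x|) gives the same
    value while keeping the exponential between 0 and 1.
    """
    u = math.exp(-abs(n * x))
    return n * u / (1.0 + u) ** 2


rows = {}
for n in (10**3, 10**6):
    rows[n] = {
        "appreciable": logistic_derivative(0.5, n),   # x = 1/2
        "tiny": logistic_derivative(1.0 / n, n),      # x = 1/n
        "zero": logistic_derivative(0.0, n),          # x = 0
    }
    r = rows[n]
    print(f"n = {n:>7}: L'(1/2) = {r['appreciable']:.3e}, "
          f"L'(1/n) = {r['tiny']:.3e}, L'(0) = {r['zero']:.3e}")
```

As the steepness grows, the value at $x = \frac{1}{2}$ collapses toward 0, while the values at $\frac{1}{n}$ and at $0$ both scale in proportion to `n`, with the value at $0$ exactly a quarter of it.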

This highlights something that is very difficult to even think about in the standard approach: the EXACT height of something infinitely tall. Delta functions are familiar to physicists and engineers, and they’re even familiar with the idea that the domain is infinitesimal. But the height is always treated as if it’s some sort of magic symbol called INFINITY. If you asked them "ok but HOW tall is it?", they’d just say INFINITY. But here it’s not just infinite, but a specific infinite number divided by 4.

Let me tell you, people look at you funny if you say something is infinity over 4 tall. In the standard approach, $\frac{\infty}{4}$ is just $\infty$. But a physicist would NEVER mix up $dx$ and $dx\,dy$ because that would be confusing a length (line) element with an area (plane) element. So why not treat infinity the same way?

Higher derivatives

No reason to stop at the first derivative. The logistic function is infinitely differentiable, so we can just keep taking derivatives.

The second derivative is:

$$L''(x) = \frac{N^2 e^{-Nx}\,(e^{-Nx} - 1)}{(1 + e^{-Nx})^3}$$

This function is sometimes called the Laplacian of the indicator, or the dipole moment of a magnet.

Personally, I find the magnet picture intuitively helpful. A point charge flips from positive to negative in the space of a single point, and the closer you get, the higher the value of the magnetic potential. Get infinitely close to the magnet and the potential is infinitely big.

The graph of this one looks confusing plotted but here it is:

Second Derivative of Step Function

Dipole of a magnet
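The shape of the second derivative (positive spike on the left, negative spike on the right, zero in the middle) can be checked numerically. This Python sketch assumes a finite steepness as a stand-in and uses the logistic identities $L' = n L (1 - L)$ and $L'' = n^2 L (1 - L)(1 - 2L)$; the function names are hypothetical, and the central difference is only there as an independent cross-check.

```python
import math


def sigmoid(z):
    """1 / (1 + exp(-z)), arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def first_derivative(x, n):
    s = sigmoid(n * x)
    return n * s * (1.0 - s)


def second_derivative(x, n):
    s = sigmoid(n * x)
    return n * n * s * (1.0 - s) * (1.0 - 2.0 * s)


def central_difference(f, x, h):
    """Numerical derivative of f at x with step h."""
    return (f(x + h) - f(x - h)) / (2 * h)


n = 20  # a finite stand-in for the infinite steepness
checks = []
for x in (-0.1, 0.0, 0.1):
    exact = second_derivative(x, n)
    approx = central_difference(lambda t: first_derivative(t, n), x, 1e-5)
    checks.append((x, exact, approx))
    print(f"x = {x:+.1f}: closed form = {exact:+.4f}, finite difference = {approx:+.4f}")
```

The sign flip across 0 is the “dipole” in miniature: the two lobes grow like the square of the steepness while squeezing toward the origin.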

The higher derivatives are called multipole moments but I’ll stop at 2.

Conclusion

Sometimes, it’s easier to solve a problem by reexamining old assumptions than by introducing heavy machinery. Often.

By using nonstandard analysis and infinite numbers, we’ve found a way to differentiate the Heaviside step function using actual functions rather than distributions. This approach offers a more intuitive understanding of discontinuous functions and their derivatives, bridging the gap between mathematical rigor and intuitive comprehension.

In the realm of nonstandard analysis, infinite numbers aren’t obstacles but stepping stones—bridges that connect the discontinuities of mathematics with the continuity of intuition.

Here is a video I made on this. And a software library. Here’s a calculator that works with infinitesimals and infinitely big numbers.

Credit to Euler, Cauchy, Mikhail Katz.


  1. This concept can be extended far beyond the reals, but that’s a topic for another post. 
  2. Differential forms are another thing nonstandard analysis can free from the tyranny of life under the yoke of the integral sign. But that’s for another post. 
  3. How did I know to use the logistic function? Luck. Before that bums you out too much, keep in mind that determining whether a (standard or not) function is even continuous at a point is undecidable. That’s why no one gave you a general formula to find limits, because there isn’t one. This is just another example of that. 
  4. The reason for rationals is that you can make them as close as you want to any real number, and they’re easy to work with. 

If you have anything you want to say to me (compliments, criticism, requests for future posts, etc.), please fill out my feedback form


The more standard way to do this is with objects called distributions which you can loosely think of as "things you can integrate to get a function."

What’s its derivative?

The graph is nonstandard and misleading. It should not have a vertical segment at 0, it should have an open-circle at 0,0, and a closed circle at 0,1, showing that the lines do not and do contain the 0 point, respectively.

This makes the intuition pump a little easier.  The derivative at all nonzero Xs is 0.  The derivative AT ZERO, is 0 to the right (as X increases), and undefined to the left (as X decreases).  There is no connection between 0 and 0 - epsilon, and therefore no slope.

You CAN use more complicated models to describe some features of it (hyperreals, or just limits), but those are modeling tools to answer different questions than the intuitive use of derivative (slope of a continuous curve).  It's probably not right to say that any of them are "true", without some caveats.

 

added some open circles

Your definition of the Heaviside step function has H(0) = 1.
Your definition of L has L(0) = 1/2, so you're not really taking the derivative of the same function.

I don't really believe nonstandard analysis helps us differentiate the Heaviside step function. You have found a function that is quite a lot like the step function and shown that it has a derivative (maybe), but I would need to be convinced that all functions have the same derivative to be convinced that something meaningful is going on.  (And since all your derivatives have different values, this seems like a not useful definition of a derivative)

I adjusted H to use heaviside's 1/2 convention, good catch.