A Very Mathematical Explanation of Derivatives

Heighn

This post is meant for readers who are familiar with algebra and derivatives but want to deepen their understanding and/or need a refresher.

Linear functions

Let's start with a family of very basic functions: the linear functions, expressed as $f (x) = a x + b$ . You might remember that the derivative is $f^{'} (x) = a$ , because $x$ is multiplied by $a$ and the constant $b$ "disappears" when taking the derivative. This is correct, but let's actually calculate the derivative. Since $f (x)$ is a linear function, $f^{'} (x)$ is the same for all $x .$ That is, a linear function "goes up" with the same "speed" everywhere, as can be seen in the following graph for $g (x) = 2 x + 5$ :

For example, between $x = 0$ and $x = 1$ , $g (x)$ increases by $2$ , just like it does between e.g. $x = 2$ and $x = 3$ . Therefore, determining the average slope between $x$ and $x + d$ will do. The average slope between $x$ and $x + d$ is how much $f (x)$ increases between $x$ and $x + d$ , divided by the difference between $x$ and $x + d$ (which is $d$ ). Let $d = 1$ , in which case we don't have to do the division, as $f^{'} (x) = \frac{f (x + 1) - f (x)}{1} = f (x + 1) - f (x)$ . Filling in $a x + b$ for $f (x)$ and $a (x + 1) + b$ for $f (x + 1)$ , we get:

$f^{'} (x) = f (x + 1) - f (x) = (a (x + 1) + b) - (a x + b) = a x + a + b - a x - b = a$

There it is! $f^{'} (x) = a$ . So for $g (x) = 2 x + 5$ , where $a = 2$ , this means $g^{'} (x) = 2$ .
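As a quick numerical sanity check, here is a minimal Python sketch that computes the average slope of $g (x) = 2 x + 5$ for a few choices of $x$ and $d$ . The helper name `average_slope` is my own and not part of the derivation above.

```python
def average_slope(f, x, d):
    """Average slope of f between x and x + d."""
    return (f(x + d) - f(x)) / d


def g(x):
    return 2 * x + 5


# For a linear function, the average slope is the same for every x and every d.
for x in [0, 2, -3.5]:
    for d in [1, 0.5, 4]:
        print(x, d, average_slope(g, x, d))
```

Every printed slope should equal $a = 2$ , which is why choosing $d = 1$ was enough for linear functions.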

Polynomials (and more)

Polynomials are functions with the following form:

$f (x) = a_{0} x^{n} + a_{1} x^{n - 1} + . . . + a_{n - 1} x + a_{n}$

Determining their derivative is a bit more tricky than determining the derivative of a linear function, because now, the derivative isn't necessarily the same everywhere. After all, take $g (x) = x^{2}$ :

We can see this is a curved line, and so the derivative is constantly changing. We can still do something like the "trick" we did with linear functions, but we can't determine $f^{'} (x)$ by looking at how $f (x)$ changes between $x$ and $x + 1$ : that would assume $f^{'} (x)$ is the same between $x$ and $x + 1$ , which isn't true. For $x$ and $x + 0.001$ , we would have a better estimate of $f^{'} (x)$ , but we'd still assume $f^{'} (x)$ to be constant between these values. We need to determine how $f (x)$ changes between $x$ and some $x + d$ , where $d$ needs to approach zero: the smaller $d$ gets, the more accurate our calculation for $f^{'} (x)$ becomes. We can do this using limits:

$f^{'} (x) = {lim}_{d \to 0} \frac{f (x + d) - f (x)}{d}$

(Since $d$ isn't 1 now, we need to do the division.) We can read this as follows: what value does $\frac{f (x + d) - f (x)}{d}$ approach when $d$ approaches $0$ ?

Let's do this for the simple polynomial $g (x) = x^{2}$ :

$g^{'} (x) = {lim}_{d \to 0} \frac{g (x + d) - g (x)}{d} = {lim}_{d \to 0} \frac{(x + d)^{2} - x^{2}}{d} = {lim}_{d \to 0} \frac{x^{2} + 2 d x + d^{2} - x^{2}}{d} = {lim}_{d \to 0} \frac{2 d x + d^{2}}{d}$

$g^{'} (x) = {lim}_{d \to 0} (\frac{2 d x}{d} + \frac{d^{2}}{d}) = {lim}_{d \to 0} (2 x + d)$

When $d$ approaches $0$ , $2 x + d$ becomes $2 x$ :

$g^{'} (x) = {lim}_{d \to 0} (2 x + d) = 2 x$ .
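To see the limit at work numerically, the following Python sketch evaluates the difference quotient of $g (x) = x^{2}$ at $x = 3$ for shrinking values of $d$ . The function name `difference_quotient` and the sample values are my own choices for illustration.

```python
def difference_quotient(f, x, d):
    """The quantity (f(x + d) - f(x)) / d whose limit for d -> 0 is f'(x)."""
    return (f(x + d) - f(x)) / d


def g(x):
    return x ** 2


x = 3.0
# Algebraically the quotient equals 2x + d, so it should approach 2x = 6.
for d in [1, 0.1, 0.001, 1e-6]:
    print(d, difference_quotient(g, x, d))
```

The printed values get closer to $2 x = 6$ as $d$ shrinks, up to small floating-point rounding errors.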

So for $g (x) = x^{2}$ , $g^{'} (x) = 2 x$ . You might have learned the general rule:

For $f (x) = a x^{b}$ , $f^{'} (x) = a b x^{b - 1}$

This is known as the power rule, and indeed works for $g (x) = x^{2}$ , where $a = 1$ and $b = 2$ and $1 * 2 * x^{2 - 1} = 2 x^{1} = 2 x$ . It also works for the linear function $h (x) = 2 x$ , where $a = 2$ and $b = 1$ : $h^{'} (x) = 1 * 2 x^{1 - 1} = 1 * 2 x^{0} = 1 * 2 * 1 = 2$ . But does it work in general? Yes, and we can prove it. Let's first prove it works for all natural numbers $b$ ( $0, 1, 2, 3, . . .$ ), i.e. $b \in N$ . We need the product rule and mathematical induction for this proof though, so let's discuss those first.

Product rule

The product rule states that when $f (x) = g (x) * h (x)$ , $f^{'} (x) = g^{'} (x) * h (x) + g (x) * h^{'} (x)$ . So when e.g. $g (x) = 2 x$ and $h (x) = 3 x^{2}$ , $f (x) = 2 x * 3 x^{2}$ and

$f^{'} (x) = g^{'} (x) * h (x) + g (x) * h^{'} (x) = 2 * 3 x^{2} + 2 x * 6 x = 6 x^{2} + 12 x^{2} = 18 x^{2}$ . We can show the product rule is correct by determining what $f^{'} (x)$ should be using the original definition of the derivative:

$f^{'} (x) = {lim}_{d \to 0} \frac{f (x + d) - f (x)}{d} = {lim}_{d \to 0} \frac{g (x + d) * h (x + d) - g (x) * h (x)}{d}$

Since we want to write $f^{'} (x)$ as $g^{'} (x) * h (x) + g (x) * h^{'} (x)$ , let's rewrite the numerator to include the terms $g (x + d) - g (x)$ and $h (x + d) - h (x)$ :

$f^{'} (x) = {lim}_{d \to 0} \frac{h (x) (g (x + d) - g (x)) + g (x + d) (h (x + d) - h (x))}{d}$

and note that indeed, $h (x) (g (x + d) - g (x)) + g (x + d) (h (x + d) - h (x)) =$

$h (x) g (x + d) - g (x) h (x) + g (x + d) h (x + d) - h (x) g (x + d) =$

$g (x + d) h (x + d) - g (x) h (x)$ , which was our original numerator.

Simplifying $f^{'} (x) = {lim}_{d \to 0} \frac{h (x) (g (x + d) - g (x)) + g (x + d) (h (x + d) - h (x))}{d}$ , we get

$f^{'} (x) = {lim}_{d \to 0} \frac{h (x) (g (x + d) - g (x))}{d} + {lim}_{d \to 0} \frac{g (x + d) (h (x + d) - h (x))}{d}$

Since $h (x)$ doesn't contain $d$ , we can take it outside the first limit term. We can also rewrite the second term:

$f^{'} (x) = h (x) * {lim}_{d \to 0} \frac{g (x + d) - g (x)}{d} + {lim}_{d \to 0} g (x + d) * {lim}_{d \to 0} \frac{h (x + d) - h (x)}{d}$

When $d$ approaches $0$ , $g (x + d)$ approaches $g (x)$ (a differentiable function is continuous), so ${lim}_{d \to 0} g (x + d) = g (x)$ . Furthermore, by definition,

${lim}_{d \to 0} \frac{g (x + d) - g (x)}{d} = g^{'} (x)$ and ${lim}_{d \to 0} \frac{h (x + d) - h (x)}{d} = h^{'} (x)$ ,

so we now have $f^{'} (x) = h (x) * g^{'} (x) + g (x) * h^{'} (x) = g^{'} (x) * h (x) + g (x) * h^{'} (x)$ ,

which is the product rule!
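The worked example from this section can also be checked numerically. The sketch below approximates derivatives with the difference quotient for a small fixed $d$ (so the results are approximations, not exact values); the helper names `derivative` and `product_rule` are my own.

```python
def derivative(f, x, d=1e-6):
    """Approximate f'(x) by the difference quotient with a small d."""
    return (f(x + d) - f(x)) / d


def g(x):
    return 2 * x


def h(x):
    return 3 * x ** 2


def f(x):
    return g(x) * h(x)


def product_rule(x):
    """g'(x) * h(x) + g(x) * h'(x), with both derivatives approximated."""
    return derivative(g, x) * h(x) + g(x) * derivative(h, x)


x = 2.0
print(derivative(f, x))   # direct approximation of f'(x)
print(product_rule(x))    # approximation via the product rule
print(18 * x ** 2)        # the exact value 18 x^2 found above
```

All three printed numbers should agree up to a small approximation error.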

Mathematical induction

Mathematical induction is a method for proving something is true for all natural numbers $N$ . For example, say we want to prove that for every natural number $n \in N$ , $S (n) = \frac{n (n + 1)}{2}$ , where $S (n)$ is simply $0 + 1 + 2 + . . . + n$ . We can do this by first showing the condition holds for $n = 0$ . That's Step 1, and yes, it does: $\frac{0 (0 + 1)}{2} = 0 = S (0)$ . Then, we show that if the condition holds for some $n$ , it also holds for $n + 1$ . That's Step 2. So for this step we assume $S (n) = \frac{n (n + 1)}{2}$ , and need to show that $S (n + 1) = \frac{(n + 1) ((n + 1) + 1)}{2}$ . That holds as well: if $S (n) = \frac{n (n + 1)}{2}$ , then $S (n + 1) = 0 + 1 + 2 + . . . + n + (n + 1) = S (n) + n + 1$ . Since for this step we assumed $S (n) = \frac{n (n + 1)}{2}$ , we have $S (n + 1) = \frac{n (n + 1)}{2} + n + 1 = \frac{n (n + 1)}{2} + \frac{2 (n + 1)}{2}$ . So $S (n + 1) = \frac{n (n + 1) + 2 (n + 1)}{2} = \frac{(n + 1) (n + 2)}{2} = \frac{(n + 1) ((n + 1) + 1)}{2}$ .

So we now know that our condition holds for $n = 0$ and that if it holds for some $n$ , it must also hold for $n + 1$ . But then it holds for all natural numbers! Does our condition hold for $n = 3$ ? Yes! It holds for $n = 0$ by Step 1, so it holds for $n = 1$ by Step 2; but then, since it holds for $n = 1$ , it also holds for $n = 2$ , again by Step 2. Applying Step 2 one more time gives that the condition $S (n) = \frac{n (n + 1)}{2}$ holds for $n = 3$ as well. And we can apply this process to every natural number!
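The "apply Step 2 over and over" idea can be mimicked in a few lines of Python. This is only a finite check up to an arbitrary bound I picked ( $n = 1000$ ), not a proof; the name `closed_form` is my own.

```python
def closed_form(n):
    """The claimed formula S(n) = n (n + 1) / 2."""
    return n * (n + 1) // 2


# Step 1: the base case S(0) = 0.
s = 0
assert s == closed_form(0)

# Repeating Step 2: build S(n + 1) = S(n) + n + 1 and compare with the formula.
for n in range(1000):
    s = s + (n + 1)
    assert s == closed_form(n + 1)

print("formula matches the running sum for n = 0 .. 1000")
```

The loop can only ever test finitely many $n$ ; the induction argument is what covers all of them at once.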

Proof of the power rule for natural numbers

Using the product rule and mathematical induction, we can show that the power rule (for $f (x) = a x^{b}$ , $f^{'} (x) = a b x^{b - 1}$ ) works for all $b \in N$ .

Step 1 is to show this is true for $b = 0$ . Yes: then $f (x) = a x^{b} = a x^{0} = a$ , and $a b x^{b - 1} = a * 0 x^{0 - 1} = 0 = f^{'} (x)$ . (Since $f (x)$ is constant ( $a$ ), its derivative $f^{'} (x)$ is indeed $0$ .)

Step 2 is to show that if $f^{'} (x) = a b x^{b - 1}$ for some $b \in N$ and $f (x) = a x^{b}$ , then for $b + 1$ and $g (x) = a x^{b + 1}$ , $g^{'} (x) = a (b + 1) x^{b}$ .

We can write $g (x) = a x^{b + 1}$ as $g (x) = x * a x^{b}$ . Then, define $h (x) = x$ . Then $g (x) = h (x) * f (x)$ , and then the product rule says $g^{'} (x) = h^{'} (x) * f (x) + h (x) * f^{'} (x)$ . But by the assumption of Step 2, $f^{'} (x) = a b x^{b - 1}$ . Furthermore, $h^{'} (x) = 1$ . So $g^{'} (x) = 1 * a x^{b} + x * a b x^{b - 1} = a x^{b} + a b x x^{b - 1} = a x^{b} + a b x^{b} = a (b + 1) x^{b}$ , which is what we wanted to prove!
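The induction step can be replayed mechanically. In the sketch below (the function name `derivative_coefficient` is my own), we only track the coefficient of the derivative: Step 1 starts it at $0$ , and each application of Step 2 turns a coefficient $c$ into $a + c$ , exactly as $a x^{b} + a b x^{b} = a (b + 1) x^{b}$ does above.

```python
def derivative_coefficient(a, b):
    """Coefficient of x**(b - 1) in the derivative of a * x**b, for natural b.

    Built by repeating the induction step rather than by computing a * b directly.
    """
    coeff = 0  # Step 1: the derivative of a * x**0 is 0
    for _ in range(b):
        # Step 2: 1 * a x^k + x * (coeff * x^(k - 1)) = (a + coeff) x^k
        coeff = a + coeff
    return coeff


for a, b in [(1, 2), (2, 1), (3, 5)]:
    print(a, b, derivative_coefficient(a, b), a * b)
```

For each pair, the coefficient produced by repeating Step 2 should match the $a b$ predicted by the power rule.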

So we have shown the power rule works for $b \in N$ . We could extend this proof to e.g. cover negative integers for $b$ as well. But I'd like to use a different method of proof, one that proves the power rule works for $b \in R$ . For this, we first need to know the chain rule, the constant multiple rule, Euler's number and how to take the derivative of the natural logarithm.

Chain rule

Define $f (x) = (3 x)^{2}$ . (Note this is distinct from $3 x^{2}$ .) We want to determine its derivative. We could say $f (x) = (3 x)^{2} = 9 x^{2}$ , which would make $f^{'} (x) = 18 x$ . This is true, but let's take the opportunity to study the chain rule. Define $g (x) = 3 x$ and $h (x) = x^{2}$ . We can then write $f (x)$ as $h (g (x))$ . Then:

$f^{'} (x) = {lim}_{d \to 0} \frac{f (x + d) - f (x)}{d} = {lim}_{d \to 0} \frac{h (g (x + d)) - h (g (x))}{d}$

Multiplying by $\frac{g (x + d) - g (x)}{g (x + d) - g (x)}$ , which equals 1 and is allowed if $g (x + d) \neq g (x)$ (otherwise we are dividing by 0), gives:

$f^{'} (x) = {lim}_{d \to 0} \frac{(h (g (x + d)) - h (g (x))) * (g (x + d) - g (x))}{(g (x + d) - g (x)) * d}$

or $f^{'} (x) = {lim}_{d \to 0} \frac{h (g (x + d)) - h (g (x))}{g (x + d) - g (x)} * {lim}_{d \to 0} \frac{g (x + d) - g (x)}{d}$

Note that ${lim}_{d \to 0} \frac{h (g (x + d)) - h (g (x))}{g (x + d) - g (x)} = h^{'} (g (x))$ , and ${lim}_{d \to 0} \frac{g (x + d) - g (x)}{d} = g^{'} (x)$ . So we now have $f^{'} (x) = h^{'} (g (x)) * g^{'} (x)$ . That's the chain rule, and it holds whenever we can write a function $f (x)$ as $f (x) = h (g (x))$ . Originally, we said $f (x) = (3 x)^{2}$ , with $g (x) = 3 x$ and $h (x) = x^{2}$ . Then $h^{'} (g (x)) = 2 g (x) = 2 * 3 x = 6 x$ and $g^{'} (x) = 3$ . According to the chain rule, then, $f^{'} (x) = 6 x * 3 = 18 x$ , which is also what we got by applying the power rule to $f (x) = (3 x)^{2} = 9 x^{2}$ .

Before, we temporarily assumed $g (x + d) \neq g (x)$ . What if $g (x + d) = g (x)$ ? Well, then $h (g (x + d)) = h (g (x))$ , and $h (g (x + d)) - h (g (x)) = 0$ . Then $f^{'} (x) = {lim}_{d \to 0} \frac{0}{d} = 0$ , and $g^{'} (x) = {lim}_{d \to 0} \frac{g (x + d) - g (x)}{d} = {lim}_{d \to 0} \frac{0}{d} = 0$ . So the chain rule would still apply, as $h^{'} (g (x)) * g^{'} (x) = h^{'} (g (x)) * 0 = 0 = f^{'} (x)$ .
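Here is a small numerical check of the chain rule on the example $f (x) = (3 x)^{2}$ , again using a difference quotient with a small fixed $d$ as an approximation. The helper names `derivative` and `chain_rule` are my own.

```python
def derivative(f, x, d=1e-6):
    """Approximate f'(x) by the difference quotient with a small d."""
    return (f(x + d) - f(x)) / d


def g(x):
    return 3 * x


def h(x):
    return x ** 2


def f(x):
    return h(g(x))


def chain_rule(x):
    """h'(g(x)) * g'(x), with both derivatives approximated."""
    return derivative(h, g(x)) * derivative(g, x)


x = 2.0
print(derivative(f, x))  # direct approximation of f'(x)
print(chain_rule(x))     # approximation via the chain rule
print(18 * x)            # the exact value 18 x found above
```

Note that the derivative of $h$ is evaluated at $g (x)$ , not at $x$ ; that is the part of the chain rule that is easiest to get wrong.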

Constant multiple rule

If $f (x) = a * g (x)$ , $f^{'} (x) = a * g^{'} (x)$ . This might make intuitive sense, but it also follows from the chain rule: define $h (x) = a x$ and $f (x) = h (g (x))$ . Then $f^{'} (x) = h^{'} (g (x)) * g^{'} (x) = a * g^{'} (x)$ , which is the constant multiple rule. Indeed, this same rule also follows from the product rule: if $f (x) = a * g (x)$ , define $h (x) = a$ . Then $f (x) = h (x) * g (x)$ and $f^{'} (x) = h^{'} (x) * g (x) + h (x) * g^{'} (x) = 0 * g (x) + a * g^{'} (x) = a * g^{'} (x)$ .

Euler's number and the natural logarithm

You might know Euler's number $e$ , which is chosen so that if $f (x) = e^{x}$ , then $f^{'} (x) = e^{x}$ . You may also remember the natural logarithm $ln$ , where $e^{ln x} = x$ . What's the derivative of $f (x) = ln x$ ? We can find it with the chain rule! Define $g (x) = e^{f (x)} = e^{ln x}$ and $h (x) = e^{x}$ . Then $g (x) = h (f (x))$ , and applying the chain rule gives $g^{'} (x) = h^{'} (f (x)) * f^{'} (x) = e^{ln x} * f^{'} (x)$ . But also, $g (x) = e^{ln x} = x$ , so $g^{'} (x) = 1$ . So we learn $g^{'} (x) = e^{ln x} * f^{'} (x) = 1$ , or $x * f^{'} (x) = 1$ , and so $f^{'} (x) = \frac{1}{x}$ . So for $f (x) = ln x$ , $f^{'} (x) = \frac{1}{x}$ .
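Both facts from this section can be checked approximately with Python's standard `math` module ( `math.exp` and `math.log` , the natural logarithm). The helper `derivative` and the sample points are my own choices.

```python
import math


def derivative(f, x, d=1e-6):
    """Approximate f'(x) by the difference quotient with a small d."""
    return (f(x + d) - f(x)) / d


for x in [0.5, 1.0, 4.0]:
    # The derivative of e^x should be close to e^x itself.
    print(x, derivative(math.exp, x), math.exp(x))
    # The derivative of ln x should be close to 1 / x.
    print(x, derivative(math.log, x), 1 / x)
```

In each printed line, the second and third numbers should nearly coincide.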

General proof of the power rule

Now we're ready to prove the power rule (for $f (x) = a x^{b}$ , $f^{'} (x) = a b x^{b - 1}$ ) works for $b \in R$ . Let's rewrite $x$ as $e^{ln x}$ (which requires $x > 0$ , since $ln x$ is only defined there). Then $f (x) = a (e^{ln x})^{b} = a e^{b ln x}$ . Define $g (x) = a e^{x}$ and $h (x) = b ln x$ . Then $f (x) = g (h (x))$ , and via the chain rule (and the constant multiple rule) $f^{'} (x) = g^{'} (h (x)) * h^{'} (x) = a e^{b ln x} * b * \frac{1}{x} = a e^{b ln x} * \frac{b}{x}$ . Remember $a e^{b ln x} = a (e^{ln x})^{b} = a x^{b}$ , so $f^{'} (x) = a x^{b} * \frac{b}{x} = \frac{a b x^{b}}{x} = a b x^{b - 1}$ , which proves the power rule for $b \in R$ (and $x > 0$ ).
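To illustrate that the power rule is not limited to natural exponents, this sketch compares a difference-quotient approximation with $a b x^{b - 1}$ for a fractional, a negative and an irrational exponent at a positive $x$ . The names `derivative` and `power_rule` and the sample values are my own.

```python
import math


def derivative(f, x, d=1e-6):
    """Approximate f'(x) by the difference quotient with a small d."""
    return (f(x + d) - f(x)) / d


def power_rule(a, b, x):
    """The derivative a * b * x**(b - 1) predicted by the power rule."""
    return a * b * x ** (b - 1)


a, x = 2.0, 1.5  # x > 0, as the proof via ln x requires
for b in [0.5, -1.5, math.pi]:
    approx = derivative(lambda t: a * t ** b, x)
    print(b, approx, power_rule(a, b, x))
```

For each exponent, the approximation and the power-rule value should agree to several decimal places.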

Local maxima, local minima and second derivatives

As you might know, polynomials like $f (x) = - x^{2}$ can have local maxima (or peaks, where the graph first goes up and then goes down) and local minima (or valleys, where the graph first goes down and then goes up). When a graph goes up, the derivative is positive; when it goes down, the derivative is negative. At a peak, the derivative must be $0$ ! It's similar for valleys - the derivative is $0$ there, too. That means we can look for local maxima and local minima by setting the derivative to $0$ ! For $f (x) = - x^{2}$ , $f^{'} (x) = - 2 x$ . $f^{'} (x) = 0$ gives $- 2 x = 0$ and thus $x = 0$ . Therefore, $x = 0$ is our candidate for a local maximum or minimum. Which is it? Well, note that in a local maximum, the derivative must be decreasing (through $0$ ): otherwise, the graph wouldn't first go up and then go down. But if the derivative is decreasing, the derivative of the derivative, called the second derivative (written $f^{''} (x)$ ), must be negative! Conversely, in a local minimum, the second derivative must be positive. For $f (x) = - x^{2}$ , $f^{'} (x) = - 2 x$ and $f^{''} (x) = - 2 < 0$ . So $f (0) = 0$ is a local maximum!

Now consider $g (x) = x^{3} + x^{2}$ . We have $g^{'} (x) = 3 x^{2} + 2 x$ , and $g^{'} (x) = 0$ gives $3 x^{2} + 2 x = 0$ or $x (3 x + 2) = 0$ . Then $x = 0 \lor 3 x + 2 = 0$ and so $x = 0 \lor x = - \frac{2}{3}$ . So the candidates for a local maximum or minimum are $x = 0$ and $x = - \frac{2}{3}$ . $g^{''} (x) = 6 x + 2$ , $g^{''} (0) = 6 * 0 + 2 = 2 > 0$ and $g^{''} (- \frac{2}{3}) = 6 * (- \frac{2}{3}) + 2 = - 2 < 0$ . Therefore, we have a local minimum at $x = 0$ and a local maximum at $x = - \frac{2}{3}$ .
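The classification for $g (x) = x^{3} + x^{2}$ can be written out as a short Python sketch. The derivatives are typed in by hand from the formulas above, and the function name `classify` (including its "inconclusive" case for a second derivative of exactly $0$ ) is my own addition.

```python
def g_prime(x):
    """First derivative of g(x) = x^3 + x^2."""
    return 3 * x ** 2 + 2 * x


def g_double_prime(x):
    """Second derivative of g(x) = x^3 + x^2."""
    return 6 * x + 2


def classify(x):
    """Classify a point where g'(x) = 0 using the sign of g''(x)."""
    second = g_double_prime(x)
    if second > 0:
        return "local minimum"
    if second < 0:
        return "local maximum"
    return "inconclusive"


for x in [0.0, -2 / 3]:
    # g_prime(x) should be (numerically very close to) 0 at both points.
    print(x, g_prime(x), classify(x))
```

When the second derivative is exactly $0$ , its sign tells us nothing, which is why the sketch returns "inconclusive" instead of guessing.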

Multivariable functions

Multivariable functions are functions with, well, more than one variable. Take for example $f (x, y) = 2 x + 3 y$ . For $x = 4$ and $y = 5$ , we have $f (4, 5) = 2 * 4 + 3 * 5 = 23$ . Or we can take $g (x, y, z) = x + 5 y + 2 z$ , with $g (1, 1, 1) = 1 + 5 + 2 = 8$ .

Partial derivatives

A partial derivative of a multivariable function is determined by treating all but one variable like constants and taking the derivative with respect to the one variable left. For example, for $f (x, y) = 2 x + 3 y$ , we can differentiate with respect to $x$ : $f_{x}^{'} (x, y) = 2$ and with respect to $y$ : $f_{y}^{'} (x, y) = 3$ . More generally, for any 2-variable function $f (x, y)$ , $f_{x}^{'} (x, y) = {lim}_{d \to 0} \frac{f (x + d, y) - f (x, y)}{d}$ and $f_{y}^{'} (x, y) = {lim}_{d \to 0} \frac{f (x, y + d) - f (x, y)}{d}$ . For $f (x, y) = 2 x + 3 y$ , this means $f_{x}^{'} (x, y) = {lim}_{d \to 0} \frac{f (x + d, y) - f (x, y)}{d} = {lim}_{d \to 0} \frac{(2 (x + d) + 3 y) - (2 x + 3 y)}{d} = {lim}_{d \to 0} \frac{(2 x + 2 d + 3 y) - (2 x + 3 y)}{d}$ , which indeed simplifies to $f_{x}^{'} (x, y) = {lim}_{d \to 0} \frac{2 d}{d} = {lim}_{d \to 0} 2 = 2$ .
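The two limit definitions translate directly into code if we settle for a small fixed $d$ instead of a true limit. In this sketch, the helper names `partial_x` and `partial_y` are my own.

```python
def partial_x(f, x, y, d=1e-6):
    """Approximate the partial derivative of f with respect to x (y held fixed)."""
    return (f(x + d, y) - f(x, y)) / d


def partial_y(f, x, y, d=1e-6):
    """Approximate the partial derivative of f with respect to y (x held fixed)."""
    return (f(x, y + d) - f(x, y)) / d


def f(x, y):
    return 2 * x + 3 * y


# For this linear f, the partial derivatives should be close to 2 and 3 everywhere.
print(partial_x(f, 4.0, 5.0), partial_y(f, 4.0, 5.0))
```

Only one argument is nudged by $d$ in each helper, which is the code version of "treating all but one variable like constants".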