Iteration Fixed Point Exercises

Scott Garrabrant; SamEisenstat

22nd Nov 2018

AI Alignment Forum

This is the third of three sets of fixed point exercises. The first post in this sequence is here, giving context.

Note: Questions 1-5 form a coherent sequence and questions 6-10 form a separate coherent sequence. You can jump between the sequences.

1. Let $(X, d)$ be a complete metric space. A function $f : X \to X$ is called a contraction if there exists a $q < 1$ such that for all $x, y \in X$, $d(f(x), f(y)) \leq q \cdot d(x, y)$. Show that if $f$ is a contraction, then for any $x_0 \in X$, the sequence $\{x_n = f^n(x_0)\}$ converges. Show further that it converges exponentially quickly (i.e. the distance between the $n$th term and the limit point is bounded above by $c \cdot a^n$ for some $a < 1$).
2. (Banach contraction mapping theorem) Show that if $(X, d)$ is a complete metric space and $f$ is a contraction, then $f$ has a unique fixed point.
3. If we only require that $d(f(x), f(y)) < d(x, y)$ for all $x \neq y$, then we say $f$ is a weak contraction. Find a complete metric space $(X, d)$ and a weak contraction $f : X \to X$ with no fixed points.
4. A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if $f(tx + (1 - t)y) \leq t f(x) + (1 - t) f(y)$ for all $t \in [0, 1]$ and $x, y \in \mathbb{R}^n$. A function $f$ is strongly convex if you can subtract a positive paraboloid from it and it is still convex (i.e. $f$ is strongly convex if $x \mapsto f(x) - ε\|x\|^2$ is convex for some $ε > 0$). Let $f$ be a strongly convex smooth function from $\mathbb{R}^n$ to $\mathbb{R}$, and suppose that the magnitude of the second derivative $\|\nabla^2 f\|$ is bounded. Show that there exists an $ε > 0$ such that the function $g : \mathbb{R}^n \to \mathbb{R}^n$ given by $x \mapsto x - ε(\nabla f)(x)$ is a contraction. Conclude that gradient descent with a sufficiently small constant step size converges exponentially quickly on a strongly convex smooth function.
5. A finite stationary Markov chain is a finite set $S$ of states, along with a probabilistic rule $A : S \to ΔS$ for transitioning between the states, where $ΔS$ represents the space of probability distributions on $S$. Note that the transition rule has no memory, and depends only on the previous state. If for any pair of states $s, t \in S$, the probability of passing from $s$ to $t$ in one step is positive, then the Markov chain $(S, A)$ is ergodic. Given an ergodic finite stationary Markov chain, use the Banach contraction mapping theorem to show that there is a unique distribution over states which is fixed under application of the transition rule. Show that, starting from any state $s$, the limit distribution $\lim_{n \to \infty} A^n(s)$ exists and is equal to the stationary distribution.
6. A function $f$ from a partially ordered set to another partially ordered set is called monotonic if $x \leq y$ implies that $f(x) \leq f(y)$. Given a partially ordered set $(P, \leq)$ with finitely many elements and a monotonic function $f$ from $P$ to itself, show that if $f(x) \geq x$ or $f(x) \leq x$, then $f^n(x)$ is a fixed point of $f$ for all $n > |P|$.
7. A complete lattice $(L, \leq)$ is a partially ordered set in which each subset of elements has a least upper bound and a greatest lower bound. Let $f : L \to L$ be monotonic. Extend the notion of $f^n(x)$ for natural numbers $n$ to $f^α(x)$ for ordinals $α$, and show that $f^α(x)$ is a fixed point of $f$ for all $x \in L$ with $f(x) \leq x$ or $f(x) \geq x$ and all $|α| > |L|$ ($|A| \leq |B|$ means there is an injection from $A$ to $B$, and $|A| > |B|$ means there is no such injection).
8. (Knaster–Tarski fixed point theorem) Show that the set of fixed points of a monotonic function on a complete lattice forms a complete lattice. (Note that since the empty set is always a subset, a complete lattice must be nonempty.)
9. Show that for any set $A$, $(\mathcal{P}(A), \subseteq)$ forms a complete lattice, and that any injective function from $A$ to $B$ defines a monotonic function from $(\mathcal{P}(A), \subseteq)$ to $(\mathcal{P}(B), \subseteq)$. Given injections $f : A \to B$ and $g : B \to A$, construct a subset $A'$ of $A$ and a subset $B'$ of $B$ such that $B' = f(A')$ and $A - A' = g(B - B')$.
10. (Cantor–Schröder–Bernstein theorem) Given sets $A$ and $B$, show that if $|A| \leq |B|$ and $|A| \geq |B|$, then $|A| = |B|$. ($|A| \leq |B|$ means there is an injection from $A$ to $B$, and $|A| = |B|$ means there is a bijection.)

Please use the spoilers feature (the symbol '>' followed by '!' followed by a space) in your comments to hide all solutions, partial solutions, and other discussions of the math. The comments will be moderated strictly to hide spoilers!

I recommend putting all the object level points in spoilers and including metadata outside of the spoilers, like so: "I think I've solved problem #5, here's my solution <spoilers>" or "I'd like help with problem #3, here's what I understand <spoilers>" so that people can choose what to read.

Tomorrow's AI Alignment Forum Sequences post will be "Approval-directed agents: overview" by Paul Christiano in the sequence Iterated Amplification.

The next post in this sequence will be released on Saturday 24th November, and will be 'Fixed Point Discussion'.

13 comments

Gurkenglas

#3:

$x \mapsto \frac{x^2 + 1}{x}$ on $x \geq 1$ shortens all distances but satisfies $f(x) > x$ everywhere, so it has no fixed point.

#6: (the "show that if" condition follows from the property, the question is likely misstated)

The iteration is so long that it must visit an element twice. We can't have a cycle in the order so the repetition must be immediate.

Scott Garrabrant

Thanks, I actually wanted to get rid of the earlier condition that $f (x) \geq x$ for all $x$ , and I did that.

Donald Hobson

Answer to question 1.

Let $x_{i+1} = f(x_i)$ for arbitrary $x_0$, and call $c = d(x_0, x_1)$. By induction $d(x_k, x_{k+1}) \leq c q^k$, so by the triangle inequality, for $i < j$, $d(x_i, x_j) \leq \sum_{k=i}^{j-1} d(x_k, x_{k+1}) \leq \sum_{k=i}^{j-1} c q^k \leq \frac{c q^i}{1 - q}$ (geometric series).

Therefore $\forall δ > 0 : \exists n \in \mathbb{N} : \forall j > i > n : d(x_i, x_j) < \frac{c q^n}{1 - q} < δ$, i.e. $x_i$ is a Cauchy sequence. Since $(X, d)$ is complete, which by definition means every Cauchy sequence converges, $x_n \to y$ for some $y$, and $d(x_i, y) \leq \sup_{j > i} d(x_i, x_j) \leq \frac{c q^i}{1 - q}$. So $x_n$ converges exponentially quickly.
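A small numerical sanity check of this estimate may help. The sketch below assumes the illustrative contraction $f(x) = x/2 + 1$ on $\mathbb{R}$ (so $q = 1/2$ and the fixed point is $2$); the map, the starting point $x_0 = 0$ and the helper name `iterate` are hypothetical choices, not part of the exercise.

```python
def iterate(f, x0, n):
    """Return the orbit [x0, f(x0), ..., f^n(x0)]."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

def f(x):
    # Hypothetical contraction on R: q = 1/2, unique fixed point y = 2.
    return 0.5 * x + 1.0

q = 0.5
fixed_point = 2.0
orbit = iterate(f, 0.0, 30)
c = abs(orbit[1] - orbit[0])  # c = d(x_0, x_1)

# Check d(x_n, y) <= c q^n / (1 - q) at every point of the orbit.
for n, x_n in enumerate(orbit):
    assert abs(x_n - fixed_point) <= c * q**n / (1 - q) + 1e-12

print(orbit[-1])  # close to 2
```

For this linear map the bound $c q^n / (1 - q)$ is attained with equality, so in general it cannot be sharpened.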

Answer to question 2.

From part 1, as $f$ is continuous, $y = \lim_{n \to \infty} f(x_{n+1}) = \lim_{n \to \infty} f(x_n) = f(\lim_{n \to \infty} x_n) = f(y)$, so $y$ is a fixed point. Suppose $x$ and $y$ are both fixed points of the contraction map $f$. Then $f(x) = x$ and $f(y) = y$, so $d(f(x), f(y)) \leq q\, d(x, y) = q\, d(f(x), f(y))$; since $q < 1$, this forces $d(x, y) = 0$, so $x = y$. Thus $f$ has a unique fixed point.

Answer to question 3.

$(\mathbb{R}, d(x, y) = |x - y|)$ is a metric space: it's the real line with the usual distance. Let $f(x) = \sqrt{1 + x^2}$. Then $f$ is a contraction map because $f$ is differentiable and $f'(x) = \frac{x}{\sqrt{1 + x^2}}$ has the property $\forall x : |f'(x)| < 1$. However, no fixed point exists, as $\forall x : f(x) > x$. This works because the sequence $x_i$ generated by repeated applications of $f$ tends to infinity, even though successive terms become ever closer.
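To see this example numerically, the sketch below iterates $f(x) = \sqrt{1 + x^2}$ from $x_0 = 0$. Since $f(x)^2 = 1 + x^2$, the iterates are $x_n = \sqrt{n}$: the gaps between successive terms shrink, yet the sequence is unbounded, so it has no limit. The sample pairs and the step count are arbitrary choices made for illustration.

```python
import math

def f(x):
    # The weak contraction from the comment: |f'(x)| = |x| / sqrt(1 + x^2) < 1.
    return math.sqrt(1.0 + x * x)

# Spot-check the weak contraction property on a few arbitrary pairs.
pairs = [(0.0, 1.0), (-3.0, 2.0), (5.0, 5.5)]
assert all(abs(f(a) - f(b)) < abs(a - b) for a, b in pairs)

x = 0.0
orbit = [x]
for _ in range(100):
    x = f(x)
    orbit.append(x)

# Starting from 0, x_n = sqrt(n): the steps shrink but the orbit is unbounded.
gaps = [b - a for a, b in zip(orbit, orbit[1:])]
print(orbit[100])          # about 10.0
print(gaps[0], gaps[-1])   # the steps shrink towards 0
```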

Czynski

For Q2, I believe you aren't done:

You have established that there is at most one fixed point, but not that a fixed point exists.

martinkunev

For question 2

you haven't proven f is continuous

For question 3 you say

is a contraction map because $f$ is differentiable and ... $\forall x : | f^{'} (x) | < 1$

I would think proving this is part of what is asked for.

SatvikBeri

#6:

Assume WLOG $f(x) \geq x$. Then by monotonicity, we have $x \leq f(x) \leq f^2(x) \leq \dots \leq f^{|P|}(x)$. If every inequality in this chain were strict, then we would have $|P| + 1$ distinct elements. Thus there must be some $k$ such that $f^k(x) = f^{k+1}(x)$. By induction, $f^{n+1}(x) = f^n(x) = f^k(x)$ for all $n > k$.
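A brute-force illustration of #6, under hypothetical choices of poset and map (none of these come from the exercise): a 10-element chain with a monotone map, where every $f^n(x)$ with $n > |P|$ is fixed, and a swap map on the subsets of $\{0, 1\}$ showing why the condition $f(x) \geq x$ or $f(x) \leq x$ is needed.

```python
def iterate(f, x, n):
    """Apply f to x n times."""
    for _ in range(n):
        x = f(x)
    return x

# Example 1 (hypothetical): the chain P = {0, ..., 9} with the usual order and
# f(x) = (x + 7) // 2, which is monotone and maps P into P.
P = range(10)

def f(x):
    return (x + 7) // 2

assert all(f(x) <= f(y) for x in P for y in P if x <= y)  # f is monotone
for x in P:
    y = iterate(f, x, len(P) + 1)  # n = |P| + 1 > |P|
    assert f(y) == y               # f^n(x) is a fixed point

# Example 2 (hypothetical): subsets of {0, 1} under inclusion, where f swaps the
# labels 0 and 1. This f is monotone, but f({0}) = {1} is incomparable to {0},
# and the orbit of {0} alternates between {0} and {1} without ever reaching a
# fixed point, so the hypothesis "f(x) >= x or f(x) <= x" cannot be dropped.
def swap(s):
    return frozenset(1 - i for i in s)

orbit = [iterate(swap, frozenset({0}), n) for n in range(6)]
print(orbit)
```

In the chain, points below $6$ satisfy $f(x) \geq x$ and points above $7$ satisfy $f(x) \leq x$, so both cases of the exercise occur.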

#7:

Assume $f(x) \geq x$ and construct a chain similarly to (6), indexed by elements of $α$. If all inequalities were strict, we would have an injection from $α$ to $L$.

#8:

Let $F$ be the set of fixed points. Any subset $S$ of $F$ must have a least upper bound $x$ in $L$. If $x$ is a fixed point, done. Otherwise, note that every $q \in S$ satisfies $q = f(q) \leq f(x)$, so $x \leq f(x)$, and consider $f^α(x)$, which must be a fixed point by (7). For any $q$ in $S$, we have $f(q) \leq x \Rightarrow f^α(q) \leq f^α(x) \Rightarrow q \leq f^α(x)$. Thus $f^α(x)$ is an upper bound of $S$ in $F$. To see that it is the least upper bound, assume we have some other upper bound $b$ of $S$ in $F$. Then $x \leq b \Rightarrow f^α(x) \leq f^α(b) = b$.

To get the lower bound, note that we can flip the inequalities in L and still have a complete lattice.
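The key point of this argument, that the join of fixed points in $F$ can be strictly larger than their join in $L$, shows up already in a tiny finite lattice. The sketch below uses a hypothetical monotone map on the subsets of $\{0, 1, 2, 3\}$ (add $2$ whenever $0$ and $1$ are both present); since the lattice is finite, iterating $f$ from the union until it stops changing stands in for the transfinite iterate $f^α(x)$.

```python
from itertools import chain, combinations

U = frozenset({0, 1, 2, 3})

def f(s):
    # Hypothetical monotone map on subsets of U: add 2 once both 0 and 1 are present.
    return s | {2} if {0, 1} <= s else s

def powerset(u):
    items = sorted(u)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

L = powerset(U)
assert all(f(a) <= f(b) for a in L for b in L if a <= b)  # f is monotone
fixed = [s for s in L if f(s) == s]

def fixed_join(subsets):
    """Join of some fixed points inside the lattice of fixed points: take their
    union x (the join in L), then iterate f from x until it stops changing.
    Here x <= f(x) and L is finite, so the loop terminates."""
    x = frozenset().union(*subsets)
    while f(x) != x:
        x = f(x)
    return x

a, b = frozenset({0}), frozenset({1})
j = fixed_join([a, b])
print(sorted(a | b), sorted(j))  # [0, 1] [0, 1, 2]
```

The union $\{0, 1\}$ is not fixed, so the least fixed upper bound of $\{0\}$ and $\{1\}$ is the larger set $\{0, 1, 2\}$.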

#9:

P(A) clearly forms a complete lattice, where the least upper bound of any set of subsets is their union and the greatest lower bound is their intersection.

To see that injections are monotonic, assume $A_0 \subseteq A_1$ and $f$ is an injection. For any function, $f(A_0) \subseteq f(A_1)$. If $a \notin A_0$ and $f(a) \in f(A_0)$, that implies $f(a) = f(a')$ for some $a' \in A_0$, which is impossible since $f$ is injective. Thus $f$ is (strictly) monotonic.

Now $h := g \circ f$ is an injection $A \to A$. Let $X$ be the set of all points not in the image of $g$, and let $A' = X \cup h(X) \cup h^2(X) \cup \dots$. Note that $h(A') = h(X) \cup h^2(X) \cup h^3(X) \cup \dots = A' - X$, since no element of $X$ is in the image of $h$. Then $g(B - f(A')) = g(B) - h(A') = g(B) - (A' - X) = (g(B) - A') \cup (g(B) \cap X) = g(B) - A'$, since $g(B) \cap X = \emptyset$. On one hand, every element of $A$ not contained in $g(B)$ is in $A'$ by construction, so $A - A' \subseteq g(B)$, and hence $A - A' \subseteq g(B) - A'$. On the other, clearly $g(B) \subseteq A$, so $g(B) - A' \subseteq A - A'$. QED.

#10:

We form two bijections using the sets from (9), one between A' and B', the other between A - A' and B - B'.

Any injection is a bijection between its domain and its image. Since $B' = f(A')$ and $f$ is an injection, $f$ restricted to $A'$ is a bijection onto $B'$: we can assign each element $b' \in B'$ to the $a' \in A'$ such that $f(a') = b'$. Similarly, $g$ restricted to $B - B'$ is a bijection onto $A - A'$. Combining them, we get a bijection on the full sets.
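To make the construction of $A'$ and the combined bijection concrete, here is a sketch with the hypothetical injections $f(a) = 2a$ and $g(b) = b + 1$ on the natural numbers. Membership in $A' = X \cup h(X) \cup h^2(X) \cup \dots$ is decided by tracing an element backwards through $g^{-1}$ and $f^{-1}$ until the trace leaves the image of $g$ (so it started in $X$) or leaves the image of $f$. This relies on every backward trace being finite, which holds for these maps because each backward step lowers the value; a trace that never stopped would put the element in $A - A'$.

```python
# Hypothetical injections on the natural numbers (used as both A and B):
#   f : A -> B, f(a) = 2a      (image: the even numbers)
#   g : B -> A, g(b) = b + 1   (image: the positive numbers)
def f(a):
    return 2 * a

def f_inv(b):
    """Preimage of b under f, or None if b is not in f(A)."""
    return b // 2 if b % 2 == 0 else None

def g_inv(a):
    """Preimage of a under g, or None if a is not in g(B)."""
    return a - 1 if a >= 1 else None

def in_a_prime(a):
    """Decide whether a lies in A' = X ∪ h(X) ∪ h^2(X) ∪ ..., with h = g∘f and
    X = A - g(B), by tracing a backwards. Terminates for these maps because
    every backward step lowers the value."""
    while True:
        b = g_inv(a)
        if b is None:       # a is in X = A - g(B): the chain starts in A
            return True
        prev = f_inv(b)
        if prev is None:    # b is in B - f(A): the chain starts in B
            return False
        a = prev

def phi(a):
    """The combined map: f on A', and g^{-1} on A - A'."""
    return f(a) if in_a_prime(a) else g_inv(a)

print([a for a in range(20) if in_a_prime(a)])  # [0, 1, 3, 7, 15]
print([phi(a) for a in range(8)])               # [0, 2, 1, 6, 3, 4, 5, 14]
```

Here $A' = \{0, 1, 3, 7, 15, \dots\}$ is the orbit of $X = \{0\}$ under $h(a) = 2a + 1$.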

Czynski

We wish to show that the terms of $x_n$ form a Cauchy sequence, which suffices to show they converge in a complete space. Take $m, n \in \mathbb{N}^+$, and WLOG $m < n$. Then we know from the definition of contraction that $d(x_m, x_n) \leq q^m \cdot d(x_0, x_{n-m})$. Since $d(x_0, x_{n-m}) \leq \sum_{k=0}^{n-m-1} d(x_k, x_{k+1}) \leq \frac{d(x_0, x_1)}{1 - q}$ is bounded independently of $n$, this converges to 0 as $m$ increases, so the sequence is Cauchy.

It's easy to see that this makes the rate of convergence between terms of the Cauchy sequence exponentially quick. Intuitively that seems like it ought to make the sequence converge to its limit with the same speed, but I don't think that can be made rigorous without more steps.

Take a sequence $\{x_n = f^n(x_0)\}$. This converges to some $L$. Suppose $L$ was not a fixed point. Then choose $ϵ = d(L, f(L))/10 > 0$. A sequence $\{x_n\}$ which converges to a limit has, for every $ϵ$, some $N$ such that $\forall n \geq N : d(x_n, L) < ϵ$. Then we know that $d(x_N, L) < ϵ$, but $d(f(x_N), f(L)) = d(x_{N+1}, f(L)) \geq d(L, f(L)) - d(x_{N+1}, L) > 9ϵ > ϵ$, contradicting the contraction condition. So there is at least one fixed point, $L$.

Suppose there are two fixed points, $f (x) = x$ , $f (y) = y$ for distinct $x$ and $y$ . If so, $d (f (x), f (y)) = d (x, y)$ , which again contradicts the contraction condition. So there is at most one fixed point.

Take as the space $\{x \in \mathbb{R} : x \geq 1\}$, with the usual metric. Define $f(x) = \frac{x^2 + 1}{x}$. This is a weak contraction (toward infinity) and has no fixed points within this space.

Adele Lopez

Ex 6:

If at any point $f^n(x) = f^{n-1}(x)$, then we're done. So assume that we get a strict increase each time up to $n = |P|$. Since there are only $|P|$ elements in the entire poset, and $f$ is monotone, $f^{n+1}(x)$ has to equal $f^n(x)$.

Ex 7:

For a limit ordinal $α$, define $f^α(x)$ as the least upper bound of $f^n(x)$ for all $n < α$. If $|α| > |L|$, then the map $n \mapsto f^n(x)$ for $n < α$ sends a set of size $|α|$ into $L$. Since there are no injections from $α$ into $L$, there must be two ordinals $n < m$ such that $f^n(x) = f^m(x)$. Since $f$ is monotone, that implies that for every ordinal $l > n$, $f^l(x) = f^n(x)$, and thus $f^n(x)$ is a fixed point. Since $n < α$, this proves the exercise.

Ex 8:

Starting from $x$ (with $x \leq f(x)$), we can create a fixed point via iteration by taking $|α| > |L|$ and iterating $α$ times as demonstrated in Ex 7. Call this fixed point $f_x$. Suppose $k$ is a fixed point with $x \leq k$. Then by monotonicity $f^β(x) \leq f^β(k) = k$ for every ordinal $β$, so $f_x \leq k$. So $f_x$ generated this way is always the smallest fixed point greater than or equal to $x$.

Say we have fixed points $x_i$. Let $x$ be the least upper bound of the $x_i$, and generate the fixed point $f_x$. Since $f$ is monotone, $f_x$ is greater than each $x_i$, and it is the smallest such fixed point, as shown in the above paragraph. So every set of fixed points has a least upper bound among the fixed points.

Now take our fixed points $x_{i}$ again. Now let $x$ be the greatest lower bound of $x_{i}$ , and generate a fixed point $f_{x}$ . Since $x \leq x_{i}$ and $f$ is monotonic, $f^{α} (x) \leq f^{α} (x_{i}) = x_{i}$ , and so $f_{x}$ is a lower bound of $x_{i}$ . It has to be the greatest such bound because $x$ itself is already the greatest such bound in our poset, and $f$ is monotonic.

Thus the lattice of fixed points has all least upper bounds and all greatest lower bounds, and is thus complete!

Rafael Harth

Ex1

Let $x_0 \in X$, let $D := d(x_0, x_1)$, and let $k \in \mathbb{N}$. Then $d(x_k, x_{k+1}) = d(f(x_{k-1}), f(x_k)) \leq q\, d(x_{k-1}, x_k) \leq \dots \leq q^k D$. For each $ϵ \in \mathbb{R}_+$, we find an $n \in \mathbb{N}$ such that $q^n < \frac{(1 - q)ϵ}{D}$; then for all $m > l \geq n$, $d(x_l, x_m) \leq \sum_{k=l}^{m-1} q^k D \leq \frac{q^n D}{1 - q} < ϵ$. This proves that $(x_k)_{k \in \mathbb{N}}$ is a Cauchy sequence, which (because $(X, d)$ is complete) means it converges to some point $x^* \in X$.

Furthermore, given a position $n \in N$ , we have

$d(x_n, x^*) \leq \sum_{k=n}^{\infty} d(x_k, x_{k+1}) \leq \sum_{k=n}^{\infty} q^k D = q^n D \sum_{k=0}^{\infty} q^k = q^n D \frac{1}{1 - q} = \frac{D}{1 - q} \cdot q^n$, which has the required form $c \cdot a^n$ with $c = \frac{D}{1 - q}$ and $a = q < 1$.

Ex2

Given any sequence $(x_k)_{k \in \mathbb{N}}$ of iterates as in Ex1, it converges to some point $x^*$, and it's easy to see that $x^*$ is a fixed point of $f$. Let $y^*$ be a fixed point of $f$. Then $d(f(x^*), f(y^*)) = d(x^*, y^*)$, hence $x^* = y^*$ (otherwise, this contradicts the fact that $f$ is a contraction).

Ex3

Choose $X \subsetneq \mathbb{R}^2$ as $X := \mathbb{R} \times \mathbb{N}_+$. Then $X$ is complete because, given any Cauchy sequence $(x_k)_{k \in \mathbb{N}}$, it's easy to prove that there is an $n \in \mathbb{N}$ such that all but finitely many $x_k$ are in the subspace $\mathbb{R} \times \{n\}$. However, the map $f : X \to X$ given by $f(x, n) := (x + \frac{1}{n}, n + 1)$ has no fixed points, since it moves each point by at least 1. (And it's straightforward to verify that $f$ is a weak contraction.)

seed

$d (f^{n} (x), f^{n + 1} (x)) \leq q^{n} d (x, f (x))$ - can show by induction.

$\forall m > n$, $d(f^n(x), f^m(x)) \leq (q^n + q^{n+1} + \dots + q^{m-1})\, d(x, f(x)) \leq \frac{q^n}{1 - q} d(x, f(x)) \to_{n \to \infty} 0$

Therefore, $f^n(x)$ is a Cauchy sequence, and since $(X, d)$ is complete, it must have a limit in $X$. Suppose $y = \lim_{n \to \infty} f^n(x)$. Then, using $f(y) = y$ (shown below), $d(y, f^{n+1}(x)) \leq q\, d(y, f^n(x))$, therefore $d(y, f^n(x)) \leq q^n d(x, y)$.

Suppose $y = \lim_{n \to \infty} f^n(x)$. Let's show that $y$ is a fixed point. Indeed, for any $n$, $d(f^{n+1}(x), f(y)) \leq q\, d(f^n(x), y)$, and if we take the limit on both sides, we get $d(y, f(y)) \leq q\, d(y, y) = 0$.

Let's show uniqueness: suppose x and y are fixed points, then $d (x, y) = d (f (x), f (y)) \leq q d (x, y)$ , therefore d(x,y) = 0.

$X = [1, + \infty)$ , f(x) = x + 1/x.

Suppose $f(x) = ϵ\|x\|^2 + h(x)$, where $h$ is some convex function and $ϵ < 1/2$. Take $x, y \in \mathbb{R}^n$. Since $h$ is convex on the segment $[x, y]$, its directional derivative along that segment is nondecreasing, and that directional derivative is the projection of the gradient of $h$ onto the $[x, y]$ line. Therefore, we have $⟨\nabla h(y), y - x⟩ \geq ⟨\nabla h(x), y - x⟩$, or $⟨\nabla h(y) - \nabla h(x), y - x⟩ \geq 0$. Hence,

$\|x - \nabla f(x) - y + \nabla f(y)\| = \|x - 2ϵx - \nabla h(x) - y + 2ϵy + \nabla h(y)\| \leq (1 - 2ϵ)\|x - y\|$

Therefore, g is a contraction mapping, and from problem 1 it follows that the gradient descent converges exponentially quickly.
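A minimal sketch of the gradient-descent claim, assuming the hypothetical strongly convex quadratic $f(x, y) = (x^2 + 10y^2)/2$, whose Hessian has norm $10$. For a quadratic the map $p \mapsto p - ε\nabla f(p)$ is linear, so its contraction factor can be read off directly; this is a special case chosen for transparency, not a proof of the general statement.

```python
import math

# Hypothetical strongly convex quadratic f(x, y) = (x^2 + 10 y^2) / 2.
# Its Hessian is diag(1, 10): f - (1/2)||p||^2 is convex, and ||Hessian|| = 10.
def grad_f(p):
    x, y = p
    return (x, 10.0 * y)

def gd_step(p, eps):
    """One gradient-descent step, i.e. one application of g(p) = p - eps * grad f(p)."""
    gx, gy = grad_f(p)
    return (p[0] - eps * gx, p[1] - eps * gy)

def norms_along_run(eps, steps, p0=(1.0, 1.0)):
    """Distances from the minimiser (the origin) along a run of gradient descent."""
    p = p0
    norms = [math.hypot(*p)]
    for _ in range(steps):
        p = gd_step(p, eps)
        norms.append(math.hypot(*p))
    return norms

# Here g scales the two axes by (1 - eps) and (1 - 10 eps), so it is a
# contraction exactly when 0 < eps < 2/10.
small = norms_along_run(0.05, 100)  # factors 0.95 and 0.5: contraction with q = 0.95
large = norms_along_run(0.25, 20)   # factor 1 - 2.5 = -1.5 on the y-axis: diverges
print(small[-1], large[-1])
```

The small step size gives the exponential decay $\|p_n\| \leq 0.95^n \|p_0\|$, while a step size above $2/10$ makes $g$ expand along the stiff direction, which is why the exercise asks for a sufficiently small $ε$.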

Suppose A is an N×N positive matrix, and e is its minimal entry (then e ≤ 1/N). Then we can write A = eJ + (1 - Ne)Q, where J is a matrix whose entries are all 1, and Q is a matrix whose entries are all nonnegative and the sum of each column is 1 (because the sum of each column is 1 in A and Ne in eJ). Suppose x and y are probability distributions, i.e. N-dimensional vectors with nonnegative entries whose sum is 1. Then $J(x - y) = 0$, so $\|Ax - Ay\|_1 = \|eJ(x - y) + (1 - Ne)Q(x - y)\|_1 = (1 - Ne)\|Q(x - y)\|_1$.

Denote $x^+ = \max(x - y, 0)$ and $x^- = \max(y - x, 0)$ (pointwise max). Then $x - y = x^+ - x^-$, and since $Q$ is nonnegative with columns summing to 1, $\|Q(x - y)\|_1 = \|Qx^+ - Qx^-\|_1 \leq \|Qx^+\|_1 + \|Qx^-\|_1 = \|x^+\|_1 + \|x^-\|_1 = \|x - y\|_1$,

so $\|Ax - Ay\|_1 \leq (1 - Ne)\|x - y\|_1$. The space of all probability distributions, with the metric induced by the $\|\cdot\|_1$ norm, is a compact subset of $\mathbb{R}^N$, so it is a complete metric space; therefore, the sequence $A^n(x)$ converges to the unique fixed point.
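A numerical check of this bound, under a hypothetical 3-state chain with all entries positive and the same column-stochastic convention as above ($A_{ik}$ is the probability of moving from state $k$ to state $i$). The sketch tests $\|Ax - Ay\|_1 \leq (1 - Ne)\|x - y\|_1$ on random pairs of distributions, then runs the chain from each starting state; the matrix, seed and step count are arbitrary.

```python
import random

# Hypothetical ergodic chain on 3 states; each column sums to 1.
A = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.3],
     [0.2, 0.2, 0.4]]
N = len(A)
e = min(min(row) for row in A)  # smallest entry, here 0.2
q = 1 - N * e                   # contraction factor 1 - Ne, here 0.4 up to rounding

def step(x):
    """One application of the transition rule: x -> A x."""
    return [sum(A[i][k] * x[k] for k in range(N)) for i in range(N)]

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def random_distribution(rng):
    w = [rng.random() for _ in range(N)]
    s = sum(w)
    return [v / s for v in w]

rng = random.Random(0)
for _ in range(1000):
    x, y = random_distribution(rng), random_distribution(rng)
    assert l1(step(x), step(y)) <= q * l1(x, y) + 1e-12

def limit_from(state, steps=100):
    """Distribution after `steps` transitions, starting from a single state."""
    x = [1.0 if i == state else 0.0 for i in range(N)]
    for _ in range(steps):
        x = step(x)
    return x

limits = [limit_from(s) for s in range(N)]
pi = limits[0]
print(pi)
```

All three runs end at the same distribution, which `step` leaves unchanged up to rounding, as uniqueness of the fixed point predicts.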

Let us assume $x \leq f(x)$ (the proof for $x \geq f(x)$ is the same). Then, from monotonicity of $f$, $x \leq f(x) \leq f(f(x)) \leq \dots$ is an ascending chain. This sequence cannot have more than $|P|$ distinct elements, so an element of this sequence is going to repeat: $f^m(x) = f^n(x)$, $m < n$, $m < |P|$. Then all the inequalities in $f^m(x) \leq f^{m+1}(x) \leq \dots \leq f^n(x)$ must be equalities, so $f(f^m(x)) = f^m(x)$, i.e. $f^m(x)$ is a fixed point.

XelaP

I have fond memories of the contraction mapping theorem, because it was the first fixed point theorem I ever learned.

Let c = d(x, f(x)). Then d(f(x), f^2(x)) <= q d(x, f(x)) = qc, and in general d(f^n(x), f^{n+1}(x)) <= q^n c.

Now, d(f^n(x), f^m(x)) <= c * \sum_{n <= i < m} q^i, which is a geometric series. Its sum is c q^n (1 - q^{m-n})/(1 - q) <= c q^n/(1 - q), the sum of the full series (which converges since q < 1).

This can be made arbitrarily small, so the sequence is Cauchy, so since we are in a complete metric space it converges.

The distance to the limit point is at most that value c q^n/(1 - q), because all terms after the nth iterate are within that of the nth iterate. This gives us the exponential convergence.

First, there's at most one fixed point for a contraction mapping. For if we had two, then we could take d(x,y) = d(f(x), f(y)) <= q d(x,y) < d(x,y) for the two fixed points x,y. But then, we get a contradiction - unless x = y, in which case q d(x,y) = d(x,y) = 0.

For existence: the limit of the iterates of any x is our fixed point. This is because f(lim f^n(x)) = lim f(f^n(x)) = lim f^{n+1}(x) = lim f^n(x), where the first equality is from continuity of f

Take the space of infinite binary sequences, where the distance we assign is 1/N where N is the earliest index for which two sequences differ (or 0 if they are the same always). This is a complete metric space, actually (as you can easily check). But the right shift operator is continuous (since if you differ no earlier than the Nth spot, then the image of you differs by no earlier than the (N+1)th spot), and has no fixed point, despite the fact that it also is a weak contraction (by virtue of pushing any differences to the right). This is because N/(N+1) isn't bounded by a constant.

4: TODO

Take two points x, y. We'd like to show that there's an epsilon such that ||(x - eps grad(f)(x)) - (y - eps grad(f)(y))|| is bounded by ||x - y|| times some constant less than 1.

We have ||x - y + eps (grad(f)(y) - grad(f)(x))|| =

5: TODO

WLOG suppose x <= f(x). Then f(x) <= f^2(x), and we can repeat upwards. For n > |P|, by the pigeonhole principle the nth iterate is equal to some earlier iterate, so there are n and m with m < n and f^n(x) = f^m(x). Thus all the iterates between these two are also equal. But then f(f^m(x)) = f^m(x), so f^m(x) is a fixed point, and it is the same as f^n(x).

f^n(x) is the same as max {f(f^m(x)) : m < n}. So let's define f^alpha(x) as sup {f(f^beta(x)) : beta < alpha}

WLOG suppose again that x <= f(x). Now if alpha is a limit ordinal, then f(f^beta(x)) for any beta < alpha will just be equal to f^gamma(x) with gamma = beta + 1 < alpha. Thus this is really the sup of all previous iterates, and so it is at least as large as each of them. For successor ordinals, we need to iterate the previous iterate once more, but we can use monotonicity here to get that we are at least as large as the previous iterate.

Thus we see that f^beta(x) <= f^alpha(x) for all beta < alpha.

Suppose we have some alpha such that |alpha| > |L|. Then if this sequence is made of all unique elements, we would be able to monotonically inject alpha into L by sending each smaller ordinal to the associated iterate. This totally ordered chain of iterates has an ordinal type, which we just mapped alpha to an initial element of, and so we have |alpha| <= |L| as ordinals, a contradiction (if instead we were to take them as cardinals, we'd still have a contradiction, because we can ordinal inject the least ordinal with the same cardinality with alpha, which then gives us an ordinal injection to L). Thus we have some repeat where f^beta(x) = f^alpha(x) for beta < alpha. But then all the iterates between these two are also equal.

So f(f^(beta)(x)) = f^beta(x), so f^beta(x) is a fixed point of f, f^beta(x) = f^alpha(x) so f^alpha(x) is a fixed point of f.

The question is asking whether the set of fixed points has a sup and an inf.

Suppose we have some subset S of fixed points. I hope that the sup of these fixed points in the original lattice is actually a fixed point.

For any element x of S, the supremum s is larger than x, and so x = f(x) <= f(s). Thus f(s) is an upper bound for the subset, and thus s <= f(s). But then for some alpha, f^alpha(s) is a fixed point of f that's an upper bound. So now I hope f^alpha(s) is our supremum in the fixed-point lattice.

Suppose there was an upper bound that was a fixed point, u. Since s is a supremum in the encompassing lattice L, we have s <= u. Thus f(s) <= f(u) = u. In general, f^n(s) <= f^n(u) = u. We can also take a sup of them to get f^beta(s) <= u for limit ordinals beta. Therefore f^alpha(s) <= u. But then f^alpha(s) is smaller than all upper bounds that are fixed points, and so f^alpha(s) is a supremum in the fixed-point lattice.

Likewise for infimums.

Incidentally: The least element l satisfies l <= f(l), so there's an alpha such that the alpha iterate is a fixed point. If x is a fixed point, then since l <= x for all x, we have f^alpha(l) <= f^alpha(x) = x. But then f^alpha(l) is the least fixed point.

9: TODO

P(A) is clearly a lattice. To take sups and infs, we can just take arbitrary unions and intersections. (I think technically you need the axiom of choice, but whatever). Given a function f: A -> B, we have that S <= S' <= A implies f^img(S) <= f^img(S') (we don't need injectivity for this), so the induced f is monotone.

Since f,g are injective, we know that the induced functions on the powerset lattices are also injective - that is, that f^img(S) != f^img(S') if S != S'. Also, for injective functions, f(X - Y) = f(X) - f(Y)

...

10:

Using problem 9, there is a set A' and B' such that f(A') = B' and A - A' = g(B - B')

Now, we'd like to biject A' with B' and A - A' with B - B'. To do this, we can take the restriction of f on A' (restricting the codomain to B' as well), and likewise take the restriction of g. The restrictions are bijections, because they are surjections and we already knew they were injections.

Then we can take the union of the relation associated to the restriction of f and the inverse of the relation associated with the restriction of g. This is then a function A -> B, and it's a bijection.

Rafael Harth

(The second half made me realize how much more comfortable I am with abstract exercises than with regular Analysis à la Ex4.)

Ex6

If $f(x) \leq x$, then all $f^n(x)$ are comparable to each other: we have

$f (x) \leq x ⟹ f (f (x)) \leq f (x) ⟹ f^{3} (x) \leq f^{2} (x)$

and so on. Furthermore, if $n \in \mathbb{N}$ is such that $f^n(x) \neq f^{n-1}(x)$, then $f^k(x) \neq f^{k-1}(x)$ for all $k \in [n]$ as well (verify by looking at the contrapositive). Consequently (set $n := |P|$), if $f^n(x)$ were not a fixed point of $f$, then $f^{n+1}(x) < f^n(x)$, and hence the elements of $\{x, f(x), f^2(x), \dots, f^{n+1}(x)\} \subseteq P$ would all be distinct, which means $P$ would have at least $|P| + 2$ elements.

If $x \leq f (x)$ , we get $x \leq f (x) \leq f^{2} (x)$ and so on, leading to the same argument.

Ex7

Wlog, assume $x \leq f(x)$. Set $f^0(x) := x$. Given any non-limit ordinal $β$, we find a predecessor $α$ and set $f^β(x) := f(f^α(x))$. Given any limit ordinal $ω$, we set $f^ω(x) := \sup\{f^α(x) \mid α \in ω\}$.

Suppose this doesn't define $f^{α} (x)$ for all ordinals $α$ . Then, there is some smallest ordinal $α^{*}$ such that $f^{α^{*}} (x)$ is not defined. This immediately yields a contradiction (regardless of whether $α^{*}$ is a limit ordinal or not).

We want this construction to have the properties that $f(x) = x ⟹ f^β(x) = x$ and that $x \leq y ⟹ f^β(x) \leq f^β(y)$. Thus, let $x, y \in L$ and $β$ be an ordinal. If $β$ has a predecessor, the checks for both properties are easy. If not, then $f^β(x) = \sup\{f^α(x) \mid α \in β\}$ and $f^β(y) = \sup\{f^α(y) \mid α \in β\}$. Then, for the first property, note that the least upper bound of a one-element set is just the element itself. For the second, note that each element in the first set is smaller than some element in the second set, so $f^β(y)$ is an upper bound for the first set, which implies that $f^β(x) \leq f^β(y)$, since $f^β(x)$ is the least upper bound.

Now, given an ordinal $α$ , our construction defines a function $ϕ : α \to L$ . If $f (f^{α} (x)) \neq f^{α} (x)$ , then the chain doesn't become stationary at any earlier point either (to verify, take a smallest $α$ such that [ $f (f^{α} (x)) \neq f^{α} (x)$ but the chain is stationary for smaller ordinals] and derive a contradiction), and hence $ϕ$ is injective, proving that $α \leq | L |$ . (This is the generalized version of the argument from Ex6.)

Ex8

Let $f : L \to L$ be monotonic and let $L^{'}$ be the set of fixed points of $f$ . Then $L^{'}$ inherits the partial order from $L$ ; what needs doing is verify the least upper-bound property. So let $X \subseteq L^{'}$ . Then, $X$ has a least upper-bound $u$ in $L$ .

Let $ω$ be some ordinal with $ω > | L |$ . From the previous exercise, we know that $f (f^{ω} (u)) = f^{ω} (u)$ . Choose the smallest $α$ such that $f (f^{α} (u)) = f^{α} (u)$ . Then, $f^{α} (u) \in L^{'}$ and $x \leq u \leq f^{α} (u)$ , hence $f^{α} (u)$ is an upper-bound of $X$ .

It remains to show that it is the least upper bound. Thus, let $u' \in L'$ be another upper bound of $X$. Then, $u \leq u'$ in $L$, hence $f^α(u) \leq f^α(u') = u'$ (apply Ex7).

Ex9

A least upper-bound is obtained via $⋃$ on all sets, and the greatest lower-bound via $⋂$ . (Easy checks.) Given any function $f : A \to B$ , we trivially have $X \subseteq Y ⟹ f (X) \subseteq f (Y)$ ; injectivity is not needed.

We define

$A^{(0)} := A$
$A^{(n + 1)} := A^{(n)} - g (B - f (A^{(n)})) \forall n \in N$
$A^{'} := ⋂_{j = 0}^{\infty} A^{(j)}$ (i.e., greatest lower bound of the $A$ 's)

We need to verify that $g (B - f (A^{'})) = A - A^{'}$ , then $A^{'}$ and $f (A^{'})$ are the desired sets.

" $\subseteq$ ": Let $y \in B - f(A')$. Then, there exists some smallest $j \in \mathbb{N}$ such that $y \in B - f(A^{(j)})$. (The case $j = 0$ is possible and included.) We have $A^{(j+1)} = A^{(j)} - g(B - f(A^{(j)}))$, hence $g(y) \notin A^{(j+1)}$. Then, $g(y) \notin A'$, hence $g(y) \in A - A'$.
" $\supseteq$ ": Let $x \in A - A^{'}$ . Then, there exists some smallest $k \in N$ such that $x \notin A^{(k)}$ . In this case, we must have $k > 0$ , so we know that $x \in A^{(k - 1)}$ . It follows that $x$ was lost at this step, i.e.,

$x \in A^{(k-1)} - A^{(k)} \subseteq g(B - f(A^{(k-1)})) \subseteq g(B - f(A'))$, since $A' \subseteq A^{(k-1)}$.
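This descending construction can be run directly for the hypothetical injections $f(a) = 2a$ and $g(b) = b + 1$ on the natural numbers. Working inside the window $\{0, \dots, 63\}$ is an assumption that happens to be safe for these maps, because whether $a$ survives a step depends only on whether $(a - 1)/2$, a smaller element, is still present.

```python
# Hypothetical injections on N: f(a) = 2a (A -> B), g(b) = b + 1 (B -> A).
M = 64                     # work inside the window {0, ..., M-1}
window = set(range(M))

def step(A_n):
    """A^(n+1) = A^(n) - g(B - f(A^(n))), restricted to the window."""
    f_img = {2 * a for a in A_n}
    # a lies in g(B - f(A^(n))) iff a = b + 1 for some b not in f(A^(n)).
    removed = {a for a in A_n if a >= 1 and (a - 1) not in f_img}
    return A_n - removed

A_n = set(window)          # A^(0) = A
while True:
    nxt = step(A_n)
    if nxt == A_n:         # the decreasing sequence has stabilised: this is A'
        break
    A_n = nxt

print(sorted(A_n))         # [0, 1, 3, 7, 15, 31, 63]
```

The result is the orbit of $0$ under $g \circ f$, i.e. $a \mapsto 2a + 1$, restricted to the window.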

Ex10

Let $A^{'}$ be the set constructed in Ex9. Then, we can define a bijection $ϕ : A \to B$ via

$ϕ : x \mapsto \begin{cases} f(x) & x \in A' \\ g^{-1}(x) & x \notin A' \end{cases}$

Rafael Harth

Ex5 (this is super ugly but I don't think it's worth polishing and it does work. All important ideas are in the first third of the proof, the rest just inelegantly resolves the details.)

We define our metric space as $(X, d)$, where $X \subseteq [0, 1]^d$ is the set of probability distributions and $d(x, y) = \sum_{j=1}^{d} |x_j - y_j|$. Let $x, y \in X$ and let $Δ := x - y$; then $d(Ax, Ay)$ can be computed as

$\sum_{i=1}^{d} |(Ax - Ay)_i| = \sum_{i=1}^{d} |(AΔ)_i| = \sum_{i=1}^{d} \left| \sum_{k=1}^{d} a_{i,k} δ_k \right| \leq \sum_{i=1}^{d} \sum_{k=1}^{d} |a_{i,k} δ_k| = \sum_{i=1}^{d} |δ_i|$

where the last step holds because multiplying a vector with the state-transition matrix leaves the sum of entries unchanged. (Reasonably easy to verify using that each column of $A$ sums up to 1.)

If $x \neq y$, then $Δ$ has at least one negative entry and the inequality is strict. In that case, let $k = \operatorname{argmax}_{i \in \{1, \dots, d\}} δ_i$ and $ℓ = \operatorname{argmin}_{i \in \{1, \dots, d\}} δ_i$. In particular, we have $δ_k > 0 > δ_ℓ$. Note that, when two numbers $a, b \in \mathbb{R}$ have different signs, then $|a + b| = ||a| - |b||$ and thus $|a| + |b| - |a + b| = \min(2|a|, 2|b|)$. Therefore, the amount that gets canceled out is at least

$\sum_{i=1}^{d} \left( |a_{i,k} δ_k| + |a_{i,ℓ} δ_ℓ| - |a_{i,k} δ_k + a_{i,ℓ} δ_ℓ| \right) = \sum_{i=1}^{d} 2 \min(|a_{i,k} δ_k|, |a_{i,ℓ} δ_ℓ|)$

Let $a^{'}$ be the smallest entry in $A$ , then we can lower-bound the above as

$\sum_{i=1}^{d} 2 a' \min(|δ_k|, |δ_ℓ|) = 2 a' d \min(|δ_k|, |δ_ℓ|)$

Wlog, let $|δ_k| > |δ_ℓ|$. Let $K$ be the sum of all positive entries of $Δ$; then $\sum_{i=1}^{d} |δ_i| = 2K$, so the relative decrease we want to lower-bound is $\frac{2 a' d |δ_ℓ|}{2K} = \frac{a' d |δ_ℓ|}{K}$. The sum of the negative entries is $-K$, which means that the one with the largest norm among them has norm at least $\frac{K}{d - 1}$. Thus, the relative decrease is at least $\frac{a' d \frac{K}{d - 1}}{K} = \frac{a' d}{d - 1} > a'$.

Then, $\frac{d(x, y) - d(A(x), A(y))}{d(x, y)} \geq a'$, hence $\frac{d(A(x), A(y))}{d(x, y)} \leq 1 - a'$. This proves that $A$ is a contraction; apply Banach's theorem.
