## Meetup : Bristol meetup

4 15 October 2013 08:04PM

## Discussion article for the meetup : Bristol meetup

WHEN: 20 October 2013 02:00:00PM (+0100)

WHERE: Hodgkin House, 3 Meridian Place, Bristol BS8 1JG

We'll have another meetup in Bristol this upcoming Sunday, October 20, at the student house where I live. We'll officially start at 2pm to hopefully make it reasonably convenient for everyone, but at least two of us will be there from around 12, so if you want to come earlier and hang out a bit more, let me know! (benja.fallenstein@gmail.com, or PM me.)

I'll put up a LessWrong sign outside with these instructions, but please call me at 07463169075 or (from around 2pm on) ring the buzzer marked "Basement" when you arrive.

Also, whether or not you can attend this time, if you're interested in future meetups, please join the Google group for organizing meetup times!

## Meetup : Second Bristol meetup & mailing list for future meetups

2 06 June 2013 04:47PM

## Discussion article for the meetup : Second Bristol meetup & mailing list for future meetups

WHEN: 16 June 2013 03:00:00PM (+0100)

WHERE: Hodgkin House, 3 Meridian Place, Bristol BS8 1JG

At our lovely first meetup (four people came, if I count myself), I unfortunately forgot to take the opportunity to sort out when a good time for the next meetup would be. Sorry!

Since I'm about to be away for a while and I think others are leaving for the summer as well, I decided to just be bold once more and announce a time and hope that somebody else is free as well. But to make it easier to find good times in the future, please join the Google Group I've just created!

Last time, we ended up sitting in the cafe for hours without consuming much, so for this meetup I've booked the dining room at the student house where I live, which should be a quiet and comfortable place to talk. I'll also put up a LessWrong sign outside with these instructions, but please ring the buzzer marked "Basement", or you can call me at +43-660-1461996 (unfortunately I don't have a UK mobile yet, but if you ring just once, I'll come up and meet you).

Time & date is Sunday, the 16th of June, starting at 3pm. Hope that I didn't pick a terrible time and someone will be able to join me! :-)

## Meetup : First Bristol meetup

4 17 May 2013 04:52PM

## Discussion article for the meetup : First Bristol meetup

WHEN: 25 May 2013 03:00:00PM (+0100)

WHERE: Friska Queens Road (on the Clifton triangle), Bristol

Back in 2010, Bristol had 4000+ unique LW visitors, but we've never had a meetup -- let's try and see what happens! I'll be in the Friska on Queens Road (on the Clifton triangle, right next to the university campus) on Saturday the 25th at 3pm, with a LessWrong sign and a paperback of HPMOR. Anyone going to join me? :-)

21 04 January 2013 01:57AM

In what became the 5th most-read new post on LessWrong in 2012, Morendil told us about a study widely cited in its field... except that the source cited, which isn't online and is really difficult to get hold of, makes a different claim — and turns out not even to be the original research, but a PowerPoint presentation given ten years after the original study was published!

Fortunately, the original study turns out to be freely available online, for all to read; Morendil's post has a link. The post also tells us the author and the year of publication. But that's all: Morendil didn't provide a list of references; he showed how the presentation is usually cited, but didn't give a full citation for the original study.

The link is broken now. The Wayback Machine doesn't have a copy. The address gives no hint of the study's title. I haven't been able to find anything on Google Scholar using the author, the year, and likely keywords.

I rest my case.

## Math appendix for: "Why you must maximize expected utility"

8 13 December 2012 01:11AM

This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straightforward and I hope they won't get lost in the line noise!

The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.

*

I represent preference relations as total preorders $\preccurlyeq$ on a simplex $\Delta_N$; define $\prec$, $\sim$, $\succcurlyeq$ and $\succ$ in the obvious ways (e.g., $x\sim y$ iff both $x\preccurlyeq y$ and $y\preccurlyeq x$, and $x\prec y$ iff $x\preccurlyeq y$ but not $y\preccurlyeq x$). Write $e^i$ for the $i$'th unit vector in $\mathbb{R}^N$.

In the following, I will always assume that $\preccurlyeq$ satisfies the independence axiom: that is, for all $x,y,z\in\Delta_N$ and $p\in(0,1]$, we have $x\prec y$ if and only if $px + (1-p)z \prec py + (1-p)z$. Note that the analogous statement with weak preferences follows from this: $x\preccurlyeq y$ holds iff $y\not\prec x$, which by independence is equivalent to $py + (1-p)z \not\prec px + (1-p)z$, which is just $px + (1-p)z \preccurlyeq py + (1-p)z$.

Lemma 1 (more of a good thing is always better). If $x\prec y$ and $0\le p < q \le 1$, then $(1-p)x + py\prec (1-q)x + qy$.

Proof. Let $r := q-p$. Then, $(1-p)x + py = \big((1-q)x + py\big) + rx$ and $(1-q)x + qy = \big((1-q)x + py\big) + ry$. Thus, the result follows from independence applied to $x$, $y$, $\textstyle\frac{1}{1-r}\big((1-q)x + py\big)$, and $r$.$\square$

Lemma 2. If $x\preccurlyeq y\preccurlyeq z$ and $x\prec z$, then there is a unique $p\in[0,1]$ such that $(1-q)x + qz \prec y$ for $q\in[0,p)$ and $y\prec (1-q)x + qz$ for $q\in(p,1]$.

Proof. Let $p$ be the supremum of all $r\in[0,1]$ such that $(1-r)x + rz\preccurlyeq y$ (note that by assumption, this condition holds for $r=0$). Suppose that $0\le q < p$. Then there is an $r\in(q,p]$ such that $(1-r)x + rz\preccurlyeq y$. By Lemma 1, we have $(1-q)x + qz \prec (1-r)x + rz$, and the first assertion follows.

Suppose now that $p < q \le 1$. Then by definition of $p$, we do not have $(1-q)x + qz\preccurlyeq y$, which means that we have $(1-q)x + qz\succ y$, which was the second assertion.

Finally, uniqueness is obvious, because if both $p$ and $p'$ satisfied the condition, we would have $\textstyle y \prec \big(1 - \frac{p+p'}2\big)x + \frac{p+p'}2z \prec y$.$\square$
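
As a numeric illustration of Lemma 2 (my own sketch, assuming the preference happens to be given by an expected-utility ranking, so that the threshold has the closed form $p = (u(y)-u(x))/(u(z)-u(x))$):

```python
# Numeric illustration of Lemma 2 under an expected-utility preference
# (a hypothetical example; the lemma itself needs no utility function).

def mix_utility(p, ux, uz):
    """Utility of the mixture (1-p)*x + p*z under a linear (EU) preference."""
    return (1 - p) * ux + p * uz

def threshold(ux, uy, uz, tol=1e-12):
    """Binary search for the p of Lemma 2: mixtures below p are worse
    than y, mixtures above p are better."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mix_utility(mid, ux, uz) < uy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = threshold(0.0, 0.3, 1.0)   # with u(x)=0, u(y)=0.3, u(z)=1
```

With these numbers the search converges to $p = 0.3$, the probability weight on $z$ at which the mixture becomes exactly as good as $y$.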

Definition 3. $x$ is much better than $y$, notation $x\succ_* y$ or $y\prec_* x$, if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta_N$) such that we have $x' \succ y'$ for all $x'\in U$ and $y'\in V$. (In other words, the graph of $\succ_*$ is the interior of the graph of $\succ$.) Write $x\preccurlyeq_* y$ or $y\succcurlyeq_* x$ when $x\nsucc_* y$ ($x$ is not much better than $y$), and $x\sim_* y$ ($x$ is about as good as $y$) when both $x\preccurlyeq_* y$ and $x\succcurlyeq_* y$.

Theorem 4 (existence of a utility function). There is a $u\in\mathbb{R}^N$ such that for all $x,y\in\Delta_N$,

$\sum_i x_i\,u_i \;<\; \sum_i y_i\,u_i\;\;\iff\;\; x\prec_* y\;\;\implies\;\;x\prec y.$

Unless $x\sim y$ for all $x$ and $y$, there are $i,j\in\{1,\dotsc,N\}$ such that $u_i\neq u_j$.

Proof. Let $i$ be a worst and $j$ a best outcome, i.e. let $i,j\in\{1,\dotsc,N\}$ be such that $e^i\preccurlyeq e^k\preccurlyeq e^j$ for all $k\in\{1,\dotsc,N\}$. If $e^i\sim e^j$, then $e^i \sim e^k$ for all $k$, and by repeated applications of independence we get $x\sim e^i\sim y$ for all $x,y\in\Delta_N$, and therefore $x\sim_* y$ again for all $x,y\in\Delta_N$, and we can simply choose $u=0$.

Thus, suppose that $e^i\prec e^j$. In this case, let $u$ be such that for every $k\in\{1,\dotsc,N\}$, $u_k$ equals the unique $p$ provided by Lemma 2 applied to $e^i\preccurlyeq e^k\preccurlyeq e^j$ and $e^i\prec e^j$. Because of Lemma 1, $u_i = 0 \neq 1 = u_j$. Let $f(r) := (1-r)e^i + re^j$.

We first show that $\textstyle p := \sum_k x_k\,u_k < \sum_k y_k\,u_k =: q$ implies $x\prec y$. For every $k$, we either have $u_k < 1$, in which case by Lemma 2 we have $e^k \prec f(u_k + \epsilon_k)$ for arbitrarily small $\epsilon_k > 0$, or we have $u_k = 1$, in which case we set $\epsilon_k := 0$ and find $e^k\preccurlyeq e^j = f(u_k + \epsilon_k)$. Set $\textstyle \epsilon := \sum_k x_k\,\epsilon_k$. Now, by independence applied $N-1$ times, we have $\textstyle x = \sum_k x_k\,e^k \preccurlyeq \sum_k x_k f(u_k + \epsilon_k) = f(p+\epsilon)$; analogously, we obtain $y \succcurlyeq f(q-\delta)$ for arbitrarily small $\delta > 0$. Thus, using $p + \epsilon < q - \delta$ (which holds for sufficiently small $\epsilon$ and $\delta$, since $p < q$) and Lemma 1, $x\preccurlyeq f(p+\epsilon)\prec f(q-\delta)\preccurlyeq y$ and therefore $x\prec y$ as claimed. Now note that if $\textstyle\sum_k x_k\,u_k < \sum_k y_k\,u_k$, then this continues to hold for $x'$ and $y'$ in a sufficiently small neighbourhood of $x$ and $y$, and therefore we have $x\prec_* y$.

Now suppose that $\textstyle \sum_k x_k\,u_k \ge \sum_k y_k\,u_k$. Since we have $u_i = 0$ and $u_j = 1$, we can find points $x'$ and $y'$ arbitrarily close to $x$ and $y$ such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, $x'\succ y'$ by the preceding paragraph. But this implies that $x\not\prec_* y$, which completes the proof.$\square$
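
A small numeric sketch of the normalization used in the proof (my own illustration, with made-up numbers): rescaling an arbitrary underlying utility so that the worst outcome gets 0 and the best gets 1 is an affine map, and therefore preserves the expected-utility ordering of lotteries.

```python
# Normalization as in Theorem 4 (hypothetical example): u_i = 0 for the
# worst outcome, u_j = 1 for the best, everything else in between.

def normalize(true_u):
    """Affinely rescale utilities so that min maps to 0 and max to 1."""
    lo, hi = min(true_u), max(true_u)
    return [(v - lo) / (hi - lo) for v in true_u]

def expected_utility(lottery, u):
    return sum(p * uk for p, uk in zip(lottery, u))

true_u = [3.0, -1.0, 7.0]      # arbitrary utilities over 3 outcomes
u = normalize(true_u)          # becomes [0.5, 0.0, 1.0]
x = [0.2, 0.5, 0.3]            # two lotteries on the simplex
y = [0.6, 0.1, 0.3]
# An affine rescaling preserves the EU ordering of x and y:
same_order = ((expected_utility(x, true_u) < expected_utility(y, true_u))
              == (expected_utility(x, u) < expected_utility(y, u)))
```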

Corollary 5. $\preccurlyeq_*$ is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem.$\square$

Corollary 6. $u$ is unique up to affine transformations.

Proof. Since $u$ is a VNM utility function for $\preccurlyeq_*$, this follows from the analogous result for that case.$\square$

Corollary 7. Unless $x\sim y$ for all $x,y\in\Delta_N$, for all $r\in\mathbb{R}$ the set $\textstyle \{x\in\Delta_N : \sum_i x_i\,u_i = r\}$ has lower dimension than $\Delta_N$ (i.e., it is the intersection of $\Delta_N$ with a lower-dimensional subspace of $\mathbb{R}^N$).

Proof. First, note that the assumption implies that $N\ge 2$. Let $v\in\mathbb{R}^N$ be given by $v_i = 1$, $\forall i$, and note that $\Delta_N$ is the intersection of the hyperplane $A := \{x\in\mathbb{R}^N : x\cdot v = 1\}$ with the closed positive orthant $\mathbb{R}^N_+$. By the theorem, $u$ is not parallel to $v$, so the hyperplane $B_r := \{x\in\mathbb{R}^N : x\cdot u = r\}$ is not parallel to $A$. It follows that $A\cap B_r$ has dimension $N-2$, and therefore $\textstyle\{x\in\Delta_N : \sum_i x_i\,u_i = r\} \;=\; A\cap B_r\cap\mathbb{R}^N_+$ can have at most this dimension. (It can have smaller dimension or be the empty set if $A\cap B_r$ only touches or lies entirely outside the positive orthant.)$\square$

## Why you must maximize expected utility

18 13 December 2012 01:11AM

This post explains von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I'm writing this post in preparation for a sequence on updateless anthropics, but I'm hoping that it will also be independently useful.

The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don't know what that means, I'll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?

A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump — how, at time 11:59, you will want to pay a penny to get option B instead of option A, and then at 12:01, you will want to pay a penny to switch back. Either that, or the game will have ended and the option won't have made a difference.
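
The bookkeeping of that pump can be sketched in a few lines (illustrative numbers of my own, not from Eliezer's post): the agent pays a penny for each switch and ends every cycle holding exactly what it started with.

```python
# Toy money-pump arithmetic: an agent that prefers B over A at 11:59 and
# A over B at 12:01 pays a penny per switch, losing money every cycle.
wealth = 0.0
holding = "A"
for _ in range(10):        # ten 11:59 / 12:01 cycles
    wealth -= 0.01         # 11:59: pay a penny to switch A -> B
    holding = "B"
    wealth -= 0.01         # 12:01: pay a penny to switch back
    holding = "A"
# After ten cycles the agent still holds "A" but is twenty cents poorer.
```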

When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently always gave better results. But couldn't you avoid the problem by violating the axiom only in situations where it doesn't give anyone an opportunity to money-pump you? I'm not saying that would be elegant, but is there a reason it would be irrational?

It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don't know any source that comes close to explaining the reason, the way I see it; hence, this post.

I'll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually imply that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.

*

Epistemic rationality is about figuring out what's true; instrumental rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent direction in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won't ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem doubtful that which plan is better at 12:01 could genuinely depend on my opportunity to choose at 11:59.

So how do we formalize the notion of a coherent direction in which you can steer the future?

## Pascal's Mugging for bounded utility functions

8 06 December 2012 10:28PM

This is Pascal's Mugging: Someone comes to you and says, "Give me five dollars, and I'll use my powers from outside the matrix to grant you 4^^^^4 years of fun." And they're lying, of course, but under a Solomonoff prior, the probability that they're not, though surely very small, isn't going to be less than one in 3^^^3; and so if you shut up and multiply, it's clear that the expected utility of paying up outweighs the expected utility of anything sensible you might be doing with those five dollars, and therefore—

Well, fortunately, if you're afraid that your utility-maximizing AI will end up paying all its money to the first clever mugger to come along and ask: not to worry! It will do so only if it can't think of anything better to do with five dollars, after all. So to avoid being mugged, all it has to do is to think of a harebrained scheme for spending \$5 that has more than a one-in-4^^^^4 chance of providing 5^^^^5 years of fun. Problem solved.

If, however, you would like there to be a chance greater than one-in-hell that your AI ends up doing something actually useful, you'll need to do something else. And the simplest answer is to adopt a bounded utility function: any positive singularity gives at least 50 utils, a billion years gives 80 utils, a googol years gives 99 utils, a googolplex years gives 99.9 utils, and 4^^^^4 years of fun give 100 utils (minus epsilon).
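
As a concrete sketch, here is one functional form with roughly this shape (the constants are my own and only reproduce the "billion years = 80 utils" point, not the whole sequence; what matters is the saturation at 100 utils):

```python
import math

# One toy bounded utility function (my own illustration): utility of
# living t years of fun, increasing in t but never exceeding 100 utils.
def bounded_utils(years, cap=100.0, scale=2.25):
    """Utility that saturates at `cap` as `years` grows without bound."""
    if years <= 1:
        return 0.0
    x = math.log10(years)
    return cap * x / (x + scale)

# The consequence that blocks the mugging: the utils at stake are bounded
# by the cap, so a sufficiently small probability makes paying a bad deal.
p_mugger = 1e-30                               # credence the mugger is honest
utils_at_stake = 100.0 - bounded_utils(10**9)  # at most 20 utils here
expected_gain = p_mugger * utils_at_stake      # negligible next to $5
```

No matter how large a lifespan the mugger promises, `utils_at_stake` can never exceed the cap, which is the whole point of bounding the function.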

This will, indeed, solve the problem. Probability of getting mugged: used to be one (minus epsilon, of course); has now been brought down to zero. That's right: zero.

(Plus epsilon.)

But let's suppose that the impossible happens, and the universe turns out to be able to support TREE(100) years of fun, and we've already lived out 4^^^^4 of them, and the AI has long since folded up operations and faded out of existence because humanity has become sufficiently sane that we no longer need it—

And lo, someone comes to you and says, "Alas, you're not really experiencing 4^^^^4 years of fun here; you're really a mere billion-year-old living in a very convincing simulation. Give me five dollars, and I'll use my powers from outside the matrix to extend your lifespan to a googol years."

And they're lying, of course — but it has been a long time indeed since you last faced a choice that could make a difference of nineteen whole utils...

*

If you truly have a bounded utility function, you must agree that in this situation, paying up is exactly what you'd want to do. Even though it means that you will not experience 4^^^^4 years of fun, even conditional on the universe being capable of supporting TREE(100) of them.
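
To see where "nineteen whole utils" comes from, here is the arithmetic, using the post's own numbers (a billion years = 80 utils, a googol years = 99 utils, 4^^^^4 years = 100 utils minus epsilon) and a made-up credence in the mugger's story:

```python
# The arithmetic behind "nineteen whole utils". The credence q is a
# made-up number for illustration; the utilities are the post's own.
q = 0.001                 # credence you really are a simulated billion-year-old
u_outside = 100.0 - 1e-9  # utils of the 4^^^^4-year life if the mugger lies

u_pay    = q * 99.0 + (1 - q) * u_outside  # pay: a googol years if simulated
u_refuse = q * 80.0 + (1 - q) * u_outside  # refuse: a billion years if simulated
edge = u_pay - u_refuse                    # = q * 19 utils in the mugger's favor
```

Because the bounded function leaves so little room above 80 utils, even a tiny credence in the mugger's story times those 19 utils can dominate any deal you are otherwise offered.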

[ETA: To clarify, by "4^^^^4", I really mean any number so large that your utility function assigns (100 - epsilon) utils to it. It's possible to have a utility function where this is only true for infinite numbers which are so incredibly infinite that, given a particular formal language, their definition is so long and complicated that no mere human-sized mind could comprehend it. See this comment thread for discussion of bounded utility functions that assign significant weight to very large lifetimes.]

## A model of UDT with a concrete prior over logical statements

39 28 August 2012 09:45PM

I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how do you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?

Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for a UDT algorithm based on that. I'm not sure whether the concrete prior is useful—the main point in giving it is to have a concrete example we can try to prove things about—but the definition of logical possible worlds looks like a promising theoretical tool to me.

## How to cheat Löb's Theorem: my second try

14 22 August 2012 06:21PM

In his open problems talk, Eliezer explains how Löb's theorem prevents you from having a consistent proof system P with an axiom schema that anything P proves is actually true, and asks how we can then "build an AI that could completely rewrite itself, without decreasing the amount of trust it had in math every time it executed that self-rewrite" (18:46).

Recently, I posted about an attempt to apply a general trick for avoiding diagonalization problems to a minimal toy version of this problem. Since then, Wei Dai has posted an interesting quining approach to the same toy problem, and Giles had a promising idea for doing something similar in a different way and will hopefully do a write-up filling in the details. Unfortunately my own "proof" turned out to be broken.

I think I've fixed the problem and made the proof more comprehensible and intuitive in the process. (To avoid confusion, note that what I'm proving is slightly different from, though related to, what I did in the previous post.) However, getting the details right seems to be far from trivial, so I would very much appreciate it if people checked my new argument, and told me that it looks okay / where it goes wrong / where they get lost. Thanks in advance!

I'll be more explicit about quoting/unquoting than before, which means I'll need to introduce some notation. However, to sustain you through the schlep of preliminaries, I thought I'd start with an informal summary.

## An angle of attack on Open Problem #1

28 18 August 2012 12:08PM

There is a problem with the proof here and I have to think about whether I can fix it. Thanks to vi21maobk9vp! Update: I have posted a new and hopefully correct proof attempt. Thanks again to vi21maobk9vp!

In his talk on open problems in Friendly AI, Eliezer's first question is how, given Löb's theorem, an AI can replace itself with a better expected utility maximizer that believes in as much mathematics as the original AI. I know exactly one trick for that sort of problem, so I decided to try that on a toy variant. To my surprise, it more or less just worked. Therefore:

Professor Quirrell proposes a game. You start with a score of one. Professor Quirrell moves first, by choosing a computer program and showing you its source code. You then have three options: Take your winnings; double down; or self-destruct.

If you take your winnings, the game ends, and your score is converted to Quirrell points.

If you self-destruct, the game ends, your score is lost, you'll be sent to bed without dinner, you'll lose 150 House points, Rita Skeeter will write a feature alleging that you're a Death Eater, and Professor Quirrell will publicly critique your performance. You are advised not to pick this option.

If you double down, your score doubles, and you advance to the next round. Professor Quirrell again moves first by choosing a computer program. Then, it's your turn—except that this time, you don't get to choose your move yourself: instead, it'll be chosen by Professor Quirrell's program from the previous round.

Professor Quirrell will endeavor to present an educational sequence of programs.
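
The scoring rules above can be sketched in a few lines (my own illustration, directly transcribing the rules: start at 1, double per round, lose everything on self-destruct):

```python
# Score bookkeeping for Professor Quirrell's game, as described above.
def final_score(num_double_downs: int, self_destructed: bool = False) -> int:
    """Quirrell points earned: start at 1, double for each double-down,
    forfeit everything on self-destruct."""
    if self_destructed:
        return 0
    return 2 ** num_double_downs
```

So taking your winnings immediately is worth 1 point, and surviving five double-downs is worth 2^5 = 32 points; whether you ever get to collect depends entirely on the programs Professor Quirrell shows you.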