## The January 2013 CFAR workshop: one-year retrospective

18 February 2014 06:41PM

About a year ago, I attended my first CFAR workshop and wrote a post about it here. I mentioned in that post that it was too soon for me to tell whether the workshop would have a large positive impact on my life. In the comments, I was asked to follow up in a year to better evaluate that impact. So here we are!

Very short summary: overall I think the workshop had a large and persistent positive impact on my life.

### Important caveat

However, anyone using this post to evaluate the value of going to a CFAR workshop themselves should be aware that I'm local to Berkeley and have had many opportunities to stay connected to CFAR and the rationalist community. More specifically, in addition to the January workshop, I also

• visited the March workshop (and possibly others),
• attended various social events held by members of the community,
• taught at the July workshop, and
• taught at SPARC.

These experiences all helped me digest and reinforce the workshop material (which was also improving over time), and a typical workshop participant might not have these advantages.

### Answering a question

pewpewlasergun wanted me to answer the following question:

I'd like to know how many techniques you were taught at the meetup you still use regularly. Also which has had the largest effect on your life.

The short answer is: in some sense very few, but a lot of the value I got out of attending the workshop didn't come from specific techniques.

In more detail: to be honest, many of the specific techniques are kind of a chore to use (at least as of January 2013). I experimented with a good number of them in the months after the workshop, and most of them haven't stuck. That isn't so bad: the cost of trying a technique and finding that it doesn't work for you is low, while the benefit of trying a technique and finding that it does work for you can be quite high! One technique that has stuck is the idea of a next action, which I've found incredibly useful. Next actions are what to-do list items should be, say in the context of using Remember The Milk. Many to-do list items you might be tempted to write down are difficult to actually do because they're either too vague or too big and hence trigger ugh fields. For example, you might have an item like

• Do my taxes

that you don't get around to until right before you have to because you have an ugh field around doing your taxes. This item is both too vague and too big: instead of writing this down, write down the next physical action you need to take to make progress on this item, which might be something more like

• Find tax forms and put them on desk

which is both concrete and small. Thinking in terms of next actions has been a huge upgrade to my GTD system (as was Workflowy, which I also started using because of the workshop) and I do it constantly.

But as I mentioned, a lot of the value I got out of attending the workshop did not come from specific techniques. Much of it came from spending time with the workshop instructors and participants, which had effects that I find hard to summarize, but I'll try to describe some of them below:

### Emotional attitudes

The workshop readjusted my emotional attitudes towards several things for the better, and at several meta levels. For example, a short conversation with a workshop alum completely readjusted my emotional attitude towards both nutrition and exercise, and not long afterwards I started paying more attention to what I ate and, for the first time in my life, going to the gym (albeit sporadically). I lost about 15 pounds this way (mostly from the eating part, not the gym part, I think).

At a higher meta level, I did a fair amount of experimenting with various lifestyle changes (cold showers, not shampooing) after the workshop and overall they had the effect of readjusting my emotional attitude towards change. I find it generally easier to change my behavior than I used to because I've had a lot of practice at it lately, and am more enthusiastic about the prospect of such changes.

(Incidentally, I think emotional attitude adjustment is an underrated component of causing people to change their behavior, at least here on LW.)

### Using all of my strength

The workshop was the first place where I really understood, on a gut level, that I could use my brain to think about something other than math. It sounds silly when I phrase it like that, but at some point in the past I had incorporated into my identity that I was good at math but absentminded and silly about real-world matters, and I used it as an excuse not to fully engage intellectually with anything that wasn't math, especially anything practical. One way or another the workshop helped me realize this, and I stopped thinking this way.

The result is that I constantly apply optimization power to situations I wouldn't have even tried to apply optimization power to before. For example, today I was trying to figure out why the water in my bathroom sink was draining so slowly. At first I thought it was because the strainer had become clogged with gunk, so I cleaned the strainer, but then I found out that even with the strainer removed the water was still draining slowly. In the past I might've given up here. Instead I looked around for something that would fit farther into the sink than my fingers and saw the handle of my plunger. I pumped the handle into the sink a few times and some extra gunk I hadn't known was there came out. The sink is fine now. (This might seem small to people who are more domestically talented than me, but trust me when I say I wasn't doing stuff like this before last year.)

### Reflection and repair

Thanks to the workshop, my GTD system is now robust enough to consistently enable me to reflect on and repair my life (including my GTD system). For example, I'm quicker to attempt to deal with minor medical problems I have than I used to be. I also think more often about what I'm doing and whether I could be doing something better. In this regard I pay a lot of attention in particular to what habits I'm forming, although I don't use the specific techniques in the relevant CFAR unit.

For example, at some point I had recorded in RTM that I was frustrated by the sensation of hours going by without remembering how I had spent them (usually because I was mindlessly browsing the internet). In response, I started keeping a record of what I was doing every half hour and categorizing each half-hour according to a combination of how productively and how intentionally I spent it (in the first iteration it was just how productively I spent it, but I found that this was making me feel too guilty about relaxing). For example:

• a half-hour intentionally spent reading a paper is marked green.
• a half-hour half-spent writing up solutions to a problem set and half-spent on Facebook is marked yellow.
• a half-hour intentionally spent playing a video game is marked with no color.
• a half-hour mindlessly browsing the internet when I had intended to do work is marked red.

The act of doing this every half hour itself helps make me more mindful about how I spend my time, but having a record of how I spend my time has also helped me notice interesting things, such as the fact that less of my time is under my direct control than I had thought (much of it is instead taken up by classes, commuting, eating, etc.). It's also easier for me to get into a success spiral when I see a lot of green.
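The color scheme is simple enough to write down as an explicit rule. Below is a minimal Python sketch of one reading of the four examples, assuming a hypothetical encoding of each half-hour as an activity label, the fraction of it spent on intended work, and whether the non-work part was chosen on purpose. The `HalfHour` type, the `color` and `summarize` functions, and `SAMPLE_DAY` are all illustrative, not the exact rule I use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class HalfHour:
    """One tracked half-hour (hypothetical encoding, for illustration only)."""
    activity: str
    work_fraction: float  # share of the block spent on intended work, 0.0 to 1.0
    intentional: bool     # whether the non-work part was chosen on purpose


def color(block: HalfHour) -> Optional[str]:
    """Map a half-hour to a color, matching the four examples in the list."""
    if block.work_fraction >= 1.0:
        return "green"   # e.g. intentionally reading a paper
    if block.work_fraction > 0.0:
        return "yellow"  # e.g. half problem set, half Facebook
    # No work at all: uncolored if chosen on purpose, red if it just happened.
    return None if block.intentional else "red"


def summarize(day: list[HalfHour]) -> dict[str, int]:
    """Count half-hours per color; uncolored blocks are counted as 'none'."""
    return dict(Counter(color(b) or "none" for b in day))


SAMPLE_DAY = [
    HalfHour("read a paper", 1.0, True),
    HalfHour("problem set + Facebook", 0.5, False),
    HalfHour("video game", 0.0, True),
    HalfHour("mindless browsing instead of work", 0.0, False),
]

if __name__ == "__main__":
    print(summarize(SAMPLE_DAY))  # {'green': 1, 'yellow': 1, 'none': 1, 'red': 1}
```

Writing the rule out this way also shows what the first, productivity-only iteration did: replacing the last line of `color` with `return "red"` turns every zero-work block red, including the deliberately chosen video game, which is exactly the guilt-about-relaxing problem.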

### Stimulation

Being around workshop instructors and participants is consistently intellectually stimulating. I don't have a tactful way of saying what I'm about to say next, but: two effects of this are that I think more interesting thoughts than I used to and also that I'm funnier than I used to be. (I realize that these are both hard to quantify.)

### etc.

I worry that I haven't given a complete picture here, but hopefully anything I've left out will be brought up in the comments one way or another. (Edit: this totally happened! Please read Anna Salamon's comment below.)

### Takeaway for prospective workshop attendees

I'm not actually sure what you should take away from all this if your goal is to figure out whether you should attend a workshop yourself. My thoughts are roughly this: I think attending a workshop is potentially high-value and therefore that even talking to CFAR about any questions you might have is potentially high-value, in addition to being relatively low-cost. If you think there's even a small chance you could get a lot of value out of attending a workshop I recommend that you at least take that one step.

## LINK: In favor of niceness, community, and civilisation

24 February 2014 04:13AM

Scott, known on LessWrong as Yvain, recently wrote a post complaining about an inaccurate rape statistic.

Arthur Chu, who recently became notable for winning money on Jeopardy, argued against Scott's stance that we should be honest in arguments, in a comment thread on Jeff Kaufman's Facebook profile, which can be read here.

Scott just responded here, with a number of points relevant to the topic of rationalist communities.

I am interested in what LW thinks of this.

Obviously, at some point being polite in our arguments is silly. I'd be interested in people's opinions of how dire the real-world consequences have to be before it's worthwhile debating dishonestly.

## How my math skills improved dramatically

05 March 2014 08:27PM

When I was a freshman in high school, I was a mediocre math student: I earned a D in second semester geometry and had to repeat the course. By the time I was a senior in high school, I was one of the strongest few math students in my class of ~600 students at an academic magnet high school. I went on to earn a PhD in math. Most people wouldn't have guessed that I could have improved so much, and the shift that occurred was very surreal to me. It’s all the more striking in that the bulk of the shift occurred in a single year. I thought I’d share what strategies facilitated the change.

## The Rationality Wars

27 February 2014 05:08PM

Ever since Tversky and Kahneman started to gather evidence purporting to show that humans suffer from a large number of cognitive biases, other psychologists and philosophers have criticized these findings. For instance, philosopher L. J. Cohen argued in the 1980s that there was something conceptually incoherent about the notion that most adults are irrational (with respect to a certain problem). By some sort of Wittgensteinian logic, he thought that the majority's way of reasoning is by definition right. (Not a high point in the history of analytic philosophy, in my view.) See chapter 8 of this book (where Gigerenzer, below, is also discussed).

Another attempt to resurrect human rationality is due to Gerd Gigerenzer and other psychologists. They have a) shown that if you tweak the heuristics-and-biases experiments (i.e. those of the research program led by Tversky and Kahneman) only slightly, for instance by expressing probabilities in terms of frequencies, people make far fewer mistakes, and b) argued, on the back of this, that the heuristics we use are in many situations good (and fast and frugal) rules of thumb (which explains why they are evolutionarily adaptive). Regarding this, I don't think that Tversky and Kahneman ever doubted that the heuristics we use are quite useful in many situations. Their point was rather that there are lots of naturally occurring set-ups which fool our fast and frugal heuristics. Gigerenzer's findings are not completely uninteresting (it seems to me he does nuance the thesis of massive irrationality a bit), but his claims to the effect that these heuristics are rational in a strong sense are wildly overblown in my opinion. The Gigerenzer vs. Tversky/Kahneman debates are well discussed in this article (although I think they're too kind to Gigerenzer).

A strong argument against attempts to save human rationality is the argument from individual differences, championed by Keith Stanovich. He argues that the fact that some intelligent subjects consistently avoid the typical errors on the Wason selection task, as well as the conjunction fallacy and other fallacies, indicates that it is the critics who are misguided: the answers that psychologists have traditionally seen as normatively correct are not, in fact, mistaken.

Hence I side with Tversky and Kahneman in this debate. Let me just mention one interesting and possibly successful method for disputing some supposed biases. This method is to argue that people have other kinds of evidence than the standard interpretation assumes, and that given this new interpretation of the evidence, the supposed bias in question is in fact not a bias. For instance, it has been suggested that the "false consensus effect" can be re-interpreted in this way:

The False Consensus Effect

Bias description: People tend to imagine that everyone responds the way they do. They tend to see their own behavior as typical. The tendency to exaggerate how common one’s opinions and behavior are is called the false consensus effect. For example, in one study, subjects were asked to walk around on campus for 30 minutes, wearing a sign board that said "Repent!". Those who agreed to wear the sign estimated that on average 63.5% of their fellow students would also agree, while those who disagreed estimated 23.3% on average.

Counterclaim (Dawes & Mulford, 1996): The correctness of reasoning is not estimated on the basis of whether or not one arrives at the correct result. Instead, we look at whether people reach reasonable conclusions given the data they have. Suppose we ask people to estimate whether an urn contains more blue balls or red balls, after allowing them to draw one ball. If one person first draws a red ball, and another person draws a blue ball, then we should expect them to give different estimates. In the absence of other data, you should treat your own preferences as evidence for the preferences of others. Although the actual mean for people willing to carry a sign saying "Repent!" probably lies somewhere in between the estimates given, these estimates are quite close to the one-third and two-thirds estimates that would arise from a Bayesian analysis with a uniform prior distribution of belief. A study by the authors suggested that people do actually give their own opinion roughly the right amount of weight.

(The quote is from an excellent Less Wrong article on this topic due to Kaj Sotala. See also this post by him, this by Andy McKenzie, this by Stuart Armstrong and this by lukeprog on this topic. I'm sure there are more that I've missed.)
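The one-third and two-thirds figures in the quote can be checked directly. Here is a minimal worked version, assuming each subject treats their own answer as a single draw from the student population and starts from a uniform prior on p, the fraction of students who would agree to wear the sign:

$$
\begin{aligned}
\text{prior:} &\quad f(p) = 1, \qquad 0 \le p \le 1 \\
\text{own answer} = \text{agree:} &\quad f(p \mid \text{agree}) \propto p \;\Rightarrow\; f(p \mid \text{agree}) = 2p, \qquad \mathbb{E}[p \mid \text{agree}] = \int_0^1 2p^2 \, dp = \tfrac{2}{3} \\
\text{own answer} = \text{disagree:} &\quad f(p \mid \text{disagree}) \propto 1 - p \;\Rightarrow\; f(p \mid \text{disagree}) = 2(1-p), \qquad \mathbb{E}[p \mid \text{disagree}] = \int_0^1 2p(1-p) \, dp = \tfrac{1}{3}
\end{aligned}
$$

These are just Laplace's rule of succession, (s + 1)/(n + 2), with a single observation (n = 1), and they can be compared with the study's reported averages of 63.5% and 23.3%.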

It strikes me that the notion that people are "massively flawed" is something of an intellectual cornerstone of the Less Wrong community (e.g. note the names "Less Wrong" and "Overcoming Bias"). In the light of this it would be interesting to hear what people have to say about the rationality wars. Do you all agree that people are massively flawed?

Let me make two final notes to keep in mind when discussing these issues. Firstly, even though the heuristics and biases program is sometimes seen as pessimistic, one could turn the tables around: if they're right, we should be able to improve massively (even though Kahneman himself seems to think that that's hard to do in practice). I take it that CFAR and lots of LessWrongers who attempt to "refine their rationality" assume that this is the case. On the other hand, if Gigerenzer or Cohen are right, and we already are very rational, then it would seem that it is hard to do much better. So in a sense the latter are more pessimistic (and conservative) than the former.

Secondly, note that parts of the rationality wars seem to be merely verbal and revolve around how "rationality" is to be defined (tabooing this word is very often a good idea). The real question is not if the fast and frugal heuristics are in some sense rational, but whether there are other mental algorithms which are more reliable and effective, and whether it is plausible to assume that we could learn to use them on a large scale instead.

## What we learned about Less Wrong from Cognito Mentoring advising

06 March 2014 09:40PM

In late December 2013, Jonah, my collaborator at Cognito Mentoring, announced the service on LessWrong. Information about the service was also circulated in other venues with high concentrations of gifted and intellectually curious people. Since then, we've received ~70 emails asking for mentoring from learners across all ages, plus a few parents. At least 40 of our advisees heard of us through LessWrong, and the number is probably around 50. Of the 23 who responded to our advisee satisfaction survey, 16 filled in information on where they'd heard of us, and 14 of those 16 had heard of us from LessWrong. The vast majority of student advisees with whom we had substantive interactions, and the ones we felt we were able to help the most, came from LessWrong (we got some parents through the Davidson Forum post, but that's a very different sort of advising).

In this post, I discuss some common themes that emerged from our interaction with these advisees. Obviously, this isn't a comprehensive picture of the LessWrong community the way that Yvain's 2013 survey results were.

• A significant fraction of the people who contacted us via LessWrong aren't active LessWrong participants, and many don't even have user accounts on LessWrong. The prototypical advisees we got through LessWrong don't have many distinctive LessWrongian beliefs. Many of them use LessWrong primarily as a source of interesting stuff to read, rather than a community to be part of.
• About 25% of the advisees we got through LessWrong were female, and a slightly higher proportion of the advisees with whom we had substantive interaction (and subjectively feel we helped a lot) were female. You can see this by looking at the sex distribution of the public reviews of us from students.
• Our advisees included people in high school (typically, grades 11 and 12) and college. Our advisees in high school tended to be interested in mathematics, computer science, physics, engineering, and entrepreneurship. We did have a few who were interested in economics, philosophy, and the social sciences as well, but this was rarer. Our advisees in college and graduate school were also interested in the above subjects but skewed a bit more in the direction of being interested in philosophy, psychology, and economics.
• Somewhat surprisingly and endearingly, many of our advisees were interested in effective altruism and social impact. Some had already heard of the cluster of effective altruist ideas. Others were interested in generating social impact through entrepreneurship or choosing an impactful career, even though they weren't familiar with effective altruism until we pointed them to it. Of those who had heard of effective altruism as a cluster of ideas, some had either already consulted with or were planning to consult with 80,000 Hours, and were connecting with us largely to get a second opinion or to get opinions on matters other than career choice.
• Some of our advisees had had some sort of past involvement with MIRI/CFAR/FHI. Some were seriously considering working in existential risk reduction or on artificial intelligence. The two subsets overlapped considerably.
• Our advisees were somewhat better educated about rationality issues than we'd expect others of similar academic accomplishment to be, and more than the advisees we got from sources other than LessWrong. That's obviously not a surprise at all.
• We hadn't been expecting it, but many advisees asked us questions related to procrastination, social skills, and other life skills. We were initially somewhat ill-equipped to handle these, but we've built a base of recommendations, with some help from LessWrong and other sources.
• One thing that surprised me personally is that many of these people had never spent time exploring Quora. I'd have expected Quora to be much more widely known and used by the sort of people who were sufficiently aware of the Internet to know LessWrong. But it's possible there's not that much overlap.

My overall takeaway is that LessWrong seems to still be one of the foremost places that smart and curious young people interested in epistemic rationality visit. I'm not sure of the exact reason, though HPMOR probably gets a significant fraction of the credit. As long as things stay this way, LessWrong remains a great way to influence a subset of the young population today that's likely to be disproportionately represented among the decision-makers a few years down the line.

It's not clear to me why they don't participate more actively on LessWrong. Maybe no special reasons are needed: the ratio of lurkers to posters is huge for most Internet fora. Maybe the people who contacted us were relatively young and still didn't have an Internet presence, or were being careful about building one. On the other hand, maybe there is something about the comments culture that dissuades people from participating (this need not be a bad feature per se: one reason people may refrain from participating is that comments are held to a high bar and this keeps people from offering off-the-cuff comments). That said, if people could somehow participate more, LessWrong could transform itself into an interactive forum for smart and curious people that's head and shoulders above all the others.

PS: We've now made our information wiki publicly accessible. It's still in beta and a lot of content is incomplete and there are links to as-yet-uncreated pages all over the place. But we think it might still be interesting to the LessWrong audience.

## Don't rely on the system to guarantee you life satisfaction

18 February 2014 05:48AM

A brief essay intended for high school students: any thoughts?

If you go to school, take the classes that people tell you to, do your homework, and engage in the extracurricular activities that your peers do, you'll be setting yourself up for an "okay" life. But you can do better than that.

## Meta: social influence bias and the karma system

17 February 2014 01:07AM

Given LW’s keen interest in bias, it would seem pertinent to be aware of the biases engendered by the karma system. Note: I used to be strictly opposed to comment scoring mechanisms, but witnessing the general effectiveness with which LWers use karma has largely redeemed the system for me.

In “Social Influence Bias: A Randomized Experiment” by Muchnik et al., random comments on a “social news aggregation Web site” were up-voted after being posted. The likelihood of such rigged comments receiving additional up-votes was quantified in comparison to a control group. The results show that users were significantly biased towards the randomly up-voted posts:

The up-vote treatment significantly increased the probability of up-voting by the first viewer by 32% over the control group ... Uptreated comments were not down-voted significantly more or less frequently than the control group, so users did not tend to correct the upward manipulation. In the absence of a correction, positive herding accumulated over time.

At the end of their five-month testing period, the comments that had artificially received an up-vote had an average rating 25% higher than the control group. Interestingly, the severity of the bias was largely dependent on the topic of discussion:

We found significant positive herding effects for comment ratings in “politics,” “culture and society,” and “business,” but no detectable herding behavior for comments in “economics,” “IT,” “fun,” and “general news”.

The herding behavior outlined in the paper seems rather intuitive to me. If, before I read a post, I see a little green ‘1’ next to it, I’m probably going to read the post in a better light than if I hadn't seen that little green ‘1’ next to it. Similarly, if I see a post that has a negative score, I’ll probably see flaws in it much more readily. One might say that this is the point of the rating system, as it allows the group as a whole to evaluate the content. However, I’m still unsettled by just how easily popular opinion was swayed in the experiment.
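To see why an uncorrected nudge can keep paying off, here is a toy model in Python. It is a minimal sketch, not the model Muchnik et al. fit: each viewer down-votes with a fixed probability and up-votes with a probability that is boosted by a hypothetical `lift` factor whenever the comment's score is already positive. The function name and every parameter value are assumptions; `lift=0.32` merely borrows the first-viewer figure from the quoted paper and, purely for illustration, applies it to every viewer. The function computes the exact expected score by tracking the whole score distribution, so no random sampling is involved.

```python
def expected_final_score(start: int, viewers: int, p_up: float = 0.10,
                         p_down: float = 0.05, lift: float = 0.32) -> float:
    """Exact expected score of one comment after `viewers` views (toy model).

    Each viewer down-votes with probability p_down and up-votes with
    probability p_up, boosted to p_up * (1 + lift) while the current score
    is positive. All parameter values are hypothetical.
    """
    if p_down + p_up * (1 + lift) > 1.0:
        raise ValueError("vote probabilities must sum to at most 1")
    dist = {start: 1.0}  # maps score -> probability of being at that score
    for _ in range(viewers):
        nxt: dict[int, float] = {}
        for score, prob in dist.items():
            up = p_up * (1 + lift) if score > 0 else p_up
            for delta, q in ((1, up), (-1, p_down), (0, 1.0 - up - p_down)):
                nxt[score + delta] = nxt.get(score + delta, 0.0) + prob * q
        dist = nxt
    return sum(score * prob for score, prob in dist.items())


if __name__ == "__main__":
    treated = expected_final_score(start=1, viewers=50)  # seeded with one up-vote
    control = expected_final_score(start=0, viewers=50)
    print(f"treated={treated:.3f} control={control:.3f} gap={treated - control:.3f}")
```

Because nothing in this model pulls the treated comment back toward the control, the one-vote head start never shrinks in expectation, and it grows whenever the treated comment is above zero while the control is not; with `lift=0.0` the gap stays at exactly one vote.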

This certainly doesn't necessitate that we reprogram the site and eschew the karma system. Rather, understanding the biases inherent in such a system will allow us to use it much more effectively. Discussion of how this bias affects LW in particular would be welcome. Here are some questions to begin with:

• Should we worry about this bias at all? Are its effects negligible in the scheme of things?
• How does the culture of LW contribute to this herding behavior? Is it positive or negative?
• If there are damages, how can we mitigate them?

#### Notes:

In the paper, they mentioned that comments were not sorted by popularity, therefore “mitigating the selection bias.” This of course implies that the bias would be more severe on forums where comments are sorted by popularity, such as this one.

For those interested, another enlightening paper is “Overcoming the J-shaped distribution of product reviews” by Nan Hu et al., which discusses rating biases on websites such as Amazon. User gwern has also recommended a longer 2007 paper by the same authors which the one above is based upon: "Why do Online Product Reviews have a J-shaped Distribution? Overcoming Biases in Online Word-of-Mouth Communication"

## I like simplicity, but not THAT much

14 February 2014 07:51PM

Followup to: L-zombies! (L-zombies?)
Reply to: Coscott's Preferences without Existence; Paul Christiano's comment on my l-zombies post

In my previous post, I introduced the idea of an "l-zombie", or logical philosophical zombie: A Turing machine that would simulate a conscious human being if it were run, but that is never run in the real, physical world, so that the experiences that this human would have had, if the Turing machine were run, aren't actually consciously experienced.

One common reply to this is to deny the possibility of logical philosophical zombies just like the possibility of physical philosophical zombies: to say that every mathematically possible conscious experience is in fact consciously experienced, and that there is no kind of "magical reality fluid" that makes some of these be experienced "more" than others. In other words, we live in the Tegmark Level IV universe, except that, contrary to what Tegmark argues in his paper, there's no objective measure on the collection of all mathematical structures according to which some mathematical structures somehow "exist more" than others (and, although IIRC that's not part of Tegmark's argument, according to which the conscious experiences in some mathematical structures could be "experienced more" than those in other structures). All mathematically possible experiences are experienced, and to the same "degree".

So why is our world so orderly? There's a mathematically possible continuation of the world that you seem to be living in, where purple pumpkins are about to start falling from the sky. Or the light we observe coming in from outside our galaxy is suddenly replaced by white noise. Why don't you remember ever seeing anything as obviously disorderly as that?

And the answer to that, of course, is that among all the possible experiences that get experienced in this multiverse, there are orderly ones as well as non-orderly ones, so the fact that you happen to have orderly experiences isn't in conflict with the hypothesis; after all, the orderly experiences have to be experienced as well.

One might be tempted to argue that it's somehow more likely that you will observe an orderly world if everybody who has conscious experiences at all, or if at least most conscious observers, see an orderly world. (The "most observers" version of the argument assumes that there is a measure on the conscious observers, a.k.a. some kind of magical reality fluid.) But this requires the use of anthropic probabilities, and there is simply no (known) system of anthropic probabilities that gives reasonable answers in general. Fortunately, we have an alternative: Wei Dai's updateless decision theory (which was motivated in part exactly by the problem of how to act in this kind of multiverse). The basic idea is simple (though the details do contain devils): We have a prior over what the world looks like; we have some preferences about what we would like the world to look like; and we come up with a plan for what we should do in any circumstance we might find ourselves in that maximizes our expected utility, given our prior.

*

In this framework, Coscott and Paul suggest, everything adds up to normality if, instead of saying that some experiences objectively exist more, we happen to care more about some experiences than about others. (That's not a new idea, of course, or the first time this has appeared on LW -- for example, Wei Dai's What are probabilities, anyway? comes to mind.) In particular, suppose we just care more about experiences in mathematically really simple worlds -- or more precisely, places in mathematically simple worlds that are mathematically simple to describe (since there's a simple program that runs all Turing machines, and therefore all mathematically possible human experiences, always assuming that human brains are computable). Then, even though there's a version of you that's about to see purple pumpkins rain from the sky, you act in a way that's best in the world where that doesn't happen, because that world has so much lower K-complexity, and because you therefore care so much more about what happens in that world.
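One common way to make "caring more about simpler worlds" precise, offered here only as an illustrative assumption and not as what Coscott or Paul proposed, is to weight each place x in a mathematically possible world by 2^{-K(x)}, where K is Kolmogorov complexity relative to some fixed universal Turing machine, and then to choose the plan π that maximizes the weighted utility U(x, π):

$$
\pi^{*} = \arg\max_{\pi} \sum_{x} 2^{-K(x)} \, U(x, \pi),
\qquad
\frac{2^{-K(x_{\text{pumpkins}})}}{2^{-K(x_{\text{orderly}})}} = 2^{-\Delta K},
\qquad
\Delta K = K(x_{\text{pumpkins}}) - K(x_{\text{orderly}}).
$$

If specifying the rain of purple pumpkins costs, say, 50 extra bits, that continuation gets a relative weight of 2^{-50} ≈ 8.9 × 10^{-16}, which is why, under this weighting, a plan optimized for the orderly world wins.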

There's something unsettling about that, which I think deserves to be mentioned, even though I do not think it's a good counterargument to this view. This unsettling thing is that on priors, it's very unlikely that the world you experience arises from a really simple mathematical description. (This is a version of a point I also made in my previous post.) Even if the physicists had already figured out the simple Theory of Everything, which is a super-simple cellular automaton that accords really well with experiments, you don't know that this simple cellular automaton, if you ran it, would really produce you. After all, imagine that somebody intervened in Earth's history so that orchids never evolved, but otherwise left the laws of physics the same; there might still be humans, or something like humans, and they would still run experiments and find that they match the predictions of the simple cellular automaton, so they would assume that if you ran that cellular automaton, it would compute them -- except it wouldn't, it would compute us, with orchids and all. Unless, of course, it does compute them, and a special intervention is required to get the orchids.

So you don't know that you live in a simple world. But, goes the obvious reply, you care much more about what happens if you do happen to live in the simple world. On priors, it's probably not true; but it's best, according to your values, if all people like you act as if they live in the simple world (unless they're in a counterfactual mugging type of situation, where they can influence what happens in the simple world even if they're not in the simple world themselves), because if the actual people in the simple world act like that, that gives the highest utility.

You can adapt an argument that I was making in my l-zombies post to this setting: Given these preferences, it's fine for everybody to believe that they're in a simple world, because this will increase the correspondence between map and territory for the people that do live in simple worlds, and that's who you care most about.

*

I mostly agree with this reasoning. I agree that Tegmark IV without a measure seems like the most obvious and reasonable hypothesis about what the world looks like. I agree that there seems to be no reason for there to be a "magical reality fluid". I agree, therefore, that on the priors that I'd put into my UDT calculation for how I should act, it's much more likely that true reality is a measureless Tegmark IV than that it has some objective measure according to which some experiences are "experienced less" than others, or not experienced at all. I don't think I understand things well enough to be extremely confident in this, but my odds would certainly be in favor of it.

Moreover, I agree that if this is the case, then my preferences are to care more about the simpler worlds, making things add up to normality; I'd want to act as if purple pumpkins are not about to start falling from the sky, precisely because I care more about the consequences my actions have in more orderly worlds.

But.

*

Imagine this: Once you finish reading this article, you hear a bell ringing, and then a sonorous voice announces: "You do indeed live in a Tegmark IV multiverse without a measure. You had better deal with it." And then it turns out that it's not just you who's heard that voice: Every single human being on the planet (who didn't sleep through it, isn't deaf etc.) has heard those same words.

On the measureless Tegmark IV hypothesis, this is of course about to happen to you, though only in one of those worlds with high K-complexity that you don't care about very much.

So let's consider the following possible plan of action: You could act as if there is some difference between "existence" and "non-existence", or perhaps some graded degree of existence, until you hear those words and confirm that everybody else has heard them as well, or until you've experienced one similarly obviously "disorderly" event. So until that happens, you do things like invest time and energy into trying to figure out what the best way to act is if it turns out that there is some magical reality fluid, and into trying to figure out what a non-confused version of something like a measure on conscious experience could look like, and you act in ways that don't kill you if we happen to not live in a measureless Tegmark IV. But once you've had a disorderly experience, just a single one, you switch over to optimizing for the measureless mathematical multiverse.

If the degree to which you care about worlds is really determined by their K-complexity, with respect to what you and I would consider a "simple" universal Turing machine, then this would be a silly plan; there is very little to be gained from being right in worlds that have that much higher K-complexity. But when I query my intuitions, it seems like a rather good plan:

• Yes, I care less about those disorderly worlds. But not as much less as if I valued them by their K-complexity. I seem to be willing to tap into my complex human intuitions to refer to the notion of "single obviously disorderly event", and assign the worlds with a single such event, and otherwise low K-complexity, not that much lower importance than the worlds with actual low K-complexity.
• And if I imagine that the confused-seeming notions of "really physically exists" and "actually experienced" do have some objective meaning independent of my preferences, then I care much more about the difference between "I get to 'actually experience' a tomorrow" and "I 'really physically' get hit by a car today" than I care about the difference between the world with true low K-complexity and the worlds with a single disorderly event.

In other words, I agree that on the priors I put into my UDT calculation, it's much more likely that we live in measureless Tegmark IV; but my confidence in this isn't extreme, and if we don't, then the difference between "exists" and "doesn't exist" (or "is experienced a lot" and "is experienced only infinitesimally") is very important; much more important than the difference between "simple world" and "simple world plus one disorderly event" according to my preferences if we do live in a Tegmark IV universe. If I act optimally according to the Tegmark IV hypothesis in the latter worlds, that still gives me most of the utility that acting optimally in the truly simple worlds would give me -- or, more precisely, the utility differential isn't nearly as large as if there is something else going on, and I should be doing something about it, and I'm not.

This is the reason why I'm trying to think seriously about things like l-zombies and magical reality fluid. I mean, I don't even think that these are particularly likely to be exactly right even if the measureless Tegmark IV hypothesis is wrong; I expect that there would be some new insight that makes even more sense than Tegmark IV, and makes all the confusion go away. But trying to grapple with the confused intuitions we currently have seems at least a possible way to make progress on this, if it should be the case that there is in fact progress to be made.

*

Here's one avenue of investigation that seems worthwhile to me, and wouldn't seem so without the above argument. One thing I could imagine finding, that could make the confusion go away, would be that the intuitive notion of "all possible Turing machines" is just wrong, and leads to outright contradictions (e.g., to inconsistencies in Peano Arithmetic, or something similarly convincing). Lots of people have entertained the idea that concepts like the real numbers don't "really" exist, and only the behavior of computable functions is "real"; perhaps not even that is real, and true reality is more restricted? (You can reinterpret many results about real numbers as results about computable functions, so maybe you could reinterpret results about computable functions as results about these hypothetical weaker objects that would actually make mathematical sense.) So it wouldn't be the case after all that there is some Turing machine that computes the conscious experiences you would have if pumpkins started falling from the sky.

Does the above make sense? Probably not. But I'd say that there's a small chance that maybe yes, and that if we understood the right kind of math, it would seem very obvious that not all intuitively possible human experiences are actually mathematically possible (just as obvious as it is today, with hindsight, that there is no Turing machine which takes a program as input and outputs whether this program halts). Moreover, it seems plausible that this could have consequences for how we should act. This, together with my argument above, makes me think that this sort of thing is worth investigating, even if my priors are heavily on the side of expecting that all experiences exist to the same degree, and ordinarily this difference in probabilities would make me think that our time would be better spent on investigating other, more likely hypotheses.

*

Leaving aside the question of how I should act, though, does all of this mean that I should believe that I live in a universe with l-zombies and magical reality fluid, until such time as I hear that voice speaking to me?

I do feel tempted to try to invoke my argument from the l-zombies post that I prefer the map-territory correspondences of actually existing humans to be correct, and don't care about whether l-zombies have their map match up with the territory. But I'm not sure that I care much more about actually existing humans being correct, if the measureless mathematical multiverse hypothesis is wrong, than I care about humans in simple worlds being correct, if that hypothesis is right. So I think that the right thing to do may be to have a subjective belief that I most likely do live in the measureless Tegmark IV, as long as that's the view that seems by far the least confused -- but continue to spend resources on investigating alternatives, because on priors they don't seem sufficiently unlikely to make up for the potential great importance of getting this right.

## A few remarks about mass-downvoting

15 13 February 2014 05:06PM

To whoever has for the last several days been downvoting ~10 of my old comments per day:

It is possible that your intention is to discourage me from commenting on Less Wrong.

The actual effect is the reverse. My comments still end up positive on average, and I am therefore motivated to post more of them in order to compensate for the steady karma drain you are causing.

If you are mass-downvoting other people, the effect on some of them is probably the same.

To the LW admins, if any are reading:

Look, can we really not do anything about this behaviour? It's childish and stupid, and it makes the karma system less useful (e.g., for comment-sorting), and it gives bad actors a disproportionate influence on Less Wrong. It seems like there are lots of obvious things that would go some way towards helping, many of which have been discussed in past threads about this.

Failing that, can we at least agree that it's bad behaviour and that it would be good in principle to stop it or make it more visible and/or inconvenient?

Failing that, can we at least have an official statement from an LW administrator that mass-downvoting is not considered an undesirable behaviour here? I really hope this isn't the opinion of the LW admins, but as the topic has been discussed from time to time with never any admin response I've been thinking it increasingly likely that it is. If so, let's at least be honest about it.

To anyone else reading this:

If you should happen to notice that a sizeable fraction of my comments are at -1, this is probably why. (Though of course I may just have posted a bunch of silly things. I expect it happens from time to time.)

My apologies for cluttering up Discussion with this. (But not very many apologies; this sort of mass-downvoting seems to me to be one of the more toxic phenomena on Less Wrong, and I retain some small hope that eventually something may be done about it.)

## SUDT: A toy decision theory for updateless anthropics

14 23 February 2014 11:50PM

The best approach I know for thinking about anthropic problems is Wei Dai's Updateless Decision Theory (UDT). We aren't yet able to solve all problems that we'd like to—for example, when it comes to game theory, the only games we have any idea how to solve are very symmetric ones—but for many anthropic problems, UDT gives the obviously correct solution. However, UDT is somewhat underspecified, and cousin_it's concrete models of UDT based on formal logic are rather heavyweight if all you want is to figure out the solution to a simple anthropic problem.

In this post, I introduce a toy decision theory, Simple Updateless Decision Theory or SUDT, which is most definitely not a replacement for UDT but makes it easy to formally model and solve the kind of anthropic problems that we usually apply UDT to. (And, of course, it gives the same solutions as UDT.) I'll illustrate this with a few examples.

This post is a bit boring, because all it does is take a bit of math that we already implicitly use all the time when we apply updateless reasoning to anthropic problems, and spell it out in excruciating detail. If you're already well-versed in that sort of thing, you're not going to learn much from this post. The reason I'm posting it anyway is that there are things I want to say about updateless anthropics, with a bit of simple math here and there, and while the math may be intuitive, the best thing I can point to in terms of details are the posts on UDT, which contain lots of irrelevant complications. So the main purpose of this post is to save people from having to reverse-engineer the simple math of SUDT from the more complex / less well-specified math of UDT.

(I'll also argue that Psy-Kosh's non-anthropic problem is a type of counterfactual mugging, I'll use the concept of l-zombies to explain why UDT's response to this problem is correct, and I'll explain why this argument still works if there aren't any l-zombies.)

*

I'll introduce SUDT by way of a first example: the counterfactual mugging. In my preferred version, Omega appears to you and tells you that it has thrown a very biased coin, which had only a 1/1000 chance of landing heads; however, in this case, the coin has in fact fallen heads, which is why Omega is talking to you. It asks you to choose between two options, (H) and (T). If you choose (H), Omega will create a Friendly AI; if you choose (T), it will destroy the world. However, there is a catch: Before throwing the coin, Omega made a prediction about which of these options you would choose if the coin came up heads (and it was able to make a highly confident prediction). If the coin had come up tails, Omega would have destroyed the world if it had predicted that you'd choose (H), and it would have created a Friendly AI if it had predicted (T). (Incidentally, if it hadn't been able to make a confident prediction, it would just have destroyed the world outright.)

| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| You choose (H) if coin falls heads | Positive intelligence explosion | Humanity wiped out |
| You choose (T) if coin falls heads | Humanity wiped out | Positive intelligence explosion |

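In this example, we are considering two possible worlds: $w_H$ and $w_T$. We write $\Omega$ (no pun intended) for the set of all possible worlds; thus, in this case, $\Omega = \{w_H, w_T\}$. We also have a probability distribution over $\Omega$, which we call $\mathbb{P}$. In our example, $\mathbb{P}(w_H) = 1/1000$ and $\mathbb{P}(w_T) = 999/1000$.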

In the counterfactual mugging, there is only one situation you might find yourself in in which you need to make a decision, namely when Omega tells you that the coin has fallen heads. In general, we write $\mathcal{I}$ for the set of all possible situations in which you might need to make a decision; the $\mathcal{I}$ stands for the information available to you, including both sensory input and your memories. In our case, we'll write $\mathcal{I} = \{i\}$, where $i$ is the single situation where you need to make a decision.

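For every $i \in \mathcal{I}$, we write $\mathcal{A}_i$ for the set of possible actions you can take if you find yourself in situation $i$. In our case, $\mathcal{A}_i = \{H, T\}$. A policy (or "plan") is a function $\pi$ that associates to every situation $i \in \mathcal{I}$ an action $\pi(i) \in \mathcal{A}_i$ to take in this situation. We write $\Pi$ for the set of all policies. In our case, $\Pi = \{\pi_H, \pi_T\}$, where $\pi_H(i) = H$ and $\pi_T(i) = T$.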

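Next, there is a set of outcomes, $\mathcal{O}$, which specify all the features of what happens in the world that make a difference to our final goals, and the outcome function $o\colon \Omega \times \Pi \to \mathcal{O}$, which for every possible world $w \in \Omega$ and every policy $\pi \in \Pi$ specifies the outcome $o(w, \pi)$ that results from executing $\pi$ in the world $w$. In our case, $\mathcal{O} = \{F, D\}$ (standing for FAI and DOOM), and $o(w_H, \pi_H) = o(w_T, \pi_T) = F$ and $o(w_H, \pi_T) = o(w_T, \pi_H) = D$.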

Finally, we have a utility function $u\colon \mathcal{O} \to \mathbb{R}$. In our case, $u(F) = 1$ and $u(D) = 0$. (The exact numbers don't really matter, as long as $u(F) > u(D)$, because utility functions don't change their meaning under affine transformations, i.e. when you add a constant to all utilities or multiply all utilities by a positive number.)

Thus, an SUDT decision problem consists of the following ingredients: The sets $\Omega$, $\mathcal{I}$ and $\mathcal{O}$ of possible worlds, situations you need to make a decision in, and outcomes; for every $i \in \mathcal{I}$, the set $\mathcal{A}_i$ of possible actions in that situation; the probability distribution $\mathbb{P}$; and the outcome and utility functions $o$ and $u$. SUDT then says that you should choose a policy $\pi \in \Pi$ that maximizes the expected utility $\mathbb{E}[u(o(w, \pi))]$, where $\mathbb{E}$ is the expectation with respect to $\mathbb{P}$, and $w$ is the true world.

In our case, $\mathbb{E}[u(o(w, \pi))]$ is just the probability of the good outcome $F$, according to the (prior) distribution $\mathbb{P}$. For $\pi_H$, that probability is 1/1000; for $\pi_T$, it is 999/1000. Thus, SUDT (like UDT) recommends choosing (T).
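To make the ingredients concrete, here is a minimal sketch that encodes this counterfactual mugging as plain Python data and picks the policy with maximal prior expected utility. The names (`PRIOR`, `POLICIES`, `outcome`, `expected_utility`) are illustrative choices, not part of SUDT itself; exact fractions are used so the 1/1000 vs. 999/1000 comparison is exact.

```python
from fractions import Fraction

# Possible worlds and the prior P: heads w.p. 1/1000, tails w.p. 999/1000.
PRIOR = {"w_H": Fraction(1, 1000), "w_T": Fraction(999, 1000)}

# One decision situation "i" (Omega says the coin fell heads); a policy maps it to H or T.
POLICIES = {"pi_H": {"i": "H"}, "pi_T": {"i": "T"}}

# Utility function u on outcomes.
UTILITY = {"FAI": 1, "DOOM": 0}


def outcome(world, policy):
    """Outcome function o(w, pi)."""
    choice = policy["i"]  # what you would choose if told "heads"
    if world == "w_H":
        return "FAI" if choice == "H" else "DOOM"
    # Tails: Omega acts on its prediction of your heads-choice.
    return "FAI" if choice == "T" else "DOOM"


def expected_utility(policy):
    """E[u(o(w, pi))] with respect to the prior, not a posterior."""
    return sum(p * UTILITY[outcome(w, policy)] for w, p in PRIOR.items())


best = max(POLICIES, key=lambda name: expected_utility(POLICIES[name]))

for name, pol in POLICIES.items():
    print(name, expected_utility(pol))
print("SUDT recommends:", best)
```

Nothing in the calculation conditions on having been told "heads"; that is exactly the updateless step discussed in the next paragraph.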

If you set up the problem in SUDT like that, it's kind of hidden why you could possibly think that's not the right thing to do, since we aren't distinguishing situations $i \in \mathcal{I}$ that are "actually experienced" in a particular possible world $w \in \Omega$; there's nothing in the formalism that reflects the fact that Omega never asks us for our choice if the coin comes up tails. In my post on l-zombies, I've argued that this makes sense because even if there's no version of you that actually consciously experiences being in the heads world, this version still exists as a Turing machine and the choices that it makes influence what happens in the real world. If all mathematically possible experiences exist, so that there aren't any l-zombies, but some experiences are "experienced more" (have more "magical reality fluid") than others, the argument is even clearer: even if there's some anthropic sense in which, upon being told that the coin fell heads, you can conclude that you should assign a high probability of being in the heads world, the same version of you still exists in the tails world, and its choices influence what happens there. And if everything is experienced to the same degree (no magical reality fluid), the argument is clearer still.

*

From Vladimir Nesov's counterfactual mugging, let's move on to what I'd like to call Psy-Kosh's probably counterfactual mugging, better known as Psy-Kosh's non-anthropic problem. This time, you're not alone: Omega gathers you together with 999,999 other advanced rationalists, all well-versed in anthropic reasoning and SUDT. It places each of you in a separate room. Then, as before, it throws a very biased coin, which has only a 1/1000 chance of landing heads. If the coin does land heads, then Omega asks all of you to choose between two options, (H) and (T). If the coin falls tails, on the other hand, Omega chooses one of you at random and asks that person to choose between (H) and (T). If the coin lands heads and you all choose (H), Omega will create a Friendly AI; same if the coin lands tails, and the person who's asked chooses (T); else, Omega will destroy the world.

| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| Everyone chooses (H) if asked | Positive intelligence explosion | Humanity wiped out |
| Everyone chooses (T) if asked | Humanity wiped out | Positive intelligence explosion |
| Different people choose differently | Humanity wiped out | (Depends on who is asked) |

We'll assume that all of you prefer a positive FOOM over a gloomy DOOM, which means that all of you have the same values as far as the outcomes of this little dilemma are concerned: $\mathcal{O} = \{F, D\}$, as before, and all of you have the same utility function, given by $u(F) = 1$ and $u(D) = 0$. As long as that's the case, we can apply SUDT to find a sensible policy for everybody to follow (though when there is more than one optimal policy, and the different people involved can't talk to each other, it may not be clear how one of the policies should be chosen).

This time, we have a million different people, who can in principle each make an independent decision about what to answer if Omega asks them the question. Thus, we have $\mathcal{I} = \{1, 2, \dots, 10^6\}$. Each of these people can choose between (H) and (T), so $\mathcal{A}_i = \{H, T\}$ for every person $i \in \mathcal{I}$, and a policy $\pi$ is a function that returns either (H) or (T) for every $i \in \mathcal{I}$. Obviously, we're particularly interested in the policies $\pi_H$ and $\pi_T$ satisfying $\pi_H(i) = H$ and $\pi_T(i) = T$ for all $i \in \mathcal{I}$.

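The possible worlds are $\Omega = \{w_H, w_T^1, \dots, w_T^{10^6}\}$, where $w_T^i$ is the world in which the coin falls tails and person $i$ is asked, and their probabilities are $\mathbb{P}(w_H) = 1/1000$ and $\mathbb{P}(w_T^i) = \frac{999}{1000} \cdot 10^{-6}$. The outcome function is as follows: $o(w_H, \pi_H) = F$, $o(w_H, \pi) = D$ for $\pi \neq \pi_H$, $o(w_T^i, \pi) = F$ if $\pi(i) = T$, and $o(w_T^i, \pi) = D$ otherwise.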

What does SUDT recommend? As in the counterfactual mugging, $\mathbb{E}[u(o(w, \pi))]$ is the probability of the good outcome $F$, under policy $\pi$. For $\pi = \pi_H$, the good outcome can only happen if the coin falls heads: in other words, with probability $1/1000$. If $\pi \neq \pi_H$, then the good outcome cannot happen if the coin falls heads, because in that case everybody gets asked, and at least one person chooses (T). Thus, in this case, the good outcome will happen only if the coin comes up tails and the randomly chosen person answers (T); this probability is $\frac{999}{1000} \cdot \frac{n}{10^6}$, where $n$ is the number of people answering (T). Clearly, this is maximized for $\pi = \pi_T$, where $n = 10^6$; moreover, in this case we get the probability $999/1000$, which is better than the $1/1000$ for $\pi_H$, so SUDT recommends the plan $\pi_T$.
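Since there are $2^{10^6}$ policies, enumerating them is hopeless, but by symmetry the probability of FOOM depends only on how many people answer (T). A minimal sketch under that observation (the function name `prob_good_outcome` is illustrative):

```python
from fractions import Fraction

N = 10**6                    # number of people: you plus 999,999 others
P_HEADS = Fraction(1, 1000)
P_TAILS = 1 - P_HEADS        # 999/1000


def prob_good_outcome(n_answer_T: int) -> Fraction:
    """P(FOOM) under any policy in which n_answer_T of the N people answer (T).

    Only the count matters because, on tails, the person asked is uniform over all N.
    """
    if n_answer_T == 0:
        # pi_H: heads -> everyone asked, all say (H) -> FOOM.
        #       tails -> the chosen person says (H) -> DOOM.
        return P_HEADS
    # Someone says (T): heads -> DOOM; tails -> FOOM iff the chosen person says (T).
    return P_TAILS * Fraction(n_answer_T, N)


for n in (0, 1, N // 2, N):
    print(n, prob_good_outcome(n))
```

For every $n \ge 1$ the value grows with $n$, so the maximum over policies other than $\pi_H$ is at $n = N$, i.e. $\pi_T$, which beats $\pi_H$.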

Again, when you set up the problem in SUDT, it's not even obvious why anyone might think this wasn't the correct answer. The reason is that if Omega asks you, and you update on the fact that you've been asked, then after updating, you are quite certain that the coin has landed heads: yes, your prior probability was only 1/1000, but if the coin has landed tails, the chance that you would be asked was only one in a million, so the posterior odds are about 1000:1 in favor of heads. So, you might reason, it would be best if everybody chose (H); and moreover, all the people in the other rooms will reason the same way as you, so if you choose (H), they will as well, and this maximizes the probability that humanity survives. This relies on the fact that the others will choose the same way as you, but since you're all good rationalists using the same decision theory, that's going to be the case.
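The "about 1000:1" figure is a direct application of Bayes' rule in odds form (prior odds times likelihood ratio). A quick check, using exact fractions:

```python
from fractions import Fraction

prior_heads = Fraction(1, 1000)
prior_tails = Fraction(999, 1000)
p_asked_given_heads = Fraction(1)              # on heads, everybody is asked
p_asked_given_tails = Fraction(1, 1_000_000)   # on tails, one random person is asked

# Posterior odds (heads : tails) = prior odds * likelihood ratio.
posterior_odds = (prior_heads * p_asked_given_heads) / (prior_tails * p_asked_given_tails)
print(float(posterior_odds))  # roughly 1001, i.e. about 1000:1 in favor of heads
```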

But in the worlds where the coin comes up tails, and Omega chooses someone other than you, the version of you that gets asked for its decision still "exists"... as an l-zombie. You might think that what this version of you does or doesn't do doesn't influence what happens in the real world; but if we accept the argument from the previous paragraph that your decisions are "linked" to those of the other people in the experiment, then they're still linked if the version of you making the decision is an l-zombie: If we see you as a Turing machine making a decision, that Turing machine should reason, "If the coin came up tails and someone else was chosen, then I'm an l-zombie, but the person who is actually chosen will reason exactly the same way I'm doing now, and will come to the same decision; hence, my decision influences what happens in the real world even in this case, and I can't do an update and just ignore those possible worlds."

I call this the "probably counterfactual mugging" because in the counterfactual mugging, you are making your choice because of its benefits in a possible world that is ruled out by your observations, while in the probably counterfactual mugging, you're making it because of its benefits in a set of possible worlds that is made very improbable by your observations (because most of the worlds in this set are ruled out). As with the counterfactual mugging, this argument is all the stronger if there are no l-zombies because all mathematically possible experiences are in fact experienced.

*

As a final example, let's look at what I'd like to call Eliezer's anthropic mugging: the anthropic problem that inspired Psy-Kosh's non-anthropic one. This time, you're alone again, except that there are many of you: Omega is creating a million copies of you. It flips its usual very biased coin, and if that coin falls heads, it places all of you in exactly identical green rooms. If the coin falls tails, it places one of you in a green room, and all the others in red rooms. It then asks all copies in green rooms to choose between (H) and (T); if your choice agrees with the coin, FOOM, else DOOM.

| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| Green roomers choose (H) | Positive intelligence explosion | Humanity wiped out |
| Green roomers choose (T) | Humanity wiped out | Positive intelligence explosion |

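Our possible worlds are back to being $\Omega = \{w_H, w_T\}$, with probabilities $\mathbb{P}(w_H) = 1/1000$ and $\mathbb{P}(w_T) = 999/1000$. We are also back to being able to make a choice in only one particular situation, namely when you're a copy in a green room: $\mathcal{I} = \{i\}$. Actions are $\mathcal{A}_i = \{H, T\}$, outcomes $\mathcal{O} = \{F, D\}$, utilities $u(F) = 1$ and $u(D) = 0$, and the outcome function is given by $o(w_H, \pi_H) = o(w_T, \pi_T) = F$ and $o(w_H, \pi_T) = o(w_T, \pi_H) = D$. In other words, from SUDT's perspective, this is exactly identical to the situation with the counterfactual mugging, and thus the solution is the same: Once more, SUDT recommends choosing (T).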

On the other hand, the reason why someone might think that (H) could be the right answer is closer to that for Psy-Kosh's probably counterfactual mugging: After waking up in a green room, what should be your posterior probability that the coin has fallen heads? Updateful anthropic reasoning says that you should be quite sure that it has fallen heads. If you plug those probabilities into an expected utility calculation, it comes out as in Psy-Kosh's case, heavily favoring (H).
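The disagreement can be made explicit by running the same expected-utility formula twice: once with the updated probability of heads and once with the prior, as SUDT does. The sketch below assumes the update treats you as a uniformly random one of the million copies, so that P(green | heads) = 1 and P(green | tails) = 1/N; that modelling choice is an assumption, not something the text fixes.

```python
from fractions import Fraction

N = 10**6
P_HEADS = Fraction(1, 1000)
P_TAILS = 1 - P_HEADS

# Assumed updateful step: you are a random one of N copies, so being in a
# green room has likelihood 1 on heads and 1/N on tails.
joint_heads = P_HEADS * 1
joint_tails = P_TAILS * Fraction(1, N)
posterior_heads = joint_heads / (joint_heads + joint_tails)


def prob_foom(choice: str, p_heads: Fraction) -> Fraction:
    """FOOM iff the green-roomers' (common) choice matches the coin."""
    return p_heads if choice == "H" else 1 - p_heads


updateful = {c: prob_foom(c, posterior_heads) for c in ("H", "T")}
updateless = {c: prob_foom(c, P_HEADS) for c in ("H", "T")}  # SUDT uses the prior

print("updateful  prefers", max(updateful, key=updateful.get))
print("updateless prefers", max(updateless, key=updateless.get))
```

Plugging the posterior into the expected-utility calculation favours (H); SUDT's prior-weighted calculation favours (T).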

But even if these are good probabilities to assign epistemically (to satisfy your curiosity about what the world probably looks like), in light of the arguments from the counterfactual and the probably counterfactual muggings (where updating definitely is the right thing to do epistemically, but plugging these probabilities into the expected utility calculation gives the wrong result), it doesn't seem strange to me to come to the conclusion that choosing (T) is correct in Eliezer's anthropic mugging as well.

## Finance as a career option

14 09 February 2014 08:47PM

As a part of our research for Cognito Mentoring, Vipul Naik and I compiled a draft of a page on finance as a career option. Because some Less Wrongers are planning on earning to give and finance is a commonly considered career option for those who are earning to give, I thought that it might be of interest to the Less Wrong community. See also 80,000 Hours' blog posts on finance as a career.

## 17 Rules to Make a Definition that Avoids the 37 Ways of Words Being Wrong

13 22 February 2014 05:16AM

Eliezer's writing style of A->B, then A, then B, though generally clear, results in a large amount of redundancy.

In this post, I have attempted to halve the number of rules you need to remember. The numbers are the rules from the original post.

So, without further ado, a good definition for a word:

1. can be shown to be wrong [37] and is not the final [13] authority [18, 19]
2. has strong justifications [33] for the word's existence [32] and its particular definition [20], which leave no room for an argument [17, 22]
3. agrees with conventional usage [4]
4. explains what context the word depends on [36]
5. limits its scope to avoid overlap with other meanings [25]
6. does not assume that definitions are the best way of giving words semantics [12]
7. directs a complex mental paintbrush [35] to paint detailed pictures of the thing you're trying to think about [23]
8. is a brain inference aid [13] that refers to and instructs one on how to find a specific/unique [24] similarity cluster [21] that is apparent from empirical experience [28, 29, 30], the cluster's size being inversely proportional to the word's length [31]
9. is not a binary category [9, 11] and cannot be used for deductive inference [27]
10. requires observing only [14] a few [3] real-world [1] properties that can be easily [5] verified [2] and are less abstract [6] than the word being defined (in particular, the definition cannot be circular [16])
11. is not just a list of random properties [10, 21]
12. contains no negated properties [10, 33]
13. specifies exhaustively all of the correct connotations of the word [25, 26]
14. makes the properties of a random object satisfying the definition be nearly independent [34]
15. has examples [6] which satisfy the definition, including the original example(s) that motivated the definition being given [15] and typical/conventional examples [7]
16. tells you which examples are more typical or less typical [9]
17. captures enough characteristics of the examples to identify non-members [8]

And there you go: 17 rules. Follow them all and you can't use words wrongly.

## The innovation tree, overshadowed in the innovation forest

12 25 February 2014 02:11PM

Cross-posted at Practical Ethics.

Many have pronounced the era of innovation dead, peace be to its soul. From Tyler Cowen's decree that we've picked all the low-hanging fruit of innovation, through Robert Gordon's idea that further innovation growth is threatened by "six headwinds", to Garry Kasparov's and Peter Thiel's theory that risk aversion has stifled innovation, there is no lack of predictions about the end of discovery.

I don't propose to address the issue with something as practical and useful as actual data. Instead, staying true to my philosophical environment, I propose a thought experiment that hopefully may shed some light. The core idea is that we might be underestimating the impact of innovation because we have so much of it.

Imagine that technological innovation had for some reason stopped around 1945, with one exception: the CD and CD player/burner. Fast forward a few decades, and visualise society. We can imagine a society completely dominated by the CD. We'd have all the usual uses for the CD (music, songs and similar, of course), but also much more.


## How to teach to magical thinkers?

12 24 February 2014 01:43PM

I'm afraid I haven't properly designed the Muggles Studies course I introduced at my local Harry Potter fan club. Last Sunday we finally had our second class (after months of insistence and delays), and I introduced some very basic descriptions of common biases, while of course emphasizing the need to detect them in ourselves before trying to detect them in other people. At some point, which I didn't completely notice, the discussion changed from an explanation of the attribution bias into a series of multicultural examples in favor of moral relativity. I honestly don't know how that happened, but as more and more attendees voiced their comments, I started to fear someone would irreversibly damage the lessons I was trying to teach. They stopped just short of calling the scientific method a cultural construct, at which point I'm sure I would have snapped. I don't know what to make of this. Part of me wants to put more effort into showing these people the need for more reductionism in their worldview, but another part of me just wants to give them up as hopeless postmodernists. What should I do?

## Intelligence Metrics with Naturalized Induction using UDT

12 21 February 2014 12:23PM

Followup to: Intelligence Metrics and Decision Theory
Related to: Bridge Collapse: Reductionism as Engineering Problem

A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of a perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence applicable to any suitably formalized agent, such that AIXI is the agent with maximal intelligence according to this measure.

Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:

• The formalism is inherently Cartesian. Solving this problem is known as naturalized induction and it is discussed in detail here.
• The utility function Legg & Hutter use is a formalization of reinforcement learning, while we would like to consider agents with arbitrary preferences. Moreover, a real AGI designed with reinforcement learning would tend to wrest control of the reinforcement signal from the operators (there must be a classic reference on this but I can't find it. Help?). It is straightforward to tweak the formalism to allow for any utility function which depends on the agent's sensations and actions; however, we would like to be able to use any ontology for defining it.
Orseau and Ring proposed a non-Cartesian intelligence metric; however, their formalism appears to be too general. In particular, there is no Solomonoff induction or any analogue thereof; instead, a completely general probability measure is used.

My attempt at defining a non-Cartesian intelligence metric ran into problems of a decision-theoretic flavor. The way I tried to use UDT seems unsatisfactory, and later I tried a different approach related to metatickle EDT.

In this post, I claim to accomplish the following:
• Define a formalism for logical uncertainty. When I started writing this I thought this formalism might be novel but now I see it is essentially the same as that of Benja.
• Use this formalism to define a non-constructive formalization of UDT. By "non-constructive" I mean something that assigns values to actions rather than a specific algorithm like here.
• Apply the formalization of UDT to my quasi-Solomonoff framework to yield an intelligence metric.
• Slightly modify my original definition of the quasi-Solomonoff measure so that the confidence of the innate model becomes a continuous rather than discrete parameter. This leads to an interesting conjecture.
• Propose a "preference agnostic" variant as an alternative to Legg & Hutter's reinforcement learning.
• Discuss certain anthropic and decision-theoretic aspects.

# Logical Uncertainty

The formalism introduced here was originally proposed by Benja.

Fix a formal system $F$. We want to be able to assign probabilities to statements $s$ in $F$, taking into account limited computing resources. Fix a natural number $D$, related to the amount of available computing resources, which I call the "depth of analysis".

Define $P_0(s) := 1/2$ for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define
$$P_D(s) := P_0(s \mid \text{there are no contradictions of length } \le D).$$
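To make the conditioning concrete, here is a minimal Python sketch on a hypothetical finite toy: three statements ("A", "A -> B", "B") and a single contradiction (A, A -> B, not B) whose proof length of 5 is made up. Since $P_0$ is an independent fair coin per statement, every truth assignment has equal weight, so $P_D$ is just a ratio of counts over the assignments that avoid every contradiction of length at most D. The real formalism ranges over all statements of F; the names below are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy: three statements of some formal system F.
STATEMENTS = ["A", "A -> B", "B"]

# Each contradiction is a partial truth assignment (statement index -> value)
# refuted by a proof of the given (made-up) length.
CONTRADICTIONS = [
    ({0: True, 1: True, 2: False}, 5),  # modus ponens: A, A -> B, not B
]

def p_depth(target: int, depth: int) -> Fraction:
    """P_D(s): uniform P_0 over truth assignments, conditioned on avoiding
    every contradiction whose length is <= depth."""
    active = [c for c, length in CONTRADICTIONS if length <= depth]
    allowed = 0
    target_true = 0
    for values in product([False, True], repeat=len(STATEMENTS)):
        if any(all(values[i] == v for i, v in c.items()) for c in active):
            continue  # this world contains a short contradiction: condition it away
        allowed += 1
        target_true += values[target]
    return Fraction(target_true, allowed)
```

With these made-up numbers, $P_D$("B") stays at 1/2 for D < 5 and becomes 4/7 at D = 5, while $P_D$("A") drops to 3/7: once the short contradiction is within reach, the statements stop behaving like independent coins.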

Consider X to be a number in [0, 1] given by a definition in F. Then $d_k(X) :=$ "The k-th digit of the binary expansion of X is 1" is a statement in F. We define $E_D(X) := \sum_k 2^{-k} P_D(d_k(X))$.
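A minimal Python sketch of $E_D$, assuming we already have $P_D(d_k(X))$ for the first n digits and treating every later digit as a fair coin (an assumption loosely based on the remark below that statements too long to appear in any contradiction of length at most D get probability 1/2). The function name and the `tail_prob` parameter are hypothetical.

```python
def expected_value(digit_probs, tail_prob=0.5):
    """E_D(X) = sum over k >= 1 of 2^-k * P_D(d_k(X)).

    digit_probs[k-1] is P_D(d_k(X)) for the first n digits. Every later digit
    is assumed to have probability tail_prob, so together they contribute
    tail_prob * sum_{k>n} 2^-k = tail_prob * 2^-n.
    """
    n = len(digit_probs)
    head = sum(p * 2.0 ** -k for k, p in enumerate(digit_probs, start=1))
    return head + tail_prob * 2.0 ** -n
```

For example, digit probabilities [1, 0, 1/2] give 0.5 + 0 + 0.0625 plus a tail of 0.0625, i.e. 0.625; with no digit information at all the sketch returns 1/2.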

## Remarks

• Clearly, if s is provable in F then $P_D(s) = 1$ for $D \gg 0$. Similarly, if "not s" is provable in F then $P_D(s) = 0$ for $D \gg 0$.
• If each digit of X is decidable in F then $\lim_{D \to \infty} E_D(X)$ exists and equals the value of X according to F.
• For s of length > D, $P_D(s) = 1/2$, since no contradiction of length $\le D$ can involve s.
• It is an interesting question whether $\lim_{D \to \infty} P_D(s)$ exists for every s. It seems false that this limit always exists and equals 0 or 1, i.e. this formalism is not a loophole in Goedel incompleteness. To see this, consider statements that require a high-order halting oracle (in the arithmetical hierarchy) to decide.
• In computational terms, D corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements, so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exist; we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism; unfortunately, I don't understand the latter yet.

# Non-Constructive UDT

Consider a decision algorithm A for optimizing utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of $c \in C$ for A at depth of analysis D to be
$$V_D(c, A; U) := E_D(U \mid \text{"A produces c" is true}).$$
It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D, i.e. $P_D(\text{"A produces c"}) > 0$. We define the absolute U-value of c for A to be
$$V(c, A; U) := E_{D(c, A)}(U \mid \text{"A produces c" is true}), \quad \text{where } D(c, A) := \max\{D \mid P_D(\text{"A produces c"}) > 0\}.$$
Of course $D(c, A)$ can be infinite, in which case $E_\infty(\ldots)$ is understood to mean $\lim_{D \to \infty} E_D(\ldots)$.

For example, $V(c, A; U)$ yields the natural values when A is an ambient control algorithm applied to, e.g., a simple model of Newcomb's problem. To see this, note that given A's output, the value of U can be determined at low depths of analysis, whereas the output of A requires a very high depth of analysis to determine.
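The Newcomb remark can be checked on a toy model. The Python sketch below rests on loudly stated assumptions: the only logical "worlds" are pairs (A's output, the predictor's prediction), $P_0$ is uniform over them, the standard Newcomb payoffs apply, and "the predictor outputs whatever A outputs" has a short proof of hypothetical length 10, while A's actual output is assumed to need a far longer proof, so conditioning on either choice stays well defined. All names and constants are made up for illustration.

```python
from fractions import Fraction
from itertools import product

CHOICES = ("one-box", "two-box")

# Hypothetical proof length of "the predictor outputs whatever A outputs".
PREDICTOR_LINK_LENGTH = 10

def payoff(action, prediction):
    """Standard Newcomb payoffs: box B holds 1,000,000 iff one-boxing was predicted."""
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b + (1_000 if action == "two-box" else 0)

def value(choice, depth):
    """V_D(c, A; U) = E_D(U | "A produces c") in the toy model.

    Worlds violating a contradiction of length <= depth are discarded; the
    remaining worlds keep equal weight, as under a uniform P_0.
    """
    worlds = [
        (action, prediction)
        for action, prediction in product(CHOICES, repeat=2)
        if depth < PREDICTOR_LINK_LENGTH or action == prediction
    ]
    matching = [w for w in worlds if w[0] == choice]
    return Fraction(sum(payoff(a, p) for a, p in matching), len(matching))
```

At depth 20 this gives 1,000,000 for one-boxing and 1,000 for two-boxing; at depth 5, before the predictor link is visible, it gives 500,000 versus 501,000, which is the two-boxing answer that evaluating at the largest admissible depth avoids.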

# Naturalized Induction

Our starting point is the "innate model" N: a certain a priori model of the universe including the agent G. This model encodes the universe as a sequence of natural numbers $Y = (y_k)$ which obeys either specific deterministic or non-deterministic dynamics or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, N can describe the universe as a universal Turing machine M (representing G) with special "sensory" registers e. N constrains the dynamics to be compatible with the rules of the Turing machine but leaves unspecified the behavior of e. Alternatively, N can contain in addition to M a non-trivial model of the environment. Or N can be a cellular automaton with the agent corresponding to a certain collection of cells.

However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead, we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think of them as programs which compute $y_k$ given $y_j$ for $j < k$; more on that later) the weight $Q_S(H) := 2^{-L(H)}(1 - e^{-t(H)/t})$. Here L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.

The weights $Q_S(H)$ define (after normalization) a probability measure on the space of hypotheses, which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows for G to be a mechanistic part of the universe, at the price of introducing N and t.
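A small Python sketch of the weighting, assuming a finite, made-up hypothesis list (real hypothesis spaces are infinite, and the explicit normalization here stands in for passing from weights to a probability measure). The hypothesis names, lengths and compatibility times are hypothetical.

```python
import math

def qs_weight(length_bits, compat_time, t):
    """Q_S(H) = 2^-L(H) * (1 - exp(-t(H)/t)).

    length_bits: L(H), length of H's encoding in bits.
    compat_time: t(H), how long H stays compatible with N (math.inf if forever).
    t: the timescale over which the innate model N is expected to hold.
    """
    return 2.0 ** -length_bits * (1.0 - math.exp(-compat_time / t))

def normalized_weights(hypotheses, t):
    """Normalize Q_S over a finite list of (name, L(H), t(H)) tuples."""
    raw = {name: qs_weight(length, compat, t) for name, length, compat in hypotheses}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# Two made-up hypotheses: a longer one that never contradicts N,
# and a shorter one that stops being compatible with N after 5 steps.
HYPOTHESES = [
    ("obeys N forever", 20, math.inf),
    ("breaks N at step 5", 10, 5.0),
]
```

With t = 1 the short hypothesis that soon contradicts N dominates; with t = 10^6 the innate model is trusted and the longer, N-compatible hypothesis dominates. A hypothesis incompatible with N from the start gets weight 0.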

## Remarks

• Note that time is discrete in this formalism but t is continuous.
• Since we're later going to use logical uncertainties wrt the formal system F, it is tempting to construct the hypothesis space out of predicates in F rather than programs.

# Intelligence Metric

To assign intelligence to agents we need to add two ingredients:

• The decoding Q: {Y} -> {bit-string} of the agent G from the universe Y. For example Q can read off the program loaded into M at time k=0.
• A utility function U: {Y} -> [0, 1] representing G's preferences. U has to be given by a definition in F. Note that N provides the ontology wrt which U is defined.
It seems tempting to define the intelligence to be $E_{Q_S}(U \mid Q)$, the conditional expectation value of U for a given value of Q in the quasi-Solomonoff measure. However, this is wrong for roughly the same reasons EDT is wrong (see previous post for details).

Instead, we define $I(Q_0) := E_{Q_S}\left(E_{\max}\left(U(Y(H)) \mid \text{"}Q(Y(H)) = Q_0\text{" is true}\right)\right)$. Here the subscript max stands for maximal depth of analysis, as in the construction of the absolute UDT value above.
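As a hedged illustration of how the pieces combine, here is a Python toy in which the hard part, the max-depth logical conditional, is simply assumed: each hypothetical hypothesis carries a table of counterfactual utilities for each candidate program. Only the outer $Q_S$-weighted expectation is actually computed; nothing here is a recipe for evaluating $E_{\max}$, and the program names and utilities are made up.

```python
import math

def qs_weight(length_bits, compat_time, t):
    """Q_S(H) = 2^-L(H) * (1 - exp(-t(H)/t))."""
    return 2.0 ** -length_bits * (1.0 - math.exp(-compat_time / t))

def intelligence(program, hypotheses, t):
    """Toy I(Q0): the Q_S-weighted average over hypotheses of the utility
    each hypothesis assigns to "the decoded program is Q0".

    Each hypothesis is (L(H), t(H), counterfactual), where counterfactual is a
    dict standing in for E_max(U(Y(H)) | "Q(Y(H)) = Q0" is true).
    """
    weights = [qs_weight(length, compat, t) for length, compat, _ in hypotheses]
    total = sum(weights)
    return sum(w * cf[program] for w, (_, _, cf) in zip(weights, hypotheses)) / total

# Hypothetical hypotheses and counterfactual utilities for two candidate programs.
HYPOTHESES = [
    (10, math.inf, {"cautious": 0.6, "greedy": 0.9}),
    (12, math.inf, {"cautious": 0.6, "greedy": 0.1}),
]
```

Here the shorter hypothesis gets 4/5 of the weight, so "greedy" scores 0.74 against 0.6 for "cautious". The hypotheses are not reweighted by whether they actually produce the program in question, which is the difference from the EDT-style $E_{Q_S}(U \mid Q)$.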

## Remarks

• IMO the correct way to look at this is intelligence metric = value of decision for the decision problem "what should I program into my robot?". If N is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However for theoretical analysis it is likely to be more convenient to work with simple N (also conceptually it leaves room for a "purist" notion of agent's intelligence, decoupled from the fine details of its creator).
• As opposed to usual UDT, the algorithm (H) making the decision (Q) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the latter into the utility function as proposed by Coscott.
• For high values of t, G can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of t. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: there is a critical value t* of t from which this bootstrapping becomes possible (the positive feedback loop of knowledge becomes critical), and I(Q) is non-smooth at t* (a phase transition).
• If we wish to understand intelligence, it might be beneficial to decouple it from the choice of preferences. To achieve this we can introduce the preference formula as an unknown parameter in N. For example, if G is realized by a machine M, we can connect M to a data storage E whose content is left undetermined by N. We can then define U by the formula encoded in E at time k=0. This leads to I(Q) being a sort of "general-purpose" intelligence while avoiding the problems associated with reinforcement learning.
• As opposed to Legg-Hutter intelligence, there appears to be no simple explicit description for Q* maximizing I(Q) (e.g. among all programs of given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations: G's computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects.
• Ceteris paribus, Q* deals efficiently with problems like counterfactual mugging. The "ceteris paribus" qualifier is necessary here because, given the cost and side-effects of computations, it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which G doesn't exist in the "other universe". This is because the ontology used for defining U (which is given by N) assumes G does exist. At least this is the case for simple ontologies like the ones described above: possibly we can construct N in which G might or might not exist. Also, if G uses a quantum ontology (i.e. N describes the universe in terms of a wavefunction and U computes the quantum expectation value of an operator) then it does take into account other Everett universes in which G doesn't exist.
For many choices of N (for example if G is realized by a machine M), $Q_S$-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:
• This is not the case for all N. In particular, if N admits destruction of M then M's sensations after the point of destruction are not well-defined. Indeed, we had better allow for destruction of M if we want G's preferences to behave properly in such an event. That is, if we don't allow it we get a "weak anvil problem", in the sense that G experiences an ontological crisis when discovering its own mortality and the outcome of this crisis is not obvious. Note though that it is not the same as the original ("strong") anvil problem; for example, G might conclude that the dynamics of "M's ghost" will be random in some sense.
• These probabilities probably depend significantly on N and don't amount to an elegant universal law for solving the anthropic trilemma.
• Indeed this framework is not completely "updateless", it is "partially updated" by the introduction of N and t. This suggests we might want the updates to be minimal in some sense, in particular t should be t*.
• The framework suggests there is no conceptual problem with cosmologies in which Boltzmann brains are abundant. Q* wouldn't think it is a Boltzmann brain since the long address of Boltzmann brains within the universe makes the respective hypotheses complex thus suppressing them, even disregarding the suppression associated with N. I doubt this argument is original but I feel the framework validates it to some extent.

This week's open thread is less than a day old, and has already accumulated more comments than the 15 latest non-open-thread posts combined.  I fear the thread will wither and die before Friday.

Going from monthly to weekly open threads was a big hit.  Should we ratchet up open thread frequency even more?  Should we add more outlets for comments, or will comments inevitably expand to fill the available room?

Proposal for discussion: We follow a regimen of weekly blather threads for the next two weeks, then reassess.

• Stupid questions (Monday) - Admit your ignorance
• Advice (Tuesday) - Seek the wisdom of the crowd
• Open Thread (Wednesday) - Catch-all prattle
• Links (Friday) - Quality readings.  Meme postings punishable by death

#### Addendum

I posted a Links thread on Friday and a Stupid Questions thread on Monday. In the 5 post-days that they've been up, they've accumulated 32 comments. Based on these numbers, it seems unlikely that these topical open threads will relieve pressure on the main Open Thread, so I've stopped the experiment.

## Amanda Knox Redux: is Satoshi Nakamoto the real Satoshi Nakamoto?

11 06 March 2014 11:33PM

Many of you here have likely heard of Bitcoin, and maybe know something about it.

Earlier today, a story broke that a reporter had apparently tracked down the real Satoshi Nakamoto, infamous creator of the Bitcoin protocol.

This seems like an excellent opportunity to practice our Bayesian updating!

So, how likely do you think it is that this man is the founder of Bitcoin? What do you believe and why?

## Managing your time spent learning

11 19 February 2014 06:48PM

This article is written for people who are looking for advice on prioritizing activities, in particular, what to spend time learning.

In thinking about how to budget your time, it's helpful to explicitly prioritize your activities by their relative importance, and to distinguish between what's important and what you find interesting. Sometimes we exaggerate the usefulness of interesting but only slightly useful activities in our minds, because we want to believe that time spent on them is productive. If you think separately about how useful an activity is and how interesting it is, you're less likely to do this. It's helpful to consider the following four categories of activities:

• Important and interesting: Do, and take your time. Get it right!
• Important and not interesting: Do as much as necessary, and maybe a bit more; look into ways of overcoming procrastination. Also consider ways to make them more interesting.
• Not important and interesting: Do only if you feel like it, don't try to press yourself, and consider substituting with activities that are interesting and important.
• Not important and not interesting: Avoid.


## Proposal: LW courses

10 05 March 2014 04:18PM

For a long time I have tried to study things on my own, at my own pace. But it was always an uphill struggle against strong akrasia issues, and eventually I came to the conclusion that the only thing that really seems to work is to have externally-imposed deadlines. The only way I could think of to do this was to sign up for classes, so I enrolled in a number of MOOCs. So far this has worked wonders: I went from spending most of my time playing around and wasting time to several recent days where I studied for several hours straight.

The only thing I don't like about this setup is that there's a very limited number of really good MOOCs out there on the subjects I want to study. Also, most MOOCs are geared for a wider audience and are therefore dumbed-down to a certain degree.

So I had the following idea: A lot of us on LW seem to be studying a lot of the same material, whether it's the sequences, MIRI course list, CFAR booklist, or any of the various recommended reading lists. What if those who were studying the same thing would get together and set a schedule for themselves to finish the reading material, complete with deadlines? This might not be a normal "externally imposed" deadline, but at least it's a deadline with some social pressure to back it up. I can't be the only one on LW who could benefit from a deadline.

The details would need to be worked out, but here's a preliminary version of the way I envision it:

• There should be a monthly thread for requests for new classes. The request should specify the text to be used, or it could ask for suggestions for a good text. The request should also specify the approximate pace (very slow - slow - normal - fast - very fast), or an approximate weekly time commitment.
• The next thing that would be needed for each proposed class would be for someone who's already gone through that text to propose a rough calendar for the course. For example, they could say that given the requested pace / time commitment, you should expect to spend about 3 months on that particular text. Also, some chapters are harder than others, so the calendar should specify, for example, that you should expect to spend just one week on Chapters 1-3, but Chapter 7 will need to spread over three weeks. It would also be very useful to specify what prerequisites are needed for that text. (Similar to this thread. Keep in mind that different people have different styles when it comes to prerequisites. Some prefer to do as few prerequisites as possible and then skip straight to the harder stuff, and work backwards / fill in gaps as necessary. Others prefer to carefully cover all lower-level material before even touching the harder stuff. These people will want to know about all possible prerequisites so that they won't have to work backwards at all.)
• I would recommend creating a repository of available course calendars (i.e., course X should be split up this way, course Y should be split up that way, etc.). This can be done by creating a special thread for this purpose and then linking to that thread every time a new "proposed course" thread starts.
• A calendar provides some deadlines, but there needs to be some motivation for keeping to the deadlines. I can think of a few possibilities that might work:
• Social pressure: If there is anybody else in your class other than yourself, there's a certain amount of social pressure to keep up with the group and keep to the agreed-upon deadlines. Classmates can increase this pressure by actively encouraging each other to keep up.
• Social encouragement: As you make each deadline you should report that you did so, and others can then respond with encouragement.
• Karma: If someone makes a deadline they should post an announcement to that effect, and LWers (even those not part of your class, and even those who aren't taking any classes) could be encouraged to upvote the announcement. I haven't been on LW long enough to tell if this is a socially acceptable use of karma points, but this might be motivating for some people.
• Perhaps someone could design a "LW U" badge or something of the sort to post on your personal / social site when you complete a course. (Notice that with the karma or badge reward forms, it becomes possible to have only a single member in a course and they'll still be able to get some form of reward structure. It might not be as effective as having other people in the course, but at least it works.)
• There should be a dedicated thread for each course once it begins. The thread would be used for everything relating to the course: announcing progress, discussing subject-related material, meta-discussions about the course, etc.
• LWers who have already completed the subject / textbook could follow the course discussions and provide guidance and help as needed. Anyone who thinks they can contribute in this "teacher" capacity should let course participants know about it beforehand, as this will provide additional social pressure / support, and provide valuable encouragement (there's someone I can ask my stupid questions to!).
• I'd recommend that once one or more people decide to take a course, they should set a date to start the course that's at least two weeks (maybe a month) in the future. This would give time for others to join. Each month's thread for proposed courses could then include a list of "courses starting soon".
• Perhaps people who have already studied a given text could put together a few quizzes / tests / finals for that text. The quizzes would be sent to individual students at a certain point in the course via private message. Each student would take the quiz on their own (honor system, of course), and the quizzes would then be graded either by the creator of the quiz (the "teacher"), a volunteer TA (using an answer key provided by the teacher), the other students, or even by each student themselves. (I would not recommend this last unless there are no other options, since even very honest people can be sorely tempted to fudge things occasionally in their own favor.) There could even be a final grade for the course. I suspect that this system would create powerful psychological motivation for certain people to work hard at the coursework and complete their work on time.

What do you think about such an idea?

## Proportional Giving

10 02 March 2014 09:09PM

Executive summary: The practice of giving a fixed fraction of one's income to charity is near-universal but possibly indefensible. I describe one approach that certainly doesn't defend it, speculate vaguely about a possible way of fixing it up, and invite better ideas from others.

Many of us give a certain fraction of our income to charitable causes. This sort of practice has a long history:

Deuteronomy 14:22 Thou shalt truly tithe all the increase of thy seed, that the field bringeth forth year by year.

(note that "tithe" here means "give one-tenth of") and is widely practised today:

GWWC Pledge: I recognise that I can use part of my income to do a significant amount of good in the developing world. Since I can live well enough on a smaller income, I pledge that from today until the day I retire, I shall give at least ten percent of what I earn to whichever organizations can most effectively use it to help people in developing countries. I make this pledge freely, openly, and without regret.

And of course it's roughly how typical taxation systems (which are kinda-sorta like charitable donation, if you squint) operate. But does it make sense? Is there some underlying principle from which a policy of giving away a certain fraction of one's income (not necessarily the traditional 10%, of course) follows?

The most obvious candidate for such a principle would be what we might call

Weighted Utilitarianism: Act so as to maximize a weighted sum of utility, where (e.g.) one's own utility may be weighted much higher than that of random far-away people.

But this can't produce anything remotely like a policy of proportional giving. Assuming you aren't giving away many millions per year (which is a fair assumption if you're thinking in terms of a fraction of your salary), the level of utility-per-unit-money achievable by your giving is basically independent of what you give, and so is the weight you attach to the utility of the beneficiaries.

So suppose that when your income, after taking out donations, is $X, your utility (all else equal) is u(X), so that your utility per marginal dollar is u'(X); and suppose you attach weight 1 to your own utility and weight w to that of the people who'd benefit from your donations; and suppose their gain in utility per marginal dollar given is t. Then when your income is S you will set your giving g so that u'(S-g) = wt. What this says is that a weighted-utilitarian should keep a fixed absolute amount S-g of his or her income, and give all the rest away. The fixed absolute amount will depend on the weight w (hence, on exactly which people are benefited by the donations) and on the utility per dollar given t (hence, on exactly what charities are serving them and how severe their need is), but not on the person's pre-donation income S.

(Here's a quick oversimplified example. Suppose that utility is proportional to log(income), that the people your donations will help have an income equivalent to $1k/year, that you care 100x more about your utility than about theirs, and that your donations are the equivalent of direct cash transfers to those people. Then u' = 1/income, so t = 1/1000 per dollar and wt = 1/100,000; solving 1/(S-g) = 1/100,000 says you should keep everything up to $100k/year and give the rest away. The generalization to other weighting factors and beneficiary incomes should be obvious.)

This argument seems reasonably watertight given its premises, but proportional giving is so well-established a phenomenon that we might reasonably trust our predisposition in its favour more than our arguments against. Can we salvage it somehow?

Here's one possibility. One effect of income is (supposedly) to incentivize work, and maybe (mumble near mode mumble) this effect is governed entirely by anticipated personal utility and not by any benefit conferred on others.
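The threshold rule can be checked numerically. This is a minimal Python sketch, assuming the example's log utility (so u'(x) = 1/x), weight w = 1/100, and t = 1/1000 (log utility for beneficiaries living on about 1,000 a year, reached by cash-like transfers). `kept_income` and `giving` are hypothetical helper names, and bisection is just one convenient way to solve u'(S-g) = wt for any strictly decreasing u'. Under these assumptions the sketch keeps about 100,000 at any income above that level, so someone earning 250,000 gives about 150,000 (60%) and someone earning 500,000 gives about 400,000 (80%): the fraction given rises with income instead of staying fixed.

```python
def kept_income(marginal_utility, target, lo=1e-9, hi=1e12, iters=200):
    """Solve u'(X) = target for X by bisection, assuming u' is strictly decreasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if marginal_utility(mid) > target:
            lo = mid  # u'(mid) is still above w*t, so the solution X lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

def giving(pre_donation_income, w, t, marginal_utility=lambda x: 1.0 / x):
    """Donation g solving u'(S - g) = w*t, floored at zero (defaults to log utility)."""
    threshold = kept_income(marginal_utility, w * t)
    return max(0.0, pre_donation_income - threshold)
```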
Then the policy derived above, which above the threshold makes personal utility independent of effort, would lead to minimum effort and hence maybe less net weighted utility than could be attained with a different policy. Does this lead to anything like proportional giving, at least for some semi-plausible assumptions about the relationship between effort and income?

At the moment, I don't know. I have a page full of scribbled attempts to derive something of the kind, but they didn't work out. And of course there might be some better way to get proportional giving out of plausible ethical principles. Anyone want to do better?

## Rational Evangelism

10 26 February 2014 06:00AM

Not "rationality evangelism", which CFAR is doing already if I understand their mission. "Rational evangelism", which is what CFAR would do if they were Catholic missionaries.

If you believe in Hell, as many people very truly do, it is hard for Hell not to seem like the world's most important problem. To some extent, proselytizing religions treat Hell with respect--they spend billions of dollars trying to save sinners, and the most devout often spend their lives preaching the Gospel (insert non-Christian variant).

But is Hell given enough respect? Every group meets with mixed success in solving its problems, but the problem of eternal suffering leaves little room for "mixed success". Even the most powerful religions are stuck in patterns that make the work of salvation very difficult indeed. And some seem willing to reduce their evangelism* for reasons that aren't especially convincing in the face of "nonbelievers are quite possibly going to burn, or at least be outside the presence of God, forever".

What if you were a rationalist who viewed Hell like certain Less Wrongers view the Singularity? (This belief would be hard to reconcile with rationalism generally, but for the sake of argument...)
How would you tackle the problem of eternal suffering with the same passion we spend on probability theory and friendly AI?

I wrote a long thought experiment to better define the problem, involving a religion called "Normomism", but it was awkward. There are plenty of real religions whose members believe in Hell, or at least in a Heaven that many people aren't going to (also a terrible loss). Some have a stated mission of saving as many people as possible from a bad afterlife. So where are they falling short?

If you were the Pope, or the Caliph, or the supreme dictator of some smaller religion, what tactics would you use to convince more people to do and believe exactly the things that would save them--whether that's faith or good works? Why haven't these tactics been tried already? Is there really much room for improvement?

### Spreading the Word

This post isn't a dig at believers, though it does seem like many people don't act on their sincere belief in an eternal afterlife. (I don't mind when people try to convert me--at least they care!)

My main point: It's worth considering that people who believe in Very Bad Future Outcomes have been working to prevent those outcomes for thousands of years, and have stumbled upon formidable techniques for doing so.

I've thought for a while about rational evangelism, and it's surprisingly hard to come up with ways that people like Rick Warren and Jerry Lovett could improve their methodology. (Read Lovett's "contact me" paragraph for the part that really impressed me.)

We speak often of borrowing from religion, but these conversations mostly touch on social bonding, rather than what it means to spread ideas so important that the fate of the human race depends on them. ("Raising the Sanity Waterline" is a great start, but those ideas haven't been the focus of many recent posts.)

I'm not saying this is a perfect comparison.
The rationalist war for the future won't be fought one soul at a time, and we won't save anyone with a deathbed confession. But cryogenic freezing does exist. And on a more collective level, convincing the right people that the far future matters could be a coup on the level of Constantine's conversion.

CFAR is doing good things in the direction of rationality evangelism. How can the rest of us do more?

### Living Like We Mean It

This movement is going places. But I fear we may spend too much time (at least proportionally) arguing amongst ourselves, when bringing others into the fold is a key piece of the puzzle. And if we’d like to expand the flock (or, more appropriately, the herd of cats), what can we learn from history’s most persuasive organizations?

I often pass up my chance to talk to people about something as simple as GiveWell, let alone existential risk, and it's been a long time since I last name-dropped a Less Wrong technique. I don't think I'm alone in this.**

I've met plenty of Christians who exude the same optimism and conviviality as a Rick Warren or a Ned Flanders. These kinds of people are a major boon for the Christian religion. Even if most of us are introverts, what's stopping us from teaching ourselves to live the same way?

Still, I'm new here, and I could be wrong. What do you think?

\* Text editor's giving me some trouble, but the link is here: http://www.relevantmagazine.com/god/practical-faith/evangelism-interfaith-world

** Peter Boghossian's Manual for Creating Atheists has lots to say about using rationality techniques in the course of daily life, and is well worth reading, though the author can be an asshole sometimes.

## How to Study Unsafe AGI's safely (and why we might have no choice)

9 07 March 2014 07:24AM

TL;DR A serious possibility is that the first AGI(s) will be developed in a Manhattan Project style setting before any sort of friendliness/safety constraints can be integrated reliably.
They will also be substantially short of the intelligence required to exponentially self-improve. Within a certain range of development and intelligence, containment protocols can make them safe to interact with. This means they can be studied experimentally, and the architecture(s) used to create them better understood, furthering the goal of safely using AI in less constrained settings.

### Setting the Scene

The year is 2040, and in the last decade a series of breakthroughs in neuroscience, cognitive science, machine learning, and computer hardware have put the long-held dream of a human-level artificial intelligence in our grasp. Following the wild commercial success of lifelike robotic pets, the integration of AI assistants and concierges into everyday work and leisure, and STUDYBOT's graduation from Harvard's Online degree program with an octuple major and full honors, DARPA, the NSF and the European Research Council have announced joint funding of an artificial intelligence program that will create a superhuman intelligence in 3 years.

Safety was announced as a critical element of the project, especially in light of the self-modifying LeakrVirus that catastrophically disrupted markets in '36 and '37. The planned protocols have not been made public, but it seems they will be centered on traditional computer security rather than techniques from the nascent field of Provably Safe AI, which were deemed impossible to integrate on the current project timeline.

### Technological and/or Political issues could force the development of AI without theoretical safety guarantees that we'd certainly like, but there is a silver lining

A lot of the discussion around LessWrong and MIRI that I've seen (and I haven't seen all of it, please send links!) seems to focus very strongly on the situation of an AI that can self-modify or construct further AIs, resulting in an exponential explosion of intelligence (FOOM/Singularity).
The focus on FAI is on finding an architecture that can be explicitly constrained (and a constraint set that won't fail to do what we desire).

My argument is essentially that there could be a critical multi-year period preceding any possible exponentially self-improving intelligence during which a series of AGIs of varying intelligence, flexibility and architecture will be built. This period will be fast and frantic, but it will be incredibly fruitful and vital both in figuring out how to make an AI sufficiently strong to exponentially self-improve and in how to make it safe and friendly (or develop protocols to bridge the even riskier period between when we can develop FOOM-capable AIs and when we can ensure their safety).

I'll break this post into three parts.

1. First, why is a substantial period of proto-singularity more likely than a straight-to-singularity situation?
2. Second, what strategies will be critical to developing, controlling, and learning from these pre-FOOM AIs?
3. Third, what are the political challenges that will develop immediately before and during this period?

### Why is a proto-singularity likely?

The requirement for a hard singularity, an exponentially self-improving AI, is that the AI can substantially improve itself in a way that enhances its ability to further improve itself. This requires the ability to modify its own code; access to resources like time, data, and hardware to facilitate these modifications; and the intelligence to execute a fruitful self-modification strategy.

The first two conditions can (and should) be directly restricted. I'll elaborate more on that later, but basically any AI should be very carefully sandboxed (unable to affect its software environment), and should have its access to resources strictly controlled. Perhaps no data goes in without human approval or while the AI is running. Perhaps nothing comes out either.
Even a hyperpersuasive hyperintelligence will be slowed down (at least) if it can only interact with prespecified tests. (How do you test an AGI? No idea, but it shouldn't be harder than friendliness.)

This isn't a perfect situation. Eliezer Yudkowsky presents several arguments for why an intelligence explosion could happen even when resources are constrained (see Section 3 of Intelligence Explosion Microeconomics). He also describes ways those constraints could be defied even if they were engineered perfectly. (By the way, I would happily run the AI box experiment with anybody; I think it is absurd that anyone would fail it! I've read Tuxedage's accounts, and I think I actually do understand how a gatekeeper could fail. But I also believe I understand how one could be trained to succeed even against a much stronger foe than any person who has played the part of the AI.)

The third condition, the intelligence to self-modify fruitfully, is limited by the way technology typically develops. I believe it is incredibly unlikely that an AGI will be developed in somebody's basement, or even in a small national lab or top corporate lab. When there is no clear notion of what a technology will look like, it is usually not developed. Positive, productive accidents are somewhat rare in science, but they are remarkably rare in engineering (please, give counterexamples!). The creation of an AGI will likely not happen by accident. Instead, a well-funded, concrete research and development plan will lead up to it, such as the AI Manhattan Project described above. But even when a good plan is successfully executed, prototypes are slow, fragile, and poor-quality compared to what is possible even with approaches using the same underlying technology. It seems very likely to me that the first AGI will be a Chicago Pile, not a Trinity. It will be recognizably a breakthrough, but, with proper consideration, not immediately dangerous or unmanageable. [Note: you don't have to believe this to read the rest of this post.
If you disagree, consider the virtues of redundancy and the question of what safety measures an AI development effort should implement if it can't be persuaded to delay long enough for theoretically sound methods to become available.]

A Manhattan Project-style effort makes a relatively weak, controllable AI even more likely. Such a project can implement substantial safety protocols that are explicitly researched in parallel with primary development. In addition, the total resources it devotes to the AI, in hardware and brainpower, will be much greater than a smaller effort could muster. That sets a correspondingly higher bar that the resulting AGI must clear in order to both self-modify exponentially and break the security procedures.

### Strategies to handle AIs in the proto-Singularity, and why they're important

First, take a look at the External Constraints section of this MIRI report and/or this article on AI boxing. I will be talking mainly about these approaches. There are certainly others, but these are the easiest to extrapolate from current computer security.

These AIs will provide us with the experimental knowledge to better handle the construction of even stronger AIs. If careful, we will be able to use these proto-Singularity AIs to learn about the nature of intelligence and cognition, to perform economically valuable tasks, and to test theories of friendliness (not perfectly, but well enough to start).

"If careful" is the key phrase. I mentioned sandboxing above, and computer security is key to any attempt to contain an AI. That means monitoring the source code and setting a threshold for too much changing too fast, at which point a failsafe freezes all computation. It also means keeping extremely strict control over copies of the source. Some architectures will be more inherently dangerous and less predictable than others.
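As a rough illustration of the change-rate failsafe mentioned above, here is a minimal Python sketch. Everything in it is hypothetical:

- the class name `ChangeRateTripwire`;
- the use of a plain-text diff (`difflib.SequenceMatcher`) as the measure of "how much changed";
- the threshold and time-window values.

It is a toy under those assumptions, not a real containment protocol.

```python
import difflib


class ChangeRateTripwire:
    """Hypothetical failsafe: freeze once too much source changes too fast."""

    def __init__(self, max_changed_fraction: float, window_seconds: float):
        self.max_changed_fraction = max_changed_fraction  # assumed tuning knob
        self.window_seconds = window_seconds              # assumed tuning knob
        self.history = []  # list of (timestamp, fraction_changed) pairs
        self.frozen = False

    def observe(self, old_source: str, new_source: str, timestamp: float) -> bool:
        """Record one source snapshot; return True if computation should be frozen."""
        similarity = difflib.SequenceMatcher(None, old_source, new_source).ratio()
        self.history.append((timestamp, 1.0 - similarity))
        # Forget changes that fall outside the sliding time window.
        self.history = [(t, c) for t, c in self.history
                        if timestamp - t <= self.window_seconds]
        if sum(c for _, c in self.history) > self.max_changed_fraction:
            self.frozen = True  # latch: once tripped, stay frozen until humans intervene
        return self.frozen


wire = ChangeRateTripwire(max_changed_fraction=0.2, window_seconds=60.0)
src = "def act(x):\n    return x + 1\n"
print(wire.observe(src, src, timestamp=0.0))                                 # False: nothing changed
print(wire.observe(src, "completely rewritten agent code", timestamp=10.0))  # True: large rewrite
```

The latch is a deliberate design choice in this sketch. After tripping, the wire stays frozen even if later snapshots look harmless, so unfreezing requires an explicit human decision.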
A simulation of a physical brain, for instance, will be fairly opaque (depending on how far neuroscience has gone). However, it could have almost no potential to self-improve to an uncontrollable degree if its access to hardware is limited, since it won't be able to make itself much more efficient on fixed resources. Other architectures will have other properties. Some will be utility-optimizing agents. Some will have behaviors but no clear utility function. Some will be opaque, some transparent. All will have a theory of how they operate, which can be refined by actual experimentation.

This is what we can gain! We can set up controlled scenarios like honeypots to catch malevolence. We can evaluate our ability to monitor and read the thoughts of the AGI. We can develop stronger theories of how damaging self-modification actually is to imposed constraints. We can test our ability to add constraints to even the base state. But do I really have to justify the value of experimentation? I am familiar with criticisms based on absolutely, incomprehensibly perceptive and persuasive hyperintelligences being able to overcome any security. But I've tried to outline above why I don't think we'd be dealing with that case.

### Political issues

Right now AGI is really a political non-issue. It is blue-sky research even compared to space exploration and fusion, both of which actually receive substantial government funding. I think that this will change in the period immediately leading up to my hypothesized AI Manhattan Project. The AI Manhattan Project can only happen with a lot of political will behind it. That will probably mean a spiral of scientific advancements, hype, and the threat of competition from external unfriendly sources. Think space race.

So suppose that the first few AIs are built under well-controlled conditions. Friendliness is still not perfected, but we think/hope we've learned some valuable basics. But now people want to use the AIs for something. So what should be done at this point?
I won't try to speculate about what happens next (well, you can probably persuade me to, but it might not be as valuable), beyond extensions of the protocols I've already laid out, hybridized with notions like Oracle AI. It certainly gets a lot harder. Hopefully, though, two things would provide the groundwork for it: experimentation on the first, highly controlled generation of AIs to better understand their architectural fundamentals, and more direct research on friendliness in general.

## A medium for more rational discussion

9 24 February 2014 05:20PM

It would be cool if online discussions allowed you to:

1. declare your claims,
2. declare how your claims depend on each other (i.e. make a dependency tree),
3. discuss the claims, and
4. update the status of each claim by saying whether or not you agree with it, using something like the text shorthand for uncertainty to say how confident you are in your agreement/disagreement.

I think that mapping out these things visually would allow for more productive conversation. It would also allow newcomers to the discussion to quickly and easily get up to date, rather than having to sift through tons of comments. On this note, there should also probably be something like an answer wiki for each claim to summarize the arguments and say what the consensus is.

I get the feeling that it should be flexible, though. That probably means that it should be accompanied by the normal commenting system. Sometimes you don't actually know what your claims are, but need to "talk it out" in order to figure out what they are. Sometimes you don't really know how they depend on each other. And sometimes you have something tangential to say (on that note, there should probably be an area for tangential comments, or at least a way to flag them as tangential).
As for who would be interested in this, obviously the Less Wrong community would be. I think there are definitely some other online communities that would be too (Hacker News, some subreddits...). Also, this is speculation, but I would hope that it would develop a reputation as the most effective way to have a productive discussion. So much so that people would start saying, "go outline your argument on [name]". Maybe there'd even be pressure for politicians to do this. If so, then I think this could put pressure on society to be more rational.

What do you guys think?

EDIT: If anyone is actually interested in building this, you definitely have my permission (don't worry about "stealing the idea"). I want to build it, but 1) I don't think I'm a good enough programmer yet, and 2) I'm busy with my startup.

EDIT: Another idea: if you think that a statement commits an established fallacy, then you should be able to flag it (like this). And if enough other people agree, then the statement is underlined or highlighted or something. The advantage to this is that it makes the discussion less "bulky". A simple version of this would be flagging things as less than DH6. But there are obviously a bunch of other things worth flagging that Eliezer has talked about in the Sequences that are pretty non-controversial.

EDIT: Here is a rough mockup of how it would look. Notes:

- The claims should show how many votes of agreement/disagreement they got, probably using the text shorthand for uncertainty.
- The claims should be colored green if there is a lot of agreement, and red if there is a lot of disagreement.
- See the edit above. Commenting in the discussion should be like this. And you should be able to flag statements as fallacious in a similar way. If there is enough agreement about the flag, the statement should be underlined in red or something.
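Here is a minimal sketch of the claim/dependency-tree idea as a data structure. It assumes simple agree/disagree vote counts in place of the text shorthand for uncertainty. Only the green/red coloring and the fallacy flags come from the notes above. The following are all hypothetical choices for illustration:

- the `Claim` class and the `weakest_dependency` helper;
- the 0.7 coloring threshold and the grey middle band;
- the five-flag cut-off for a red underline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One node in a hypothetical claim-dependency tree."""
    text: str
    depends_on: List["Claim"] = field(default_factory=list)  # claims this one relies on
    agree: int = 0
    disagree: int = 0
    fallacy_flags: int = 0  # readers flagging the claim as committing a known fallacy

    def agreement(self) -> float:
        """Fraction of votes that agree; 0.5 (undecided) when nobody has voted yet."""
        total = self.agree + self.disagree
        return 0.5 if total == 0 else self.agree / total

    def color(self, threshold: float = 0.7) -> str:
        """Green for broad agreement, red for broad disagreement, grey in between."""
        a = self.agreement()
        if a >= threshold:
            return "green"
        if a <= 1.0 - threshold:
            return "red"
        return "grey"

    def underline_red(self, min_flags: int = 5) -> bool:
        """Underline the claim once enough readers agree it is fallacious."""
        return self.fallacy_flags >= min_flags


def weakest_dependency(claim: Claim) -> Claim:
    """Walk the dependency tree and return the least-agreed-upon claim in it."""
    weakest = claim
    for dep in claim.depends_on:
        candidate = weakest_dependency(dep)
        if candidate.agreement() < weakest.agreement():
            weakest = candidate
    return weakest


premise = Claim("Visual claim maps help newcomers catch up", agree=1, disagree=9)
thesis = Claim("Build the discussion tool", depends_on=[premise], agree=6, disagree=4)
print(thesis.color(), premise.color(), weakest_dependency(thesis).text)
# -> grey red Visual claim maps help newcomers catch up
```

Walking the tree for the weakest supporting claim is one way a newcomer could jump straight to where a disagreement actually lives, instead of sifting through the whole comment thread.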
## Identity and Death

9 18 February 2014 11:35AM

This recent SMBC comic illustrates the old question of what exactly you are by referencing the Star Trek Teleporter Problem. Do you actually get teleported, or does the teleporter just kill you before making a copy of you somewhere else?

Well, the answer that a lot of rationalists seem to accept is Pattern Identity Theory, proposed by Hans Moravec (skim the link or do a Google search for the theory if you have no idea what I am referring to). I am very sympathetic to this view, and it definitely ties in with my limited understanding of physics and biology. Elementary particles are interchangeable and do not have 'identity', at least some of the atoms in your body (including some of those that form neurons) get replaced over time, and so on.

This is all fine and dandy, but if you take this view to its logical extreme, it looks like a sufficiently modified version of you shouldn't actually qualify as you. The difference in the pattern might be as great as, or greater than, the difference between the patterns of any two random people.

Let's say something happens to Eliezer and he gets successfully cryopreserved in 2014. Then 80 years later the singularity hasn't arrived yet, but the future is still pretty good. Everyone is smart and happy due to enhancements, ageing is a thing of the past, and we have the technology to wake cryopreserved people up. The people in that future build Eliezer a new body, restore the information from his brain, apply all the standard enhancements to him, and then wake him up. The person who wakes up remembers all that good old Eliezer did and seems to act like you would expect an enhanced Eliezer to act. However, if you examine things closely, the difference between 2014!Eliezer and 2094!Eliezer is actually bigger than the difference between 2014!Eliezer and, let's say, 2014!Yvain, due to all the new standard enhancements.
Does that person really qualify as the same person according to Pattern Identity Theory, then? Sure, he originates from Eliezer, and arguably the difference between the two is similar to the difference between kid!Eliezer and adult!Eliezer, but is it really the same pattern? If you believe that you really are the pattern, then how can you not think of 2014!Eliezer as a dead man?

Sure, you could argue that continual change (as opposed to the sudden change in the cryo!Eliezer scenario), or 'evolution of the pattern', is in some way relevant, but why would that be? The only somewhat reasonable argument for that I've seen is 'because it looks like this is what I care about'. That's fine with me, but my personal preference is closer to 'I want to continue existing and experiencing things'. I don't care if anything that looks like me or thinks it's me is experiencing stuff; I want me (whatever that is) to continue living and doing stuff. And so far it looks really plausible that 'me' is the pattern, which sadly leads me to think that maybe changing the pattern is a bad idea.

I know that this line of thinking can damn you to eternal stagnation. Still, it seems worth exploring before teleporters, uploading, big self-enhancements and so on come along, which is why I am starting this discussion. Additionally, part of the problem might be some confusion about definitions, but I'd like to see where. Furthermore, 'the difference in the pattern' seems hard to quantify. More importantly, it doesn't look like something that could have a clear cut-off, as in 'if the pattern differs by more than 10% you are a different person'. At any rate, whatever that cut-off is, it still seems pretty clear that 2000!tenoke differs enough from me to be considered dead.

As an exercise at home, I will leave you to think about what this whole line of thinking implies if you combine it with MWI-style quantum immortality.
## Meetup report: London LW paranoid debating session

9 16 February 2014 11:46PM

[Photo: a (different) recent LW London meetup]

Cross-posted from my blog.

I wasn't going to bother writing this up, but then I remembered it's important to publish negative results.

LessWrong London played a few rounds of paranoid debating at our meetup on 02/02/14. I'm not sure we got too much from the experience, except that it was fun. (I enjoyed it, at any rate.)

There were nine of us, which was unwieldy, so we split into two groups. Our first questions probably weren't very good: we wanted the height of the third-highest mountain in the world, and the length of the third-longest river. (The groups had different questions so that someone in the other group could verify that the answer was well-defined and easy to discover. I had intended to ask "tallness of the third-tallest mountain", but the Wikipedia page I found sorted by height, so I went with that.)

I was on the "river" question, and we did pretty badly. None of us really knew what ballpark we were searching in. I made the mistake of saying out loud a number that was in my head: that the longest river was something like 1,800 miles long. I didn't know where that number came from and didn't trust it. Despite my lack of confidence, we became anchored there. Someone else suggested that the thing to do would be to take a quarter of the circumference of the earth (which comes to 6,200 miles) as a baseline and adjust for the fact that rivers wiggle. I thought, that's crazy, you must be the mole. I think I answered 1,500 miles. In reality, the longest river is 4,100 miles and the third-longest is 3,900 miles. Meanwhile, the mole decided that 1,800 was dumb and he didn't need to do anything to sabotage us. (I don't remember what the circumference-over-four person submitted. I have a recollection that he was closer than I was, but I also had a recollection that circumference-over-four was actually pretty close, which it isn't especially.)
The other team did considerably better, getting answers in the 8,000s (metres) for a true answer of 8,600.

I'd devised a scoring system. Every player submits their own answer, non-moles score proportional to -|log(given answer / actual answer)|, and the mole scores the negative mean of everyone else's score. But after calculating it for a few people, we decided we didn't really care, and we probably wouldn't be playing enough rounds for it to become meaningful.

Those questions weren't so great, because we felt there wasn't much you could do to approach them beyond having some idea of the correct answer. For round two we tried to pick questions more amenable to Fermi estimates: annual U.S. electricity consumption (sourced from Wolfram Alpha), and the number of pennies that could fit inside St. Paul's Cathedral. This round, we gave the correct answers to the moles.

I was on team Cathedral, and again did pretty badly. We started by measuring pennies using notepaper for scale, and calculating packing densities, to work out pennies per cubic metre. (I don't remember the answer, but we got pretty close.) But after that it was just a matter of knowing how large St. Paul's Cathedral was likely to be. I had stood outside St. Paul's Cathedral a few weeks back, but mostly facing in the wrong direction while tourists took photos of the group I was with. From that vague recollection I thought maybe it was about a thousand square metres at the base, and four stories, so about twelve metres high? (Later I looked at a picture of the Cathedral, and realized that I was probably thinking of the dimensions of the entrance hall.) Someone else, who had actually been inside the cathedral, was giving much higher numbers, especially for the base, and someone else was agreeing with his general ballpark. The other people weren't saying much, so I figured one of those two had to be the mole, and decided not to update very much in that direction.
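Going back to the scoring system described in this report, here is a minimal Python sketch of it. It rests on two assumptions of mine:

- the proportionality constant is 1;
- the logarithm is natural (changing the base only rescales every score).

`score_round` is a hypothetical helper name.

```python
import math


def score_round(answers, actual, mole):
    """Score one round of paranoid debating (hypothetical helper).

    answers: dict mapping player name -> submitted numeric answer
    actual:  the true answer
    mole:    name of the player who was the mole
    """
    scores = {}
    for player, given in answers.items():
        if player != mole:
            # Non-moles lose points according to how many "log units" off they were.
            scores[player] = -abs(math.log(given / actual))
    # The mole scores the negative mean of everyone else's score.
    scores[mole] = -sum(scores.values()) / len(scores)
    return scores
```

Because each score depends on |log(ratio)|, being a factor of 10 too high costs exactly as much as being a factor of 10 too low. That suits estimation questions whose answers can be off by orders of magnitude.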
Those numbers were pretty reasonable, and mine were pretty bad. One of them was the mole (not the one I most suspected); I don't remember what he said his strategy was. Again, I'm not too sure it was a great question. Pennies-per-cubic-metre is a question of geometry rather than estimation, and the interior dimensions of St. Paul's Cathedral don't seem much more Fermi-estimable than the river question.

The other team got very close to the answer I'd given the mole. Apparently they actually came up with a number off the top of someone's head that was pretty damn close. Embarrassingly, the answer I gave the mole was an order of magnitude too high. I'd sourced it from Wolfram Alpha ahead of time, but then someone asked me to clarify whether it was total energy usage or just electricity usage. I looked again to check. I think I used a different query, saw that it said "electricity usage", and didn't notice that the number was different. The answer I actually gave was for energy usage.

The mole on that team reported that it wasn't really helpful to know the correct answer without any intermediate steps. That mechanic might be worth experimenting with further, but currently I think it doesn't add much, and it's a mild extra hassle when setting up.

I had fun, and hopefully in future I will put much less trust in my estimates of the dimensions of things, but I wouldn't say the session was a particular success. Not a failure either, just kind of "it happened".

## Mental Subvocalization --"Saying" Words In Your Mind As You Read

9 15 February 2014 02:38AM

http://en.wikipedia.org/wiki/Subvocalization

I'm curious about how often, or to what degree, visitors to this site subvocalize as they read. I was originally interested in reducing subvocalization as a way to increase reading speed, as the idea is mentioned in multiple pieces I've read about speed reading.
The Wikipedia entry seems to focus on subtle throat and muscle movements, but I'm more interested to know whether you "say" or "hear" the words in your head as you read. Since reading about subvocalization recently, I seem to notice that I "say/hear" what I'm reading quite frequently. I'm not sure if this is causal (in the way that the command "don't think of pink elephants" obliges you to do so), or if I just notice it more now, or both. When I'm very engrossed in a book, either I don't notice the subvocalizations or they stop happening. So it seems that subvocalization could be either a cause or a symptom of distractedness.

In the comments, please describe your mental subvocalizations (or lack of them) and whether they are related to how engrossed you are in the book. Any other relevant comments about speed reading or subvocalization are welcome.

## Cambridge (England) lecture: Existential Risk: Surviving the 21st Century, 26th February

9 14 February 2014 07:39PM

The Centre for the Study of Existential Risk will be holding a public lecture on "Existential Risk: Surviving the 21st Century", in collaboration with 80,000 Hours: Cambridge and Giving What We Can: Cambridge, on the 26th of February in Cambridge (United Kingdom).

Lady Mitchell Hall, Sidgwick Site, Cambridge. 5:30pm-6:45pm, with drinks reception to follow.

SPEAKERS:

- Lord Martin Rees, Astronomer Royal
- Jaan Tallinn, co-founder of Skype
- Huw Price, Bertrand Russell Professor of Philosophy at Cambridge

"In the coming century, the greatest threats to human survival may come from our own technological developments. However, if we can safely navigate the pitfalls, the benefits that technology promises are enormous. A philosopher, an astronomer, and an entrepreneur have come together to form the Centre for the Study of Existential Risk. The goal: to bring a fraction of humanity’s talents to bear on the task of ensuring our long-term survival.
In this lecture, Huw Price, Martin Rees and Jaan Tallinn will outline humanity’s greatest challenge: surviving the 21st century."

This event is free and open to all. Facebook event notice is here.

In other news, I hope to post a general update on progress with the Centre's establishment fairly shortly for those who are interested, although it's still an ongoing process. Things have become quite busy and there are a lot of opportunities to follow up on. So I'll be taking a six-month leave of absence from the Future of Humanity Institute, starting in April, to work full-time on establishing the Centre.

## Useful Personality Tests

9 11 February 2014 11:18AM

Have you ever taken a personality quiz/test that gave you valuable insights? If so, what were the tests and how were they useful?

The only useful ones I've found all yielded the same type of insight: they showed me where I stand relative to others. That can be genuinely useful, since representative samples of large populations can be hard to come by. This includes IQ tests and tests for mental disorders. In my experience, people are usually aware of things like being smarter than average (although the Dunning-Kruger effect might complicate this) or having some intrusive thoughts and compulsive rituals. But they might be surprised to find that they are three standard deviations above the norm, or that their symptoms are severe enough to be considered OCD.

No remotely reliable (as in, not astrology) test I have ever seen has revealed genuinely surprising information for a moderately self-aware person, outside of ranking. Furthermore, they rarely gather personality data in a remotely subtle or non-transparent way ("do you like spending lots of time with large groups of people?" "yes..." "surprise, you're an extrovert!"). They thus seem super susceptible to test-takers' attempts to confirm a desired identity.
A more interesting/subtle way to conduct a personality test would be to use questions like OKTrends' "do you like beer?". That question clusters strongly with "do you have sex on the first date?" and, potentially, with sexual openness. Such results might be harder to manipulate (consciously or unconsciously) and could assist with deeper self-awareness.

Edited because the first link was broken.

## The rationality of splitting donations

9 10 February 2014 03:13AM

Here are some tentative thoughts that I haven't run by anyone to check for soundness. They're not genuinely original to me (they've been floating around the effective altruism community in some form or other for a while); I just hadn't thought them through in sufficient detail to take them to their logical conclusion. I'd appreciate any feedback.

Suppose that the expected number of lives saved per additional dollar donated to charity A is x and the expected number of lives saved per additional dollar donated to charity B is y, where x and y are constants, and x > y. Then if you're trying to maximize the expected number of lives saved (cf. The "Intuitions" Behind "Utilitarianism"), you should make all of your charitable contributions to charity A.

In practice, x and y will not be constant, because of room-for-more-funding issues. So splitting one's donations can maximize the number of lives saved, if x is sometimes smaller than y. But suppose that you're donating \$d, where increasing the charities' budgets by \$d would leave the condition x > y unaltered. A common view is that one should then donate all \$d to charity A.

However, this doesn't take into account timeless decision theory. Suppose all donors to charities A and B were identical to you. Then your decision to donate \$d to charity A would be equivalent to a decision for all donors' funds to go to charity A rather than charity B. In effect, it would be a decision for charity A to get a little bit more money in exchange for charity B ceasing to exist altogether. If x > y doesn't always hold, this is not expected-value maximizing. The other donors aren't identical to you, but their decisions are still correlated with yours on account of the psychological unity of humankind and shared cultural backgrounds.

Suppose that x > y is not always true. For simplicity, suppose that the total amount that donors will donate to charities A and B is fixed and known to you. If there were no correlation between your decision making and that of other donors, then you should give all of your money to charity A. Now suppose instead that the correlation between your decision making and that of the other donors were perfect. Then the ratio of your donations to charity A and charity B should equal the ratio of the total funding you think charity A should get to the total funding you think charity B should get. This raises the possibility that your actual split of donations should be somewhere between these two extremes, and in particular, that you should split donations.
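To make the two extremes concrete, here is a toy Python model. All of its numbers are invented for illustration and are not estimates for any real charity:

- the marginal-value curves: 3 lives per dollar (A) and 1 life per dollar (B) at zero funding, each halving once a charity has one million dollars;
- the 10 million dollar donor pool;
- the 1,000 dollar chunk size.

Both curves have diminishing returns. So greedily giving each chunk to whichever charity currently has the higher marginal value approximates the split of the whole pool that saves the most lives.

```python
def marginal_a(funding: float) -> float:
    """Hypothetical lives saved per extra dollar at charity A, given its current funding."""
    return 3.0 / (1.0 + funding / 1e6)


def marginal_b(funding: float) -> float:
    """Hypothetical lives saved per extra dollar at charity B (a third of A's at equal funding)."""
    return 1.0 / (1.0 + funding / 1e6)


def ideal_split(total: float, chunk: float = 1000.0):
    """Split a fixed donor pool by giving each chunk to the charity with the higher margin.

    With diminishing returns on both sides, this greedy rule approximates the
    allocation that maximizes total lives saved.
    """
    fa = fb = 0.0
    remaining = total
    while remaining > 0:
        step = min(chunk, remaining)
        if marginal_a(fa) >= marginal_b(fb):
            fa += step
        else:
            fb += step
        remaining -= step
    return fa, fb


def uncorrelated_split(donation: float, others_a: float, others_b: float):
    """No correlation: others' gifts are fixed, so give everything to the better margin."""
    if marginal_a(others_a) >= marginal_b(others_b):
        return donation, 0.0
    return 0.0, donation


def correlated_split(donation: float, total: float):
    """Perfect correlation: mirror the ratio in which the whole pool should be split."""
    fa, fb = ideal_split(total)
    share_a = fa / (fa + fb)
    return donation * share_a, donation * (1.0 - share_a)


print(ideal_split(10_000_000.0))               # roughly (8e6, 2e6)
print(correlated_split(1000.0, 10_000_000.0))  # roughly (800, 200)
print(uncorrelated_split(1000.0, 1e6, 1e6))    # (1000.0, 0.0)
```

In this toy model the best split of the whole pool is roughly 80% to A and 20% to B, and at that point x and y are roughly equal (about 1/3 each). A perfectly correlated donor would mirror the 80/20 ratio. An uncorrelated donor who faces x > y at the margin would give everything to A.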

In practice it won't always make sense to split donations: it might be that for any given charity A, there are many charities B with the property that x > y is not always true, such that it would be a logistical hassle to split one's donations between all of them. But when one has a small handful of charities that one is considering donating to, it may make sense to split one's donations, even when one is a small donor.

Moreover, charities are closer in cost-effectiveness than might initially meet the eye. So the condition that x > y is not always true holds more often than one might expect. That makes the case for splitting donations stronger, and the split more even, than might initially meet the eye.

## Single player extensive-form games as a model of UDT

8 25 February 2014 10:43AM

This post was inspired by Benja's SUDT post. I'm going to describe another simplified model of UDT which is equivalent to Benja's proposal, and is based on standard game theory concepts as described in this Wikipedia article.

First, let's define what a "single player extensive-form game with chance moves and imperfect information" is:

1. A "single player extensive-form game" is a tree of nodes. Each leaf node is a utility value. A play of the game starts at the root and ends at some leaf node.
2. Some non-leaf nodes are "chance nodes", with probabilities assigned to branches going out of that node. All other non-leaf nodes are "decision nodes", where the player can choose which branch to take. (Thanks to badger for helping me fix an error in this part!)
3. "Imperfect information" means the decision nodes are grouped into "information sets". The player doesn't know which node they're currently at, only which information set it belongs to.
4. "Imperfect recall" is a special case of imperfect information, where knowing the current information set doesn't even allow the player to figure out which information sets were previously visited, like in the Absent-Minded Driver problem.
5. We will assume that the player can use "behavioral strategies", which let the player make a random choice at each node independently, rather than "mixed strategies", which randomize over the set of pure strategies for the entire game. See Piccione and Rubinstein's paper for more on this difference. (Thanks to Coscott for pointing out that assumption!)
6. The behavioral strategy with the highest expected utility will be taken as the solution of the game.
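The definition above is straightforward to encode. Below is a minimal Python sketch built on several assumptions:

- a tree of `Leaf`, `Chance` and `Decision` nodes (hypothetical names, not from any library or from Benja's post);
- a behavioral strategy stored as one action distribution per information set;
- a brute-force grid search that only handles information sets with exactly two actions.

Every node in an information set is evaluated with the same distribution, so the player cannot condition on which node they are actually at.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    utility: float


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, child) pairs


@dataclass
class Decision:
    info_set: str                # the player only observes this label, not the node itself
    branches: Dict[str, "Node"]  # action name -> child node


Node = Union[Leaf, Chance, Decision]
# Behavioral strategy: one independent action distribution per information set.
Strategy = Dict[str, Dict[str, float]]


def expected_utility(node: Node, strategy: Strategy) -> float:
    """Expected utility of playing `strategy` from `node` down to the leaves."""
    if isinstance(node, Leaf):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * expected_utility(child, strategy) for p, child in node.branches)
    # Every node in the same information set uses the same distribution,
    # which is how imperfect information (and imperfect recall) enters the model.
    dist = strategy[node.info_set]
    return sum(dist[a] * expected_utility(child, strategy) for a, child in node.branches.items())


def collect_info_sets(node: Node, found: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each information-set label to its sorted list of action names."""
    if isinstance(node, Chance):
        for _, child in node.branches:
            collect_info_sets(child, found)
    elif isinstance(node, Decision):
        found[node.info_set] = sorted(node.branches)
        for child in node.branches.values():
            collect_info_sets(child, found)
    return found


def solve(root: Node, steps: int = 300) -> Tuple[float, Strategy]:
    """Brute-force grid search over behavioral strategies.

    Assumes every information set has exactly two actions; the grid value is
    the probability of the alphabetically first action.
    """
    sets = collect_info_sets(root, {})
    names = sorted(sets)
    grid = [i / steps for i in range(steps + 1)]
    best_eu, best_strategy = float("-inf"), {}
    for probs in itertools.product(grid, repeat=len(names)):
        strategy = {}
        for name, p in zip(names, probs):
            first, second = sets[name]
            strategy[name] = {first: p, second: 1.0 - p}
        eu = expected_utility(root, strategy)
        if eu > best_eu:
            best_eu, best_strategy = eu, strategy
    return best_eu, best_strategy


# Toy game with hypothetical numbers: nature makes the bet "good" with probability 0.7.
# The player can't tell which branch they are in, so both decision nodes share info set "I".
game = Chance([
    (0.7, Decision("I", {"bet": Leaf(10.0), "pass": Leaf(0.0)})),
    (0.3, Decision("I", {"bet": Leaf(-20.0), "pass": Leaf(0.0)})),
])
eu, strategy = solve(game)
print(round(eu, 6), strategy)  # betting with probability 1 is optimal: 0.7*10 - 0.3*20 = 1
```

Grid search is only practical for a handful of information sets, since the number of candidate strategies grows exponentially with them. It is enough, though, to check small problems like the ones discussed below.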

(How does that relate to Benja's SUDT proposal? As far as I can tell, SUDT is somewhere in between an extensive-form game and its corresponding normal-form game, which is just a mapping of strategies to utilities in the single player case.)

Now let's try using that to solve some UDT problems:

Absent-Minded Driver is the simplest case, since it's already discussed in the literature as a game of the above form. It's strange that not everyone agrees that the best strategy is indeed the best, but let's skip that and move on.
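For the Absent-Minded Driver, assume the standard payoffs from Piccione and Rubinstein; the post itself doesn't state them. Exiting at the first intersection pays 0, exiting at the second pays 4, and continuing past both pays 1. A behavioral strategy is then a single probability p of continuing at each intersection, and the expected utility works out as follows:

$$
\mathbb{E}[U](p) = (1-p)\cdot 0 + p(1-p)\cdot 4 + p^2\cdot 1 = 4p - 3p^2
$$

$$
\frac{d}{dp}\,\mathbb{E}[U](p) = 4 - 6p = 0 \;\Rightarrow\; p^* = \tfrac{2}{3}, \qquad \mathbb{E}[U](p^*) = \tfrac{8}{3} - \tfrac{4}{3} = \tfrac{4}{3}
$$

Both pure strategies do worse: always exiting gives 0, and always continuing gives 1. This is why assumption 5 about behavioral strategies matters for this problem.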

Psy-Kosh's non-anthropic problem is more tricky, because it has multiple players. We will model it as a single-player game anyway, putting the decision nodes of the different players in sequence and grouping them together into information sets in the natural way. The resulting game tree is complicated, but the solution is the same as UDT's. As a bonus, we see that our model does not need any kind of anthropic probabilities, because it doesn't specify or use the probabilities of individual nodes within an information set.

Wei Dai's coordination problem is similar to the previous one, but with multiple players choosing different actions based on different information. If we use the same trick of folding all players into one, and group the decision nodes into information sets in the natural way, we get the right solution again. It's nice to see that our model automatically solves problems that require Wei's "explicit optimization of global strategy".

Counterfactual Mugging is even more tricky, because writing it as an extensive-form game must include a decision node for Omega's simulation of the player. Some people are okay with that, and our model gives the right solution. But others feel that it leads to confusing questions about the nature of observation. For example, what if Omega used a logical coin, and the player could actually check which way the coin came up by doing a long calculation? Paying up is still the right decision (I think we solved that problem in September), but our model here doesn't have enough detail.

Finally, Agent Simulates Predictor is the kind of problem that cannot be captured by our model at all, because logical uncertainty is the whole point of ASP.

It's instructive to see the difference between the kind of UDT problems that fit our model, and those that require something more. Also it would be easy to implement the model as a computer program, and solve some UDT problems automatically. (Though the exercise wouldn't have much scientific value, because extensive-form games are a well known idea.) In this way it's a little similar to Patrick's work on modal agents, which made some UDT problems solvable on the computer by using modal logic instead of enumerating proofs. Now I wonder if other problems that involve logical uncertainty could also be solved by some simplified model?

## Open Thread February 25 - March 3

## What are some related communities online?

8 24 February 2014 01:04PM

I think that it might be useful to create a list of related communities online that people might want to check out. Suggestions much appreciated.

Directly related to rationalism:

Skeptics Stack Exchange - useful for confirming factual claims you are skeptical of. Requires specific, answerable questions.

Straight Dope - similar to Skeptics, but less strict about which questions are accepted

Change My View - a place to get your view challenged

Cognitive Science - you are expected to do your reading first, but useful if you want to learn about what the research says

Philosophy Reddit

Secondary:

Ask Science

## LINK-How we make our depression worse

8 21 February 2014 07:02PM

http://www.alternet.org/print/books/youre-making-your-depression-worse-self-help-bringing-us-down

TL;DR:

Only a human being can feel bad about feeling bad, and this results in a positive feedback loop that pushes people into depression.

## Embracing the "sadistic" conclusion

8 13 February 2014 10:30AM

This is not the post I was planning to write. Originally, it was going to be a heroic post where I showed my devotion to philosophical principles by reluctantly but fearlessly biting the bullet on the sadistic conclusion. Except... it turns out to be nothing like that, because the sadistic conclusion is practically void of content and embracing it is trivial.

## Sadism versus repugnance

The sadistic conclusion can be found in Gustaf Arrhenius's papers such as "An Impossibility Theorem for Welfarist Axiologies." In it he demonstrated that - modulo a few technical assumptions - any system of population ethics has to embrace either the Repugnant Conclusion, the Anti-Egalitarian Conclusion or the Sadistic conclusion. Astute readers of my blog posts may have noticed I'm not the repugnant conclusion's greatest fan, evah! The anti-egalitarian conclusion claims that you can make things better by keeping total happiness/welfare/preference satisfaction constant but redistributing it in a more unequal way. Few systems of ethics embrace this in theory (though many social systems seem to embrace it in practice).

That leaves the sadistic conclusion. A population ethics that accepts it is one where it is sometimes better to create someone whose life is not worth living (call them a "victim") rather than a group of people whose lives are worth living. It seems well named: can you not feel the top-hatted villain twirl his moustache as he gleefully creates lives condemned to pain and misery, laughing maniacally as he prevents the intrepid heroes from changing the settings on his incubator machine to "worth living"? How could that sadist be in the right, according to any decent system of ethics?

## Remove the connotations, then the argument

But the argument is flawed, for two main reasons: one that strikes at the connotations of "sadistic", the other at the heart of the comparison itself.

The reason the sadistic aspect is a misnomer is that creating a victim is not actually a positive development. Almost all ethical systems would advocate improving the victim's life, if at all possible (or ending it, if appropriate). Indeed, some ethical systems which have the "sadistic conclusion" (such as prioritarianism or egalitarianism) would think it more important to improve the victim's life than some ethical systems that don't have the conclusion (such as total utilitarianism) would. Only if such help is somehow impossible do you get the conclusion. So it's not a gleeful sadist inflicting pain, but a reluctant acceptance that "if the universe conspires to prevent us from helping this victim, then it still may be worth creating them as the least bad option" (see for instance this comment).

"The least bad option." For the sadistic conclusion is based on a trick, contrasting two bad options and making them seem related (see this comment). Consider for example whether it is good to create a large permanent underclass of people with much more limited and miserable lives than all others - but whose lives are nevertheless just above some complicated line of "worth living". You may or may not agree that this is bad, but many people and many systems of population ethics do feel it's a negative outcome.

Then, given that this underclass is a bad outcome (and given a few assumptions as to how outcomes are ranked) then we can find other bad outcomes that are not quite as bad as this one. Such as... a single victim, a tiny bit below the line of "worth living". So the sadistic conclusion is not saying anything about the happiness level of a single created population. It's simply saying that sometime (A) creating underclasses with slightly worthwhile lives can sometimes be bad, while (B) creating a victim can sometimes be less bad. But the victim isn't playing a useful role here: they're just an example of a bad outcome better than (A), only linked to (A) through superficial similarity and rhetoric.
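As a toy illustration of this structure (my own example, not taken from Arrhenius or from this post), take average utilitarianism, one well-known axiology that implies the sadistic conclusion, together with some made-up welfare numbers. Both additions leave the world worse than the status quo; the victim is merely the less bad of the two bad options.

```python
from statistics import mean

# Illustrative only: average utilitarianism is one axiology that implies the
# sadistic conclusion. All welfare numbers below are invented for the example.

def average_welfare(population):
    """Value of a population under average utilitarianism."""
    return mean(population)

status_quo = [10.0] * 10              # existing people, all well off
underclass = status_quo + [1.0] * 100  # (A) many lives just above "worth living"
victim = status_quo + [-1.0]           # (B) one life just below "worth living"

for name, pop in [("status quo", status_quo), ("A: underclass", underclass), ("B: victim", victim)]:
    print(f"{name:14s} {average_welfare(pop):.3f}")
```

Here (A) scores about 1.818 and (B) scores 9.000, both below the status quo's 10.000: the ranking says (B) is less bad than (A), not that creating the victim is good.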

For most systems of population ethics the sadistic conclusion can thus be reduced to "creating underclasses with slightly worthwhile lives can sometimes be bad." But this is the very point that population ethicists are disputing each other about! Wrapping that central point into a misleading "sadistic conclusion" is... well, the term "misleading" gave it away.

## Calorie Restriction: My Theory and Practice

8 12 February 2014 01:16AM

Like most futurist-oriented people, I am fascinated by the idea of long-term life extension -- the notion that eventually people will have life expectancies of hundreds of years, thousands of years, or even more.  Although medicine has a ways to go in this area, one obvious approach is to take low-tech steps to increase one's lifespan in hopes of living long enough to take advantage of possible future advances, i.e., to roll with +1 dice.  Besides the obvious steps like wearing seat belts, getting regular exercise, and eating a lot of fruits and vegetables, calorie restriction presents itself as an intriguing possible method of life extension.

In this essay, I will attempt to briefly define calorie restriction; assess how useful it might be; speculate about how it might be useful; and use the foregoing to justify my own personal approach to calorie restriction, which I will describe presently.  Of course I welcome comments and criticisms, especially since I am messing around with my own health.

I should note at the outset that I have no formal training or credentials in medicine, nutrition, or anything like that.  I'm just an attorney.

I should also add that my approach to calorie restriction is not a diet in the sense of being a weight loss strategy for people who cannot control their eating.  This is not a weight loss post!  I do not describe in this essay how I control my eating; control is assumed.

Last, my general approach is one of no regret.  That is, my main priority in calorie-restricting myself is to avoid doing anything too radical in terms of loss of quality of life or risk to my health.

I. What is Calorie Restriction?

Wikipedia defines "calorie restriction" as follows:

Caloric restriction (CR), or calorie restriction, is a dietary regimen that is based on low calorie intake. "Low" can be defined relative to the subject's previous intake before intentionally restricting calories, or relative to an average person of similar body type.

So immediately we see a problem -- the concept of calorie restriction is ambiguous.  How am I supposed to evaluate and possibly implement calorie restriction in my life if I am not even clear on what it means?  This is not just a problem for laymen like me.  Imagine you are a researcher who is studying the effects of calorie restrictions in lab chimps.  How do you feed your control group of lab chimps?  Do you let them eat donuts and potato chips ad libitum?  Do you limit them to chimp chow?  Without a clear definition, this is a bit of a conundrum.

In fact, one individual has argued that the difference in treatment of control animals may be part of the reason why two studies on calorie restriction in monkeys had different results:

Further, the NIA study control monkeys were not truly fed ad libitum, unlike the WNPRC study. The regulated portioning of food for the NIA control monkeys may be a slight restriction and, thus, largely prevented obesity. Studies of 10% CR have been reported to increase lifespan in rats compared to ad libitum controls – even more than 25% and 40% CR[20]. The NIA control monkeys may experience survival benefits from this slight restriction.

http://www.crsociety.org/science/nia_monkey_study

Another individual states as follows:

"Both the NIA and U Wisc studies need to be considered together for proper interpretation. It is clear that the U Wisc "controls" differ from the U Wisc CR group and BOTH NIA groups, and are probably most like the general populations of developed countries.

Because we at NIA wanted to avoid the criticism leveled at many rodent CR studies that controls are overweight and sedentary, we specifically designed our dietary conditions to supply an adequate, but not OVERadequate, caloric intake.

The bottom line is that, for most people (who are more like the U Wisc controls), CR may indeed provide both health (BOTH studies agree on THIS) and longevity benefits.....and of course, most important.....more "healthy years."

https://www.crsociety.org/index.php?/topic/2939-dr-george-roth-comments-on-calorie-restriction-and-nia-monkey-study/

For purposes of this essay, I will offer the following definitions:

1.  "Mild calorie restriction" = restricting calories sufficiently so that you avoid gaining large amounts of weight.

2.  "Moderate calorie restriction" = restricting calories sufficiently so that most of the time you are towards the bottom of your metabolic range.

3.  "Severe calorie restriction" = restricting calories sufficiently so that you end up spending your time significantly below typically fit people in terms of muscle mass and/or body fat.

The first and third definitions are pretty straightforward, although it's worth noting that a lot of people engage in mild calorie restriction unintentionally, just through the operation of their natural system which regulates their appetite/urge to eat/urge to stop eating (John Walker calls this the "food clock.")

The second definition requires a little explanation.   From simple observation, it appears that small changes in one's energy intake result in corresponding changes in one's metabolic rate.  So that if your weight is stable but you eat a little more or less than usual, you might notice that you are a little warmer or cooler than usual.  Evidently the body can and does make small adjustments to its metabolic rate in response to changes in food intake.  This is also consistent with dieters' reports that they feel cold when dieting.

II.     Does Calorie Restriction Work in Humans?

It seems quite likely that mild calorie restriction works in humans based on the observation that fat people have significantly greater mortality than thin people.

For example, as illustrated by the charts here:

http://www.nejm.org/action/showImage?doi=10.1056%2FNEJMoa1000367&iid=f01&

Of course one cannot know this for sure since there is no ethical way to do a large controlled experiment, but still it's reasonable to infer cause and effect:  Common sense says that being fat puts a lot of abnormal extra strain on your system almost all the time.  In any event, there seems to be little downside to mild calorie restriction.

A more interesting question is whether moderate calorie restriction works in humans.  Common sense says that it ought to be beneficial based on the idea that slowing one's metabolism ought to slow the aging process, all things being equal.  One interesting area of research is studies which look at the effect of modest weight loss among obese people.  Is someone who goes from 250 pounds to 225 pounds and stays there healthier than someone who goes from 210 pounds to 220 pounds and continues to gain weight?  If so, part of the difference might be that the second person is towards the top of his metabolic range while the first person is towards the middle or bottom.

The Calorie Restriction Society web site links to a couple of presentations which argue that cancer is actually a metabolic disease related to having too much energy in play.  I'm a bit skeptical of this claim, but it does seem to me that you are inviting trouble by having extra energy floating around in your system.

As for severe calorie restriction, the jury is still out.  I don't put too much stock in the left side of the J-shaped curves comparing body weight to mortality.  Surely a lot of underweight people have serious latent health problems.  What's more interesting to me is that the curves flatten out between BMI of about 19 and 23.5.  This suggests to me that one can realize most of the benefits of reduced body mass by being normal weight and that after that, if there are any benefits, it's diminishing returns.

III.   My Approach to Calorie Restriction

I have decided to adopt an intermediate approach to calorie restriction, i.e. the aim is to stay thin and be towards the bottom of my metabolic range most of the time.  The health benefits to staying thin are pretty clear; there doesn't seem to be much downside; and frankly there are a lot of social benefits.  The benefits of staying towards the bottom of my metabolic range are more iffy, but again there doesn't seem to be much downside to it.  (Putting aside issues of health, the main downside is that it happens pretty frequently that I will have a meal and eat less food than I would have liked to eat.)

Severe calorie restriction seems too speculative to me to be worth the trouble.  Particularly given the social costs and the likely diminishing returns problem.  I like having a somewhat muscular appearance as opposed to a gaunt appearance.  Since my main priority is to avoid regrets, I am not willing to go this route without pretty solid evidence of benefit.

IIIa.  The Nuts and Bolts

What I do is this:  I have a basic daily diet which I believe is reasonably healthy and well-balanced.  Although it is somewhat flexible, it contains roughly the same proportions of macro-nutrients and is roughly the same amount of calories each day.  From careful observation, I have determined that my basic daily diet is about 500 to 600 calories short of my actual daily caloric needs.  i.e. if I stuck to my basic daily diet and ate nothing more, I would lose about a pound a week.  I add a small supplement of extra food to my basic daily diet if I work out at the gym in order to balance out the exercise.  (Interestingly, I once measured and it seems my basic daily diet, including the exercise supplement, is about 2800 calories.  This seems pretty high for a man who is thin, slightly below average height, and only slightly muscular in build.  I'm not sure what to make of it.)
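The "about a pound a week" figure is consistent with the common rule of thumb that a pound of body fat corresponds to roughly 3500 kcal. That constant is an assumption on my part (the post doesn't state it), and it ignores the metabolic adjustments discussed in section I.

```python
# Rule of thumb, assumed here rather than taken from the post:
# roughly 3500 kcal per pound of body fat.
KCAL_PER_LB_FAT = 3500

def weekly_loss_lb(daily_deficit_kcal):
    """Approximate pounds lost per week at a constant daily calorie deficit."""
    return daily_deficit_kcal * 7 / KCAL_PER_LB_FAT

for deficit in (500, 600):
    print(deficit, round(weekly_loss_lb(deficit), 2))  # 500 1.0, then 600 1.2
```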

I weigh myself every morning and calculate a 7-day moving average of my weight.  I then subtract this number from a pre-determined reference weight and multiply the result by 100.  This is the number of additional calories I consume that day in the form of reasonably healthy foods.  The idea is to eat close to the minimum to maintain weight, thus staying thin and towards the bottom of my metabolic range.
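A minimal sketch of this daily rule, assuming weights are in pounds and assuming that when the moving average is above the reference weight you simply eat the basic diet with no extra food (the post doesn't say what happens in that case). The function name, its parameters and the example weigh-ins and reference weight are all hypothetical.

```python
def extra_calories(weights_lb, reference_lb, window=7, kcal_per_lb=100):
    """Extra calories to eat today under the moving-average rule.

    weights_lb: daily morning weigh-ins, oldest first, most recent last.
    Uses up to the last `window` entries for the moving average.
    """
    recent = weights_lb[-window:]
    moving_average = sum(recent) / len(recent)
    extra = (reference_lb - moving_average) * kcal_per_lb
    # Assumption: a negative result means "no extra food", not an extra deficit.
    return max(0.0, extra)

week = [151, 150, 149, 150, 151, 149, 150]     # hypothetical weigh-ins (average 150)
print(extra_calories(week, reference_lb=152))  # 200.0
```

Because the average smooths over a week, a single heavy day after a big meal only nudges the allowance down, which matches the "spikes upward, then drifts back down" behaviour described next.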

Now and then my weight spikes upward when I have an event which involves a lot of eating; after that it drifts back down again.  I've been calorie-restricted in this way for some time now.  I feel perfectly fine but after every meal I feel like I could easily eat more.  I pretty much never get heartburn anymore.  I usually wake up quite hungry.  These are about the only effects I have noticed.

IV.  Self-Criticisms of My Approach

In the interest of rationality, it probably makes sense to offer some self-criticism:

1. I found the above scientific references only after I had settled on my approach to calorie restriction.   So there is probably a certain backwardness about my reasoning.  My conclusion is based more on my own intuition, reasoning, observations and common sense than on scientific research.

2.  It never occurred to me to regularly measure my body temperature before and after starting this program.  Which is unfortunate because it may have given me some useful information about the effects of my diet on my metabolism.

3.  There's really no way to measure if any of this is having an effect on my rate of aging.  Without this sort of feedback, I'm pretty much shooting in the dark.

V.  Conclusion

So that's about the extent of my self-experiment.  It's a bit frightening that I'm putting my own health on the line in the face of so much uncertainty.  At the same time, it seems like a reasonable, conservative approach which is unlikely to lead to regrets.  Of course there is an excellent chance I will never know how much of an impact my lifestyle had on my overall health.

Anyway, I welcome any intelligent thoughts, suggestions, constructive criticism, etc.

## In favour of terseness

7 08 March 2014 06:01PM

I like posts that are concise and to the point. Posts like that maximize my information/effort ratio. I would really like to see experienced rationalists simply post a list of things they believe on any given subject with a short explanation for why they believe each of those things. Then I could go ahead and adjust my beliefs based on those lists as necessary.

Sadly I don’t see any posts like this. Presumably this is because of the social convention where you’re expected to back up any public belief with arguments, so that other people can attempt to poke holes in them. I find this strange because the arguments people present rarely have anything to do with why they believe those things, which makes the whole exercise a giant distraction from the main point that the author is trying to bring across. In order to prevent this kind of derailment, posters tend to cover their arguments with endless qualifications so that their sentences read like this: “I personally believe that, in cases X Y Z and under circumstances B and C, ceteris paribus and barring obvious exceptions, it seems safe to say that murder is wrong, though of course I could be mistaken.” The problems with such excessive argumentation and qualification are threefold:

1. The post becomes less readable: The information/effort ratio is lowered.
2. It becomes much more difficult to tell what the author genuinely believes: Are they really unsure or just trying to appear humble? Is that their true objection, or just an argument?
3. Despite everything, someone is STILL going to miss the point and reply that sometimes killing people is ok in certain situations, and then the next 100 comments will be about that.

By contrast, terseness makes posts more readable and makes it less likely that the main point is misunderstood. So if we as a community could just relax the demand for argumentation and qualification somewhat, and we all focussed on debating the main points of posts instead of getting sidetracked, then perhaps experienced rationalists here could write nice and concise posts that give short and clear answers to complicated questions. Instead, some of the sequences are so long and involve so many arguments, counter-arguments and disclaimers that I feel the point is lost entirely.

## The sin of updating when you can change whether you exist

7 28 February 2014 01:25AM

Trigger warning: In a thought experiment in this post, I used a hypothetical torture scenario without thinking, even though it wasn't necessary to make my point. Apologies, and thanks to an anonymous user for pointing this out. I'll try to be more careful in the future.

Should you pay up in the counterfactual mugging?

I've always found the argument about self-modifying agents compelling: If you expected to face a counterfactual mugging tomorrow, you would want to choose to rewrite yourself today so that you'd pay up. Thus, a decision theory that didn't pay up wouldn't be reflectively consistent; an AI using such a theory would decide to rewrite itself to use a different theory.

But is this the only reason to pay up? This might make a difference: Imagine that Omega tells you that it threw its coin a million years ago, and would have turned the sky green if it had landed the other way. Back in 2010, I wrote a post arguing that in this sort of situation, since you've always seen the sky being blue, and every other human being has also always seen the sky being blue, everyone has always had enough information to conclude that there's no benefit from paying up in this particular counterfactual mugging, and so there hasn't ever been any incentive to self-modify into an agent that would pay up ... and so you shouldn't.

I've since changed my mind, and I've recently talked about part of the reason for this, when I introduced the concept of an l-zombie, or logical philosophical zombie, a mathematically possible conscious experience that isn't physically instantiated and therefore isn't actually consciously experienced. (Obligatory disclaimer: I'm not claiming that the idea that "some mathematically possible experiences are l-zombies" is likely to be true, but I think it's a useful concept for thinking about anthropics, and I don't think we should rule out l-zombies given our present state of knowledge. More in the l-zombies post and in this post about measureless Tegmark IV.) Suppose that Omega's coin had come up the other way, and Omega had turned the sky green. Then you and I would be l-zombies. But if Omega was able to make a confident guess about the decision we'd make if confronted with the counterfactual mugging (without simulating us, so that we continue to be l-zombies), then our decisions would still influence what happens in the actual physical world. Thus, if l-zombies say "I have conscious experiences, therefore I physically exist", and update on this fact, and if the decisions they make based on this influence what happens in the real world, a lot of utility may potentially be lost. Of course, you and I aren't l-zombies, but the mathematically possible versions of us who have grown up under a green sky are, and they reason the same way as you and me—it's not possible to have only the actual conscious observers reason that way. Thus, you should pay up even in the blue-sky mugging.

But that's only part of the reason I changed my mind. The other part is that while in the counterfactual mugging, the answer you get if you try to use Bayesian updating at least looks kinda sensible, there are other thought experiments in which doing so in the straightforward way makes you obviously bat-shit crazy. That's what I'd like to talk about today.

*

The kind of situation I have in mind involves being able to influence whether you exist, or more precisely, influence whether the version of you making the decision exists as a conscious observer (or whether it's an l-zombie).

Suppose that you wake up and Omega explains to you that it's kidnapped you and some of your friends back in 2014, and put you into suspension; it's now the year 2100. It then hands you a little box with a red button, and tells you that if you press that button, Omega will slowly torture you and your friends to death; otherwise, you'll be able to live out a more or less normal and happy life (or to commit painless suicide, if you prefer). Furthermore, it explains that one of two things has happened: Either (1) humanity has undergone a positive intelligence explosion, and Omega has predicted that you will press the button; or (2) humanity has wiped itself out, and Omega has predicted that you will not press the button. In any other scenario, Omega would still have woken you up at the same time, but wouldn't have given you the button. Finally, if humanity has wiped itself out, it won't let you try to "reboot" it; in this case, you and your friends will be the last humans.

There's a correct answer to what to do in this situation, and it isn't to decide that Omega's just given you anthropic superpowers to save the world. But that's what you get if you try to update in the most naive way: If you press the button, then (2) becomes extremely unlikely, since Omega is really really good at predicting. Thus, the true world is almost certainly (1); you'll get tortured, but humanity survives. For great utility! On the other hand, if you decide to not press the button, then by the same reasoning, the true world is almost certainly (2), and humanity has wiped itself out. Surely you're not selfish enough to prefer that?

The correct answer, clearly, is that your decision whether to press the button doesn't influence whether humanity survives, it only influences whether you get tortured to death. (Plus, of course, whether Omega hands you the button in the first place!) You don't want to get tortured, so you don't press the button. Updateless reasoning gets this right.

*

Let me spell out the rules of the naive Bayesian decision theory ("NBDT") I used there, in analogy with Simple Updateless Decision Theory (SUDT). First, let's set up our problem in the SUDT framework. To simplify things, we'll pretend that FOOM and DOOM are the only possible things that can happen to humanity. In addition, we'll assume that there's a small probability $\textstyle \varepsilon$ that Omega makes a mistake when it tries to predict what you will do if given the button. Thus, the relevant possible worlds are $\textstyle \Omega = \{\mathrm{foom}, \mathrm{doom}\} \times \{\mathrm{correct},\mathrm{incorrect}\}$. The precise probabilities you assign to these don't matter very much; I'll pretend that FOOM and DOOM are equiprobable, $\textstyle \mathbb{P}(x,\mathrm{incorrect}) = \varepsilon/2$ and $\textstyle \mathbb{P}(x,\mathrm{correct}) = (1-\varepsilon)/2$.

There's only one situation in which you need to make a decision, $\textstyle \mathcal{I} = \{*\}$; I won't try to define NBDT when there is more than one situation. Your possible actions in this situation are to press or to not press the button, $\textstyle \mathcal{A}(*) = \{P,\neg P\}$, so the only possible policies are $\textstyle \pi_P$, which presses the button ($\textstyle \pi_P(*) = P$), and $\textstyle \pi_{\neg P}$, which doesn't ($\textstyle \pi_{\neg P}(*) = \neg P$); $\textstyle \Pi = \{\pi_P,\pi_{\neg P}\}$.

There are four possible outcomes, specifying (a) whether humanity survives and (b) whether you get tortured: $\textstyle \mathcal{O} = \{\mathrm{foom}, \mathrm{doom}\} \times \{\mathrm{torture},\neg\mathrm{torture}\}$. Omega only hands you the button if FOOM and it predicts you'll press it, or DOOM and it predicts you won't. Thus, the only cases in which you'll get tortured are $\textstyle o((\mathrm{foom},\mathrm{correct}),\pi_P) = (\mathrm{foom},\mathrm{torture})$ and $\textstyle o((\mathrm{doom},\mathrm{incorrect}),\pi_P) = (\mathrm{doom},\mathrm{torture})$. For any other $\textstyle x\in\{\mathrm{foom},\mathrm{doom}\}$, $\textstyle y\in\{\mathrm{correct},\mathrm{incorrect}\}$, and $\textstyle \pi\in\Pi$, we have $\textstyle o((x,y),\pi) = (x,\neg\mathrm{torture})$.

Finally, let's define our utility function by $u((\mathrm{foom},\neg\mathrm{torture})) = L$, $u((\mathrm{foom},\mathrm{torture})) = L-1$, $u((\mathrm{doom},\neg\mathrm{torture})) = -L$, and $u((\mathrm{doom},\mathrm{torture})) = -L-1$, where $\textstyle L$ is a very large number.

This suffices to set up an SUDT decision problem. There are only two possible worlds $\textstyle \omega\in\Omega$ where $\textstyle u(o(\omega,\pi_P))$ differs from $\textstyle u(o(\omega,\pi_{\neg P}))$, namely $\textstyle (\mathrm{foom},\mathrm{correct})$ and $\textstyle (\mathrm{doom},\mathrm{incorrect})$, where $\textstyle \pi_P$ results in torture and $\textstyle \pi_{\neg P}$ doesn't. In each of these cases, the utility of $\textstyle \pi_P$ is lower (by one) than that of $\textstyle \pi_{\neg P}$. Hence, $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi_P))] < \mathbb{E}[u(o(\boldsymbol{\omega},\pi_{\neg P}))]$, implying that SUDT says you should choose $\textstyle \pi_{\neg P}$.
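The SUDT calculation above is small enough to check by brute force. This is a sketch of my own, not code from the SUDT post; the concrete values of $\varepsilon$ and $L$ are hypothetical (the text only says $\varepsilon$ is small and $L$ very large), and the string labels for worlds and policies are illustrative.

```python
from itertools import product

# Hypothetical parameters: the text only requires EPS small and L very large.
EPS = 0.01
L = 1_000_000

WORLDS = list(product(["foom", "doom"], ["correct", "incorrect"]))
POLICIES = ["P", "notP"]  # press / don't press

def prior(world):
    """FOOM and DOOM equiprobable; Omega errs with probability EPS."""
    _, accuracy = world
    return (1 - EPS) / 2 if accuracy == "correct" else EPS / 2

def outcome(world, policy):
    """o(world, policy) = (fate of humanity, whether you are tortured)."""
    fate, _ = world
    tortured = policy == "P" and world in {("foom", "correct"), ("doom", "incorrect")}
    return fate, tortured

def utility(o):
    fate, tortured = o
    return (L if fate == "foom" else -L) - (1 if tortured else 0)

def sudt_value(policy):
    """Prior expected utility of a policy (no updating)."""
    return sum(prior(w) * utility(outcome(w, policy)) for w in WORLDS)

print(max(POLICIES, key=sudt_value))                   # notP
print(round(sudt_value("notP") - sudt_value("P"), 6))  # 0.5
```

The gap of exactly $1/2$ is the total prior weight of the two worlds where $\pi_P$ leads to torture, $(1-\varepsilon)/2 + \varepsilon/2$, independent of $L$.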

*

For NBDT, we need to know how to update, so we need one more ingredient: a function specifying in which worlds you exist as a conscious observer. In anticipation of future discussions, I'll write this as a function $\textstyle \mu(i;\omega,\pi)$, which gives the "measure" ("amount of magical reality fluid") of the conscious observation $\textstyle i\in\mathcal{I}$ if policy $\textstyle \pi\in\Pi$ is executed in the possible world $\textstyle \omega\in\Omega$. In our case, $\textstyle i = *$ and $\textstyle \mu(*;\omega,\pi)\in\{0,1\}$, indicating non-existence and existence, respectively. We can interpret $\textstyle \mu(i;\omega,\pi)$ as the conditional probability of making observation $\textstyle i$, given that the true world is $\textstyle \omega$, if plan $\textstyle \pi$ is executed. In our case, $\textstyle \mu(*;(\mathrm{foom},\mathrm{correct}),\pi_P) =$ $\textstyle \mu(*;(\mathrm{foom},\mathrm{incorrect}),\pi_{\neg P}) =$ $\textstyle \mu(*;(\mathrm{doom},\mathrm{correct}),\pi_{\neg P}) =$ $\textstyle \mu(*;(\mathrm{doom},\mathrm{incorrect}),\pi_P) = 1$, and $\textstyle \mu(*;\omega,\pi) = 0$ in all other cases.

Now, we can use Bayes' theorem to calculate the posterior probability of a possible world, given information $\textstyle i = *$ and policy $\textstyle \pi$: $\textstyle \mathbb{P}(\omega\mid i;\pi) = \mathbb{P}(\omega)\cdot\mu(i;\omega,\pi) / \sum_{\omega'\in\Omega} \mathbb{P}(\omega')\cdot\mu(i;\omega',\pi)$. NBDT tells us to choose the policy $\textstyle \pi$ that maximizes the posterior expected utility, $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi))\mid i;\pi]$.

In our case, we have $\textstyle \mathbb{P}((\mathrm{foom},\mathrm{correct}) \mid *;\pi_P) = \mathbb{P}((\mathrm{doom},\mathrm{correct}) \mid *;\pi_{\neg P}) = 1-\varepsilon$ and $\textstyle \mathbb{P}((\mathrm{doom},\mathrm{incorrect}) \mid *;\pi_P) = \mathbb{P}((\mathrm{foom},\mathrm{incorrect}) \mid *;\pi_{\neg P}) = \varepsilon$. Thus, if we press the button, our expected utility is dominated by the near-certainty of humanity surviving, whereas if we don't, it's dominated by humanity's near-certain doom, and NBDT says we should press.
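Here is the same toy problem run through NBDT, again as a self-contained sketch with hypothetical values for $\varepsilon$ and $L$. The measure $\mu$ is encoded as "were you handed the button?", which reproduces the four cases listed above; torture then happens exactly when you are handed the button and press it.

```python
from itertools import product

EPS = 0.01      # hypothetical
L = 1_000_000   # hypothetical

WORLDS = list(product(["foom", "doom"], ["correct", "incorrect"]))
POLICIES = ["P", "notP"]

def prior(world):
    return (1 - EPS) / 2 if world[1] == "correct" else EPS / 2

def mu(world, policy):
    """1 if you are handed the button (exist as observer '*'), else 0."""
    fate, accuracy = world
    predicted_press = (policy == "P") == (accuracy == "correct")
    handed_button = predicted_press if fate == "foom" else not predicted_press
    return 1 if handed_button else 0

def utility(world, policy):
    """u(o(world, policy)): tortured only if handed the button and you press."""
    tortured = policy == "P" and mu(world, policy) == 1
    return (L if world[0] == "foom" else -L) - (1 if tortured else 0)

def posterior(world, policy):
    """Bayes' theorem with mu as the likelihood of observation '*'."""
    z = sum(prior(w) * mu(w, policy) for w in WORLDS)
    return prior(world) * mu(world, policy) / z

def nbdt_value(policy):
    return sum(posterior(w, policy) * utility(w, policy) for w in WORLDS)

print(max(POLICIES, key=nbdt_value))  # P
```

Working it out by hand, pressing gives $(1-\varepsilon)(L-1) - \varepsilon(L+1) = L(1-2\varepsilon) - 1$, while not pressing gives $-L(1-2\varepsilon)$, so NBDT presses whenever $2L(1-2\varepsilon) > 1$.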

*

But maybe it's not updating that's bad, but NBDT's way of implementing it? After all, we get the clearly wacky results only if our decisions can influence whether we exist, and perhaps the way that NBDT extends the usual formula to this case happens to be the wrong way to extend it.

One thing we could try is to mark a possible world $\textstyle \omega$ as impossible only if $\textstyle \mu(*;\omega,\pi) = 0$ for all policies $\textstyle \pi$ (rather than: for the particular policy $\textstyle \pi$ whose expected utility we are computing). But this seems very ad hoc to me. (For example, this could depend on which set of possible actions $\textstyle \mathcal{A}(*)$ we consider, which seems odd.)

There is a much more principled possibility, which I'll call pseudo-Bayesian decision theory, or PBDT. PBDT can be seen as re-interpreting updating as saying that you're indifferent about what happens in possible worlds in which you don't exist as a conscious observer, rather than ruling out those worlds as impossible given your evidence. (A version of this idea was recently brought up in a comment by drnickbone, though I'd thought of this idea myself during my journey towards my current position on updating, and I imagine it has also appeared elsewhere, though I don't remember any specific instances.) I have more than one objection to PBDT, but the simplest one to argue is that it doesn't solve the problem: it still believes that it has anthropic superpowers in the problem above.

Formally, PBDT says that we should choose the policy $\textstyle \pi$ that maximizes $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi))\cdot\mu(*;\boldsymbol{\omega},\pi)]$ (where the expectation is with respect to the prior, not the updated, probabilities). In other words, we set the utility of any outcome in which we don't exist as a conscious observer to zero; we can see PBDT as SUDT with modified outcome and utility functions.

When our existence is independent of our decisions (that is, if $\textstyle \mu(*;\omega,\pi)$ doesn't depend on $\textstyle \pi$), it turns out that PBDT and NBDT are equivalent, i.e., PBDT implements Bayesian updating. That's because in that case, $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi))\mid *;\pi] =$ $\textstyle \sum_{\omega\in\Omega} u(o(\omega,\pi))\cdot\mathbb{P}(\omega\mid *;\pi)$ $\textstyle = \sum_{\omega\in\Omega} u(o(\omega,\pi))\cdot\mathbb{P}(\omega)\cdot \mu(*;\omega,\pi) / \sum_{\omega'\in\Omega} \mathbb{P}(\omega')\cdot\mu(*;\omega',\pi)$. If $\textstyle \mu(*;\omega,\pi)$ doesn't depend on $\textstyle \pi$, then the whole denominator doesn't depend on $\textstyle \pi$ either, so (since it is positive) the fraction is maximized if and only if the numerator is. But the numerator is $\textstyle \sum_{\omega\in\Omega} u(o(\omega,\pi))\cdot\mathbb{P}(\omega)\cdot \mu(*;\omega,\pi) =$ $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi))\cdot\mu(*;\boldsymbol{\omega},\pi)]$, exactly the quantity that PBDT says should be maximized.
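A quick numerical sanity check of this argument, as a sketch under assumptions of my own: randomly generated priors, utilities $u(o(\omega,\pi))$ for a few policies, and a 0/1 measure $\mu$ that depends only on the world. In every trial, NBDT and PBDT should pick the same policy, because they differ only by the positive, policy-independent denominator.

```python
import random

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def check_equivalence(trials=1000, n_worlds=6, n_policies=3, seed=0):
    """True if NBDT and PBDT choose the same policy in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        prior = [rng.random() for _ in range(n_worlds)]
        total = sum(prior)
        prior = [p / total for p in prior]
        # mu depends on the world only, not on the policy; ensure it is not all zero.
        mu = [rng.choice([0, 1]) for _ in range(n_worlds)]
        mu[rng.randrange(n_worlds)] = 1
        # u[k][w] plays the role of u(o(omega_w, pi_k)).
        u = [[rng.uniform(-10, 10) for _ in range(n_worlds)] for _ in range(n_policies)]
        z = sum(p * m for p, m in zip(prior, mu))
        nbdt = [sum(p * m / z * uw for p, m, uw in zip(prior, mu, u[k])) for k in range(n_policies)]
        pbdt = [sum(p * m * uw for p, m, uw in zip(prior, mu, u[k])) for k in range(n_policies)]
        if argmax(nbdt) != argmax(pbdt):
            return False
    return True

print(check_equivalence())
```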

Unfortunately, although in our problem above $\mu(*;\omega,\pi)$ does depend on $\pi$, the denominator as a whole still doesn't: for both $\pi_P$ and $\pi_{\neg P}$, there is exactly one possible world with probability $(1-\varepsilon)/2$ and one possible world with probability $\varepsilon/2$ in which $*$ is a conscious observer, so we have $\textstyle\sum_{\omega'\in\Omega} \mathbb{P}(\omega')\cdot\mu(*;\omega',\pi) = (1-\varepsilon)/2 + \varepsilon/2 = 1/2$ for both $\pi\in\Pi$. Thus, PBDT gives the same answer as NBDT, by the same mathematical argument as in the case where we can't influence our own existence. If you think of PBDT as SUDT with the utility function $u(o(\omega,\pi))\cdot\mu(*;\omega,\pi)$, then intuitively, PBDT reasons: "Sure, I can't influence whether humanity is wiped out; but I can influence whether I'm an l-zombie or a conscious observer; and who cares what happens to humanity if I'm not? Best to press the button, since getting tortured in a world where there's been a positive intelligence explosion is much better than life without torture if humanity has been wiped out."

I think that's a pretty compelling argument against PBDT, but even leaving it aside, I don't like PBDT at all. I see two possible justifications for PBDT: you can either say that $u(o(\omega,\pi))\cdot\mu(*;\omega,\pi)$ is your real utility function (you really don't care about what happens in worlds where the version of you making the decision doesn't exist as a conscious observer), or you can say that your real preferences are expressed by $u(o(\omega,\pi))$, and multiplying by $\mu(*;\omega,\pi)$ is just a mathematical trick to express a steelmanned version of Bayesian updating. If your preferences really are given by $u(o(\omega,\pi))\cdot\mu(*;\omega,\pi)$, then fine: you should be maximizing $\textstyle \mathbb{E}[u(o(\boldsymbol{\omega},\pi))\cdot\mu(*;\boldsymbol{\omega},\pi)]$ (because you should be using (S)UDT), and you should press the button. A super-selfish agent that doesn't care a fig even about a version of itself that was exactly the same until five seconds ago (but then wasn't handed the button) could indeed have such preferences. But I think these are wacky preferences, and you don't actually have them. (Furthermore, if you did have them, then $u(o(\omega,\pi))\cdot\mu(*;\omega,\pi)$ would be your actual utility function, and you should be writing it as just $u(o(\omega,\pi))$, where $o(\omega,\pi)$ must now give information about whether $*$ is a conscious observer.)

If multiplying by $\mu(*;\omega,\pi)$ is just a trick to implement updating, on the other hand, then I find it strange that it introduces a new concept that doesn't occur at all in classical Bayesian updating, namely the utility of a world in which $*$ is an l-zombie. We've set this to zero, which in classical terms is no loss of generality: classical utility functions don't change their meaning if you add or subtract a constant, so whenever all worlds in which $*$ is an l-zombie have the same utility $u_0$, you can subtract $u_0$ from all utilities (without changing the meaning of the utility function) and get a function where that utility is zero. But the utility functions I've been plugging into PBDT above do change their meaning if you add a constant to them. You can set up a problem where the agent has to decide whether to bring itself into existence or not (Omega creates it iff it predicts that the agent will press a particular button), and in that case the agent will decide to do so iff the world in which it exists has utility greater than zero, which is clearly not invariant under adding and subtracting a constant. I can't find any concept like the utility of not existing in my intuitions about Bayesian updating (though I can find such a concept in my intuitions about utility; regarding that, see the previous paragraph), so if PBDT is just a mathematical trick to implement these intuitions, where does that utility come from?
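The existence problem can be made concrete with an equally small sketch. Assume, hypothetically, one deterministic world per policy: if the agent presses the button, Omega creates it and the resulting world has utility `u_created`; if not, the agent remains an l-zombie in a world with utility `u_not_created`, which PBDT multiplies by $\mu = 0$. The function names and the numbers are illustrative only. Shifting every utility by the same constant leaves the plain SUDT comparison unchanged but can flip PBDT's decision.

```python
# Hypothetical sketch: Omega creates the agent iff it predicts that the agent
# presses the button. One deterministic world per policy, so each expectation
# is just a single utility value.

def pbdt_presses(u_created: int, u_not_created: int) -> bool:
    """PBDT weights each outcome by mu(*; omega, pi): 1 if * exists, else 0."""
    value_press = u_created * 1       # pressing: the agent is created
    value_refuse = u_not_created * 0  # refusing: the agent stays an l-zombie
    return value_press > value_refuse

def sudt_presses(u_created: int, u_not_created: int) -> bool:
    """Plain (S)UDT compares the unweighted utilities of the two worlds."""
    return u_created > u_not_created

# Illustrative utilities, then the same utilities shifted by a constant.
for shift in (0, 10):
    created, not_created = -3 + shift, -5 + shift
    print(f"shift={shift}: PBDT presses={pbdt_presses(created, not_created)}, "
          f"SUDT presses={sudt_presses(created, not_created)}")
# shift=0: PBDT presses=False, SUDT presses=True
# shift=10: PBDT presses=True, SUDT presses=True
```

With these made-up numbers, PBDT declines to exist when the created world has utility $-3$ but accepts once everything is shifted up by $10$, even though the shift does not change how the two worlds rank against each other.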

I'm not aware of a way of implementing updating in general SUDT-style problems that does better than NBDT, PBDT, and the ad-hoc idea mentioned above, so for now I've concluded that in general, trying to update is just hopeless, and we should be using (S)UDT instead. In classical decision problems, where there are no acausal influences, (S)UDT will of course behave exactly as if it did do a Bayesian update; thus, in a sense, using (S)UDT can also be seen as a reinterpretation of Bayesian updating (in this case just as updateless utility maximization in a world where all influence is causal), and that's the way I think about it nowadays.

## LessWrong Hamburg Second Meetup Notes: In need of Structure

7 22 February 2014 04:20PM

Review of our second Meetup : LessWrong Hamburg - about Procrastination

When I arrived (late), there was already a discussion about the benefits and content of LessWrong. I didn't take a clear organizer's role and instead let the meetup mostly run itself. This worked OK, but it also led to the planned topic, procrastination, falling off the table until very late, when we discovered that there was actually more interest in it than everybody seemed to have assumed.

There were phases where the discussion was dominated by everyday topics and was not focussed. This was partly because some participants knew each other well and bounced topics back and forth, which kept the others out. I could have moderated this but wasn't clearly aware of it until someone raised it explicitly (and in a friendly way).

Despite the unstructured format we got the following positive results:

• I had brought Emotions Revealed by Ekman (and other books) and left it lying on the table. Leafing through it led us to the faces test; we took it and discovered which emotions we could read or differentiate best and worst. See also Emotion in the LW Wiki.
• A diabetes glucose test was demonstrated (curiosity overcame fear of being needled), and one participant had an unexpectedly elevated reading.
• When discussing how to stop smoking, we turned up one option that convinced one participant: when he relocates next month, he will try to find a non-smoking flat share.
• We noticed that our communication cultures differed and talked about guess and ask cultures and how we could improve (I saw that this was a topic at the recent Berkeley Meetup).
• We noticed that we had difficulty finding structure and made an explicit agenda for the next meetup.

We planned the next meetup and chose a moderator to keep us more focussed.

Lessons:

• Suitable books lying about can direct discussion to productive topics.
• Lively off-topic discussions eat time and can keep participants out, but they also provide a casual atmosphere.
• A lack of meetup structure can cause dissatisfaction with the content and uncertainty about direction.

(None of these are surprising.)

We also played a game (Set), had some fun, took photos and planned the next meetup.

## Native Russian speakers wanted, for help with translation of LW texts

7 11 February 2014 05:37PM

If you are a native Russian speaker and are willing to help MIRI bring its information to a Russian-speaking audience by helping translate key materials (e.g. the forthcoming ebook Smarter Than Us), please contact me via PM or email:

Thanks!

## Brainstorming: children's stories

7 11 February 2014 01:23PM

So I have a three-year-old kid, and I usually read or tell him a bedtime story.

That is a nice opportunity to introduce new concepts, but my capacity for improvisation is limited, especially towards the end of the day. So I'm asking the good people on LessWrong for ideas. How would you wrap various lesswrongish ideas in a short story a little kid would pay attention to?

I'm mostly interested in the aspects of "practical rationality" that aren't going to be taught at school or in children's books or children's TV shows, so things like sunk costs, taking the outside view, wondering which side is true instead of arguing for a side, etc.

Pointers to outside sources of such stories are welcome too!

Edit: actually, if you want to share ideas of games or activities of the same kind, go ahead! :)
