Metaphilosophical Mysteries

21 Wei_Dai 27 July 2010 12:55AM

Creating Friendly AI seems to require us humans to either solve most of the outstanding problems in philosophy, or to solve meta-philosophy (i.e., what is the nature of philosophy, how do we practice it, and how should we program an AI to do it?), and to do that in an amount of time measured in decades. I'm not optimistic about our chances of success, but out of these two approaches, the latter seems slightly easier, or at least less effort has already been spent on it. This post tries to take a small step in that direction, by asking a few questions that I think are worth investigating or keeping in the back of our minds, and generally raising awareness and interest in the topic.

Hacking the CEV for Fun and Profit

30 Wei_Dai 03 June 2010 08:30PM

It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems. He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.

There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…

Ah ha, compression! A trillion identical copies of an object would compress down to be only a little bit larger than one copy. But would CEV count compressed identical copies as separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manage to fit inside the available space.

Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and the rest of humanity are in for a rather rude surprise!
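
The compression step in the story can be tried at toy scale. The following is only a rough sketch under made-up assumptions: the "upload" is 4096 random bytes, the "unique experience" is an 8-byte counter appended to each copy, there are 1,000 copies rather than a trillion, and the names `build_archive` and `base_upload` are hypothetical. It uses Python's standard `lzma` module, which is just one convenient compressor and not anything the story specifies.

```python
import lzma
import random


def build_archive(base: bytes, copies: int) -> bytes:
    """Concatenate near-identical copies, each tagged with a unique 8-byte 'experience'."""
    return b"".join(base + i.to_bytes(8, "big") for i in range(copies))


# Hypothetical stand-in for one upload: random bytes, so a single copy is
# essentially incompressible on its own.
rng = random.Random(0)
base_upload = bytes(rng.getrandbits(8) for _ in range(4096))

raw = build_archive(base_upload, copies=1000)
packed = lzma.compress(raw)

print("one copy:       ", len(base_upload), "bytes")
print("raw archive:    ", len(raw), "bytes")
print("compressed size:", len(packed), "bytes")
```

In a run of this sketch the compressed archive should come out far smaller than the raw one, because every copy after the first is almost entirely a repeat of data the compressor has already seen. How close it gets to the size of a single copy depends on the compressor and on how much each copy really differs.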

Frequentist Magic vs. Bayesian Magic

26 Wei_Dai 08 April 2010 08:34PM

[I posted this to open thread a few days ago for review. I've only made some minor editorial changes since then, so no need to read it again if you've already read the draft.]

This is a belated reply to cousin_it's 2009 post Bayesian Flame, which claimed that frequentists can give calibrated estimates for unknown parameters without using priors:

And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets and have 90 of the estimates turn out to be true to fact afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever.

And indeed they can. Here's the simplest example that I can think of that illustrates the spirit of frequentism:

Suppose there is a machine that produces biased coins. You don't know how the machine works, except that each coin it produces is either biased towards heads (in which case each toss of the coin will land heads with probability .9 and tails with probability .1) or towards tails (in which case each toss of the coin will land tails with probability .9 and heads with probability .1). For each coin, you get to observe one toss, and then have to state whether you think it's biased towards heads or tails, and what is the probability that's the right answer.

Let's say that you decide to follow this rule: after observing heads, always answer "the coin is biased towards heads with probability .9" and after observing tails, always answer "the coin is biased towards tails with probability .9". Do this for a while, and it will turn out that 90% of the time you are right about which way the coin is biased, no matter how the machine actually works. The machine might always produce coins biased towards heads, or always towards tails, or decide based on the digits of pi, and it wouldn't matter—you'll still be right 90% of the time. (To verify this, notice that in the long run you will answer "heads" for 90% of the coins actually biased towards heads, and "tails" for 90% of the coins actually biased towards tails.) No priors needed! Magic!
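
The 90% claim is easy to check by simulation. The sketch below is illustrative only: the four "machines" are hypothetical examples chosen for the test, and `rule_accuracy` is a made-up helper name. Each machine is just a list saying which way each coin is really biased; the rule observes one toss per coin and guesses that the coin is biased towards whatever side came up.

```python
import random


def rule_accuracy(true_biases, rng):
    """Fraction of coins whose bias the observe-one-toss rule names correctly."""
    correct = 0
    for bias in true_biases:
        # A coin biased towards heads lands heads with probability .9, and vice versa.
        p_heads = 0.9 if bias == "H" else 0.1
        toss = "H" if rng.random() < p_heads else "T"
        # The rule: whatever side you saw, claim the coin is biased that way.
        correct += (toss == bias)
    return correct / len(true_biases)


N = 100_000
rng = random.Random(0)

# Hypothetical machines; the rule never looks at how the sequence was generated.
machines = {
    "always heads": ["H"] * N,
    "always tails": ["T"] * N,
    "alternating": ["H" if i % 2 == 0 else "T" for i in range(N)],
    "30% heads at random": ["H" if rng.random() < 0.3 else "T" for _ in range(N)],
}

results = {name: rule_accuracy(biases, rng) for name, biases in machines.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Every machine should give an accuracy close to .9, since for each individual coin the single observed toss matches the coin's true bias with probability .9 regardless of how that bias was chosen.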

Late Great Filter Is Not Bad News

9 Wei_Dai 04 April 2010 04:17AM

But I hope that our Mars probes will discover nothing. It would be good news if we find Mars to be completely sterile. Dead rocks and lifeless sands would lift my spirit.

Conversely, if we discovered traces of some simple extinct life form—some bacteria, some algae—it would be bad news. If we found fossils of something more advanced, perhaps something looking like the remnants of a trilobite or even the skeleton of a small mammal, it would be very bad news. The more complex the life we found, the more depressing the news of its existence would be. Scientifically interesting, certainly, but a bad omen for the future of the human race.

— Nick Bostrom, in Where Are They? Why I hope that the search for extraterrestrial life finds nothing

This post is a reply to Robin Hanson's recent OB post Very Bad News, as well as Nick Bostrom's 2008 paper quoted above, and assumes familiarity with Robin's Great Filter idea. (Robin's server for the Great Filter paper seems to be experiencing some kind of error. See here for a mirror.)

Suppose Omega appears and says to you:

(Scenario 1) I'm going to apply a great filter to humanity. You get to choose whether the filter is applied one minute from now, or in five years. When the designated time arrives, I'll throw a fair coin, and wipe out humanity if it lands heads. And oh, it's not the current you that gets to decide, but the version of you 4 years and 364 days from now. I'll predict his or her decision and act accordingly.

I hope it's not controversial that the current you should prefer a late filter, since (with probability .5) that gives you and everyone else five more years of life. What about the future version of you? Well, if he or she decides on the early filter, that would constitute a time inconsistency. And for those who believe in multiverse/many-worlds theories, choosing the early filter shortens the lives of everyone in half of all universes/branches where a copy of you is making this decision, which doesn't seem like a good thing. It seems clear that, ignoring human deviations from ideal rationality, the right decision of the future you is to choose the late filter.

Think Before You Speak (And Signal It)

21 Wei_Dai 19 March 2010 10:21PM

In deciding whether to pay attention to an idea, a big clue, if it were readily available, would be how many people have checked it over for correctness, and for how long. Most new ideas that human beings come up with are wrong, and if someone just thought of something five seconds ago and excitedly wants to tell you about it, probably the only benefit of listening is not offending the person.

But it seems quite rare for this important piece of metadata to be straightforwardly declared, perhaps because such declarations can't be trusted in general. Instead, we usually have to infer it from various other clues, like the speaker's personality (how long do they typically think before they speak?), formality of the language employed to express the idea, the presence of spelling and grammar mistakes, the venue where the idea is presented or published, etc.

Unfortunately, such inferences can be imprecise or error-prone. For example, the same speaker may sometimes think a lot before speaking, and other times think little before speaking. Using costly signals like formal language is also wasteful compared to everyone simply telling the truth (but can still be a second-best solution in low-trust groups). In a community like ours, where most of us are striving to build reputations for being (or at least trying to be) rational and cooperative, and therefore there is a level of trust higher than usual, it might be worth experimenting with a norm of declaring how long we've thought about each new idea when presenting it. This may be either in addition to or as an alternative to other ways of communicating how confident we are about our ideas.

To follow my own advice, I'll say that I've thought about this topic off and on for about two weeks, and then spent about three hours writing and reviewing this post. I first started thinking about it at the SIAI decision theory workshop, which was the first time I ever worked with a large group of people on a complex problem in real time. I noticed that the variance in the amount of time different people spend thinking through new ideas before they speak is quite high. I was surprised to discover, for example, that Gary Drescher has been working on decision theory for many years and has considered and discarded about a dozen possible solutions.

The trigger for actually writing this post is yesterday's Overcoming Bias post Twin Conspiracies, which Robin seemed to have spent much less time thinking through than usual, but which has no overt indications of this. (An obvious objection that he apparently failed to consider is, wouldn't corporations actively recruit twins to be co-CEOs if they are so productive? Several OB commenters also pointed this out.) A blogger may not want to spend days poring over every post, but why not make it easier for the reader to distinguish the serious, carefully thought out ideas from the throwaway ones?

Individual vs. Group Epistemic Rationality

21 Wei_Dai 02 March 2010 09:46PM

It's common practice in this community to differentiate forms of rationality along the axes of epistemic vs. instrumental, and individual vs. group, giving rise to four possible combinations. I think our shared goal, as indicated by the motto "rationalists win", is ultimately to improve group instrumental rationality. Generally, improving each of these forms of rationality also tends to improve the others, but sometimes conflicts arise between them. In this post I point out one such conflict between individual epistemic rationality and group epistemic rationality.

We place a lot of emphasis here on calibrating individual levels of confidence (i.e., subjective probabilities), and on the idea that rational individuals will tend to converge toward agreement about the proper level of confidence in any particular idea as they update upon available evidence. But I argue that from a group perspective, it's sometimes better to have a spread of individual levels of confidence about the individually rational level. Perhaps paradoxically, disagreements among individuals can be good for the group.

Explicit Optimization of Global Strategy (Fixing a Bug in UDT1)

8 Wei_Dai 19 February 2010 01:30AM

When describing UDT1 solutions to various sample problems, I've often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then returning what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole. Instead, I specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.

(I think this "bug" also exists in TDT, but I don't understand it well enough to make a definite claim. Perhaps Eliezer or someone else can tell me if TDT correctly solves the sample problem given here.)

Shut Up and Divide?

46 Wei_Dai 09 February 2010 08:09PM

During a recent discussion with komponisto about why my fellow LWers are so interested in the Amanda Knox case, his answers made me realize that I had been asking the wrong question. After all, feeling interest or even outrage after seeing a possible case of injustice seems quite natural, so perhaps a better question to ask is why am I so uninterested in the case.

Reflecting upon that, it appears that I've been doing something like Eliezer's "Shut Up and Multiply", except in reverse. Both of us noticed the obvious craziness of scope insensitivity and tried to make our emotions work more rationally. But whereas he decided to multiply his concern for individual human beings by the population size, arriving at an enormous concern for humanity as a whole, I did the opposite. I noticed that my concern for humanity is limited, and therefore decided that it's crazy to care much about random individuals that I happen to come across. (Although I probably haven't consciously thought about it in this way until now.)

The weird thing is that both of these emotional self-modification strategies seem to have worked, at least to a great extent. Eliezer has devoted his life to improving the lot of humanity, and I've managed to pass up news and discussions about Amanda Knox without a second thought. It can't be the case that both of these ways to change how our emotions work are the right thing to do, but the apparent symmetry between them seems hard to break.

What ethical principles can we use to decide between "Shut Up and Multiply" and "Shut Up and Divide"? Why should we derive our values from our native emotional responses to seeing individual suffering, and not from the equally human paucity of response at seeing large portions of humanity suffer in aggregate? Or should we just keep our scope insensitivity, like our boredom?

And an interesting meta-question arises here as well: how much of what we think our values are, is actually the result of not thinking things through, and not realizing the implications and symmetries that exist? And if many of our values are just the result of cognitive errors or limitations, have we lived with them long enough that they've become an essential part of us?

Complexity of Value ≠ Complexity of Outcome

24 Wei_Dai 30 January 2010 02:50AM

Complexity of value is the thesis that our preferences, the things we care about, don't compress down to one simple rule, or a few simple rules. To review why it's important (by quoting from the wiki):

  • Caricatures of rationalists often have them moved by artificially simplified values - for example, only caring about personal pleasure. This becomes a template for arguing against rationality: X is valuable, but rationality says to only care about Y, in which case we could not value X, therefore do not be rational.
  • Underestimating the complexity of value leads to underestimating the difficulty of Friendly AI; and there are notable cognitive biases and fallacies which lead people to underestimate this complexity.

I certainly agree with both of these points. But I worry that we (at Less Wrong) might have swung a bit too far in the other direction. No, I don't think that we overestimate the complexity of our values, but rather there's a tendency to assume that complexity of value must lead to complexity of outcome, that is, agents who faithfully inherit the full complexity of human values will necessarily create a future that reflects that complexity. I will argue that it is possible for complex values to lead to simple futures, and explain the relevance of this possibility to the project of Friendly AI.

Value Uncertainty and the Singleton Scenario

5 Wei_Dai 24 January 2010 05:03AM

In January of last year, Nick Bostrom wrote a post on Overcoming Bias about his and Toby Ord’s proposed method of handling moral uncertainty. To abstract away a bit from their specific proposal, the general approach was to convert a problem involving moral uncertainty into a game of negotiation, with each player’s bargaining power determined by one’s confidence in the moral philosophy represented by that player.

Robin Hanson suggested in his comments to Nick’s post that moral uncertainty should be handled the same way we're supposed to handle ordinary uncertainty, by using standard decision theory (i.e., expected utility maximization). Nick’s reply was that many ethical systems don’t fit into the standard decision theory framework, so it’s hard to see how to combine them that way.

In this post, I suggest we look into the seemingly easier problem of value uncertainty, in which we fix a consequentialist ethical system, and just try to deal with uncertainty about values (i.e., utility function). Value uncertainty can be considered a special case of moral uncertainty in which there is no apparent obstacle to applying Robin’s suggestion. I’ll consider a specific example of a decision problem involving value uncertainty, and work out how Nick and Toby’s negotiation approach differs in its treatment of the problem from standard decision theory. Besides showing the difference in the approaches, I think the specific problem is also quite important in its own right.

The problem I want to consider is this: suppose we believe that a singleton scenario is very unlikely, but would have very high utility if it were realized. Should we focus most of our attention and effort on trying to increase its probability and/or improve its outcome? The main issue here is (putting aside uncertainty about what will happen after a singleton scenario is realized) uncertainty about how much we value what is likely to happen.
