L-zombies! (L-zombies?)

21 Benja 07 February 2014 06:30PM

Reply to: Benja's 2010 post Self-modification is the correct justification for updateless decision theory; Wei Dai's Late great filter is not bad news

"P-zombie" is short for "philosophical zombie", but here I'm going to re-interpret it as standing for "physical philosophical zombie", and contrast it to what I call an "l-zombie", for "logical philosophical zombie".

A p-zombie is an ordinary human body with an ordinary human brain that does all the usual things that human brains do, such as the things that cause us to move our mouths and say "I think, therefore I am", but that isn't conscious. (The usual consensus on LW is that p-zombies can't exist, but some philosophers disagree.) The notion of p-zombie accepts that human behavior is produced by physical, computable processes, but imagines that these physical processes don't produce conscious experience without some additional epiphenomenal factor.

An l-zombie is a human being that could have existed, but doesn't: a Turing machine which, if anybody ever ran it, would compute that human's thought processes (and its interactions with a simulated environment); that would, if anybody ever ran it, compute the human saying "I think, therefore I am"; but that never gets run, and therefore isn't conscious. (If it's conscious anyway, it's not an l-zombie by this definition.) The notion of l-zombie accepts that human behavior is produced by computable processes, but supposes that these computational processes don't produce conscious experience without being physically instantiated.

Actually, there probably aren't any l-zombies: The way the evidence is pointing, it seems like we probably live in a spatially infinite universe where every physically possible human brain is instantiated somewhere, although some are instantiated less frequently than others; and if that's not true, there are the "bubble universes" arising from cosmological inflation, the branches of many-worlds quantum mechanics, and Tegmark's "level IV" multiverse of all mathematical structures, all suggesting again that all possible human brains are in fact instantiated. But (a) I don't think that even with all that evidence, we can be overwhelmingly certain that all brains are instantiated; and, more importantly actually, (b) I think that thinking about l-zombies can yield some useful insights into how to think about worlds where all humans exist, but some of them have more measure ("magical reality fluid") than others.

So I ask: Suppose that we do indeed live in a world with l-zombies, where only some of all mathematically possible humans exist physically, and only those that do have conscious experiences. How should someone living in such a world reason about their experiences, and how should they make decisions — keeping in mind that if they were an l-zombie, they would still say "I have conscious experiences, so clearly I can't be an l-zombie"?

If we can't update on our experiences to conclude that someone having these experiences must exist in the physical world, then we must of course conclude that we are almost certainly l-zombies: after all, if the physical universe isn't combinatorially large, the vast majority of mathematically possible conscious human experiences are not instantiated. You might argue that the universe you live in seems to run on relatively simple physical rules, so it should have high prior probability; but we haven't really figured out the exact rules of our universe, and although what we understand seems compatible with the hypothesis that there are simple underlying rules, that's not really proof that there are such underlying rules, since "the real universe has simple rules, but we are l-zombies living in some random simulation with a hodgepodge of rules (that isn't actually run)" may have the same prior probability. Worse, if you don't have everything we know about these rules loaded into your brain right now, you can't really verify that they make sense, since there is some mathematically possible simulation whose initial state has you remembering evidence that such simple rules exist, even if they don't. And worse still, even if there are such simple rules, what evidence do you have that if these rules were actually executed, they would produce you? Only the fact that you, like, exist, but we're asking what happens if we don't let you update on that.

I find myself quite unwilling to accept this conclusion that I shouldn't update, in the world we're talking about. I mean, I actually have conscious experiences. I, like, feel them and stuff! Yes, true, my slightly altered alter ego would reason the same way, and it would be wrong; but I'm right...

...and that actually seems to offer a way out of the conundrum: Suppose that I decide to update on my experience. Then so will my alter ego, the l-zombie. This leads to a lot of l-zombies concluding "I think, therefore I am", and being wrong, and a lot of actual people concluding "I think, therefore I am", and being right. All the thoughts that are actually consciously experienced are, in fact, correct. This doesn't seem like such a terrible outcome. Therefore, I'm willing to provisionally endorse the reasoning "I think, therefore I am", and to endorse updating on the fact that I have conscious experiences to draw inferences about physical reality — taking into account the simulation argument, of course, and conditioning on living in a small universe, which is all I'm discussing in this post.

NB. There's still something quite uncomfortable about the idea that all of my behavior, including the fact that I say "I think therefore I am", is explained by the mathematical process, but actually being conscious requires some extra magical reality fluid. So I still feel confused, and using the word l-zombie in analogy to p-zombie is a way of highlighting that. But this line of reasoning still feels like progress. FWIW.

But if that's how we justify believing that we physically exist, that has some implications for how we should decide what to do. The argument is that nothing very bad happens if the l-zombies wrongly conclude that they actually exist. Mostly, that also seems to be true if they act on that belief: mostly, what l-zombies do doesn't seem to influence what happens in the real world, so if only things that actually happen are morally important, it doesn't seem to matter what the l-zombies decide to do. But there are exceptions.

Consider the counterfactual mugging: Accurate and trustworthy Omega appears to you and explains that it has just thrown a very biased coin that had only a 1/1000 chance of landing heads. As it turns out, this coin has in fact landed heads, and now Omega is offering you a choice: It can either (A) create a Friendly AI or (B) destroy humanity. Which would you like? There is a catch, though: Before it threw the coin, Omega made a prediction about what you would do if the coin fell heads (and it was able to make a confident prediction about what you would choose). If the coin had fallen tails, it would have created an FAI if it had predicted that you'd choose (B), and it would have destroyed humanity if it had predicted that you would choose (A). (If it hadn't been able to make a confident prediction about what you would choose, it would just have destroyed humanity outright.)

There is a clear argument that, if you expect to find yourself in a situation like this in the future, you would want to self-modify into somebody who would choose (B), since this gives humanity a much larger chance of survival. Thus, a decision theory stable under self-modification would answer (B). But if you update on the fact that you consciously experience Omega telling you that the coin landed heads, (A) would seem to be the better choice!
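The tension between the two answers is just arithmetic over the prior. Here is a minimal sketch comparing the prior expected value of committing to each policy; the utilities of 1.0 for a Friendly AI and 0.0 for extinction are my own illustrative assumptions, not from the post:

```python
# Toy expected-value comparison for the counterfactual mugging.
# A "policy" is the answer you would give on heads: 'A' (create FAI)
# or 'B' (destroy humanity). On tails, Omega rewards a predicted 'B'
# with FAI and punishes a predicted 'A' with destruction.
P_HEADS = 0.001          # the biased coin's chance of heads
U_FAI, U_DOOM = 1.0, 0.0  # illustrative utilities (assumed)

def expected_utility(policy: str) -> float:
    """Prior (updateless) expected utility of committing to a policy."""
    on_heads = U_FAI if policy == 'A' else U_DOOM
    on_tails = U_FAI if policy == 'B' else U_DOOM
    return P_HEADS * on_heads + (1 - P_HEADS) * on_tails

print(expected_utility('A'))  # 0.001 -- best *after* updating on heads
print(expected_utility('B'))  # 0.999 -- best from the prior point of view
```

Conditional on actually being told "heads", policy (A) gets utility 1 instead of 0, which is exactly why the updater wants to switch; but from the prior point of view, (B) wins by a factor of 999.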

One way of looking at this is that if the coin falls tails, the l-zombie that is told the coin landed heads still exists mathematically, and this l-zombie now has the power to influence what happens in the real world. If the argument for updating was that nothing bad happens even though the l-zombies get it wrong, well, that argument breaks here. The mathematical process that is your mind doesn't have any evidence about whether the coin landed heads or tails, because as a mathematical object it exists in both possible worlds, and it has to make a decision in both worlds, and that decision affects humanity's future in both worlds.

Back in 2010, I wrote a post arguing that yes, you would want to self-modify into something that would choose (B), but that that was the only reason why you'd want to choose (B). Here's a variation on the above scenario that illustrates the point I was trying to make back then: Suppose that Omega tells you that it actually threw its coin a million years ago, and if it had fallen tails, it would have turned Alpha Centauri purple. Now throughout your history, the argument goes, you would never have had any motive to self-modify into something that chooses (B) in this particular scenario, because you've always known that Alpha Centauri isn't, in fact, purple.

But this argument assumes that you know you're not an l-zombie; if the coin had in fact fallen tails, you wouldn't exist as a conscious being, but you'd still exist as a mathematical decision-making process, and that process would be able to influence the real world, so you-the-decision-process can't reason that "I think, therefore I am, therefore the coin must have fallen heads, therefore I should choose (A)." Partly because of this, I now accept choosing (B) as the (most likely to be) correct choice even in that case. (The rest of my change in opinion has to do with all ways of making my earlier intuition formal getting into trouble in decision problems where you can influence whether you're brought into existence, but that's a topic for another post.)

However, should you feel cheerful while you're announcing your choice of (B), since with high (prior) probability, you've just saved humanity? That would lead to an actual conscious being feeling cheerful if the coin has landed heads and humanity is going to be destroyed, and an l-zombie computing, but not actually experiencing, cheerfulness if the coin has landed tails and humanity is going to be saved. Nothing good comes out of feeling cheerful, not even alignment of a conscious being's map with the physical territory. So I think the correct thing is to choose (B), and to be deeply sad about it.

You may be asking why I should care what the right probabilities to assign or the right feelings to have are, since these don't seem to play any role in making decisions; sometimes you make your decisions as if updating on your conscious experience, but sometimes you don't, and you always get the right answer if you don't update in the first place. Indeed, I expect that the "correct" design for an AI is to fundamentally use (more precisely: approximate) updateless decision theory (though I also expect that probabilities updated on the AI's sensory input will be useful for many intermediate computations), and "I compute, therefore I am"-style reasoning will play no fundamental role in the AI. And I think the same is true for humans' decisions — the correct way to act is given by updateless reasoning. But as a human, I find myself unsatisfied by not being able to have a picture of what the physical world probably looks like. I may not need one to figure out how I should act; I still want one, not for instrumental reasons, but because I want one. In a small universe where most mathematically possible humans are l-zombies, the argument in this post seems to give me a justification to say "I think, therefore I am, therefore probably I either live in a simulation or what I've learned about the laws of physics describes how the real world works (even though there are many l-zombies who are thinking similar thoughts but are wrong about them)."

And because of this, even though I disagree with my 2010 post, I also still disagree with Wei Dai's 2010 post arguing that a late Great Filter is good news, which my own 2010 post was trying to argue against. Wei argued that if Omega gave you a choice between (A) destroying the world now and (B) having Omega destroy the world a million years ago (so that you are never instantiated as a conscious being, though your choice as an l-zombie still influences the real world), then you would choose (A), to give humanity at least the time it's had so far. Wei concluded that this means that if you learned that the Great Filter is in our future, rather than our past, that must be good news, since if you could choose where to place the filter, you should place it in the future. I now agree with Wei that (A) is the right choice, but I don't think that you should be happy about it. And similarly, I don't think you should be happy about news that tells you that the Great Filter is later than you might have expected.

Beware Trivial Fears

37 Stabilizer 04 February 2014 05:40AM

Does the surveillance state affect us? It has affected me, and I didn't realize that it was affecting me until recently. I give a few examples of how it has affected me:

  1. I was once engaged in a discussion on Facebook about Obama's foreign policy. Around that time, I was going to apply for a US visa. I stopped the discussion early. Semi-consciously, I was worried that what I was writing would be checked by US visa officials and would lead to my visa being denied.
  2. I was once really interested in reading up on the Unabomber and his manifesto, because somebody mentioned that he had some interesting ideas, and though fundamentally misguided, he might have been onto something. I didn't explore much because I was worried---again semi-consciously---that my traffic history would be logged on some NSA computer somewhere, and that I'd pattern match to the Unabomber (I'm a physics grad student; the Unabomber was a mathematician).
  3. I didn't visit Silk Road as I was worried that my visits would be traced, even though I had no plans of buying anything.
  4. Just generally, I try to not search for some really weird stuff that I want to search for (I'm a curious guy!). 
  5. I was almost not going to write this post. 
And these are just the ones that I became conscious of. I wonder how many more have slipped under the radar.

Yes, I know these fears are silly. In fact, writing them out makes them feel even more silly. But they still affected my behavior. Now, I may be atypical. But I'm sure I'm not that atypical. I'm sure many, many people refrain from visiting and exploring parts of the Internet and writing things on different forums and blogs because of the fear of being recorded and the data being used against them. Especially susceptible to this fear are immigrants.

In Beware Trivial Inconveniences, Yvain points out that the Great Firewall of China is very easy to bypass but the vast majority of Chinese people don't bypass it because it's a trivial inconvenience.

I would like to introduce the analogous and very related concept of a trivial fear: a fear of a low-probability event that nonetheless affects behavior in a major way, especially across a large population. Much more insidiously, the people experiencing these fears often don't even realize they're experiencing them: because the fear is of small magnitude, it can be rationalized away easily.

In this particular case, the fear acts in a way so as to restrict the desire for information and free speech.

In a recent conversation, a friend mentioned that calling the modern surveillance state 'Orwellian' is hyperbole. Maybe so. I don't know if the surveillance state is a Good Thing or a Bad Thing. I'm not an economist or a political scientist or a moral philosopher. I simply want to point out that the main lesson from 1984 is not the exact details of the dystopia, but the fact that the people living in the dystopia weren't even remotely aware that they were living in one.

Flashes of Nondecisionmaking

28 lionhearted 27 January 2014 02:30PM

If you crash a bicycle and cut your knee, it bleeds. You can apply pressure to the wound or otherwise aid in clotting it, but you can't fully control the blood. You can't think, "Body! I command you not to bleed!" Nor can you directly say, "I choose not to bleed" through pure will alone.

This is easy enough to understand. We don't have direct control over our blood. We can exert some measure of indirect control over it -- taking aspirin might thin the blood, breathing deeply and relaxing might slow the pulse and the flow of blood slightly -- but we do not have direct and instant control over the flow of our blood.

That's our blood. It's quite a personal thing, when you think about it.

At the same time, there's a view that we have full control and choice over our actions in a given situation.

I no longer believe this to be the case.

We can staunch the flow of bleeding through applying pressure, a cloth, perhaps slowing down our pulse and bloodflow through lowering stress and deep breathing. But we can't, in the moment, command or control blood by force of will or mind alone.

Likewise, I'm starting to believe we have lots of indirect control over our patterns of action in our lives, but perhaps less control and command in individual moments.

When a person rolls out of bed, they usually do very similar things each morning. How much control or command do they have -- mentally or analytically or however you want to define it -- over these actions?

Not much, I'd say.

Yet, they have immense indirect control, similar to blood flow. If you normally lay out your clothes the night before, and you lay out running clothes instead of work clothes, and set your alarm for an hour earlier, your chances of running go up a lot. There still may be an element of choice or self-command when you decide to run or not, but it's very possible there wasn't choice or self-command available if you did not rearrange your environment with that sort of indirect pressure.

I had an experience recently that was incredibly distressing. It was strange and very unpleasant at the time, but I'm now thankful for it.

I was at a convenience store when I realized I was in the process of buying some junk food and energy drinks.

My mind recognized this, but seemingly had little say in what was going on. My legs were just walking the familiar convenience store aisles near my home, picking up two of this energy drink, one of that pack of peanut M&M's, and so on.

I don't know if I could have stopped the pattern and put the items back in the moment. At the time, I was shocked to realize that I was watching myself act, but I hadn't stopped to think or ponder. My legs and hands seemed to be working slightly independently of me.

At the time, it was like a bad dream, or some sort of miserable and crazy experience. I shrugged it off -- strange things happen, you know? -- but I kept thinking about it periodically.

I'd been training in meditation and impulse control a lot over the last six months, and been studying and experimenting a bit about how our minds work and cognitive psychology.

My realization now, quite a while later, is that the distressing experience at the convenience store -- "what the hell is going on here, I am seemingly not controlling my actions!"-- was actually the beginning of a flash of a greater awareness of my day-to-day life.

I believe now that we're constantly in nondecisionmaking mode. We're constantly running patterns or taking actions without conscious command or choice, similar to blood running from a cut.

This process can be managed indirectly and affected, including in the moment it's happening if we're aware of it. But oftentimes, we don't even know we're metaphorically bleeding. We're just doing things, some of them "smart", some of them stupid and harmful.

I've had more flashes of awareness, seeing myself running mechanical patterns at times I normally wouldn't have noticed them. Briefly, here and there. I've sometimes been able to radically course-correct and do something entirely different. Other times, I try and fail to do something different. I haven't had a moment as puzzling as that first convenience store one.

There are perhaps two takeaways here. The first is that greater training in awareness and meditation can lead to "waking up" or noticing the situation you're in more often. You probably already knew that.

But the second and more important one, I think, is the idea that things that seem like choices aren't always so. We don't choose to bleed if we cut our knee. Once we realize we're bleeding, we can apply indirect pressure, de-stress, use external things like cloth or bandages, and otherwise manage the situation. We can also buy more protective clothing or improve our technique for the future, so we bleed less. But we can't simply say "Body, I command you not to bleed" nor "I choose not to bleed" if we are, in fact, bleeding.

Indirect influence and control, immense amounts. More than most people realize. Direct influence and control? Perhaps not as much as commonly believed.

No Universally Compelling Arguments in Math or Science

30 ChrisHallquist 05 November 2013 03:32AM

Last week, I started a thread on the widespread sentiment that people don't understand the metaethics sequence. One of the things that surprised me most in the thread was this exchange:

Commenter: "I happen to (mostly) agree that there aren't universally compelling arguments, but I still wish there were. The metaethics sequence failed to talk me out of valuing this."

Me: "But you realize that Eliezer is arguing that there aren't universally compelling arguments in any domain, including mathematics or science? So if that doesn't threaten the objectivity of mathematics or science, why should that threaten the objectivity of morality?"

Commenter: "Waah? Of course there are universally compelling arguments in math and science."

Now, I realize this is just one commenter. But the most-upvoted comment in the thread also perceived "no universally compelling arguments" as a major source of confusion, suggesting that it was perceived as conflicting with morality not being arbitrary. And today, someone mentioned having "no universally compelling arguments" cited at them as a decisive refutation of moral realism.

After the exchange quoted above, I went back and read the original No Universally Compelling Arguments post, and realized that while it had been obvious to me when I read it that Eliezer meant it to apply to everything, math and science included, it was rather short on concrete examples, perhaps in violation of Eliezer's own advice. The concrete examples can be found in the sequences, though... just not in that particular post.


A New Interpretation of the Marshmallow Test

73 elharo 05 July 2013 12:22PM

I've begun to notice a pattern with experiments in behavioral economics. An experiment produces a result that's counter-intuitive and surprising, and demonstrates that people don't behave as rationally as expected. Then, as time passes, other researchers contrive different versions of the experiment that show the experiment may not have been about what we thought it was about in the first place. For example, in the dictator game, Jeffrey Winking and Nicholas Mizer changed the experiment so that the participants didn't know each other and the subjects didn't know they were in an experiment. With this simple adjustment that made the conditions of the game more realistic, the "dictators" switched from giving away a large portion of their unearned gains to giving away nothing. Now it's happened to the marshmallow test.

In the original Stanford marshmallow experiment, children were given one marshmallow. They could eat the marshmallow right away; or, if they waited fifteen minutes for the experimenter to return without eating the marshmallow, they'd get a second marshmallow. Even more interestingly, in follow-up studies two decades later, the children who had waited longer for the second marshmallow, i.e. shown delayed gratification, had higher SAT scores, better school performance, and even lower body mass index. This is normally interpreted as indicating the importance of self-control and delayed gratification for life success.

Not so fast.

In a new variant of the experiment entitled (I kid you not) "Rational snacking", Celeste Kidd, Holly Palmeri, and Richard N. Aslin from the University of Rochester gave the children a similar test with an interesting twist.

They assigned 28 children to two groups asked to perform art projects. Children in the first group each received half a container of used crayons, and were told that if they could wait, the researcher would bring them more and better art supplies. However, after two and a half minutes, the adult returned and told the child they had made a mistake, and there were no more art supplies so they'd have to use the original crayons.

In part 2, the adult gave the child a single sticker and told the child that if they waited, the adult would bring them more stickers to use. Again the adult reneged.

Children in the second group went through the same routine except this time the adult fulfilled their promises, bringing the children more and better art supplies and several large stickers.

After these two events, the experimenters repeated the classic marshmallow test with both groups. The results demonstrated that the children were a lot more rational than we might have thought. Of the 14 children in group 1, who had been shown that the experimenters were unreliable adults, 13 ate the first marshmallow. In contrast, 8 of the 14 children in the reliable-adult group waited out the full fifteen minutes. On average, children in unreliable group 1 waited only 3 minutes, while those in reliable group 2 waited 12 minutes.
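As a side note, the group difference reported above is far too large to be sampling noise. A quick back-of-the-envelope Fisher exact test on the 1-of-14 versus 8-of-14 "waited" counts (my own check, not a calculation from the paper) can be sketched as:

```python
from math import comb

# Observed 2x2 table, as reported above:
#                       waited 15 min   ate early
# unreliable (n = 14)          1            13
# reliable   (n = 14)          8             6
waited_unreliable = 1
n_unreliable = 14
waited_total = 1 + 8   # waiters across both groups
N = 28                 # total children

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k successes in a sample of n, from a population
    of N containing K successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# One-sided Fisher exact test: probability of seeing this few waiters
# in the unreliable group if reliability made no difference at all.
p_one_sided = sum(hypergeom_pmf(k, N, waited_total, n_unreliable)
                  for k in range(waited_unreliable + 1))
print(f"one-sided p = {p_one_sided:.4f}")  # about 0.006
```

A p-value well under 0.01 means the two groups really did behave differently, consistent with the paper's conclusion that the children were tracking the adults' reliability.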

So maybe what the longitudinal studies show is that children who come from an environment where they have learned to be more trusting have better life outcomes. I make absolutely no claims as to which direction the arrow of causality may run, or whether it's pure correlation with other factors. For instance, maybe breastfeeding increases both trust and academic performance. But any way you interpret these results, the case for the importance and even the existence of innate self-control is looking a lot weaker.

Do Earths with slower economic growth have a better chance at FAI?

30 Eliezer_Yudkowsky 12 June 2013 07:54PM

I was raised as a good and proper child of the Enlightenment who grew up reading The Incredible Bread Machine and A Step Farther Out, taking for granted that economic growth was a huge in-practice component of human utility (plausibly the majority component if you asked yourself what was the major difference between the 21st century and the Middle Ages) and that the "Small is Beautiful" / "Sustainable Growth" crowds were living in impossible dreamworlds that rejected quantitative thinking in favor of protesting against nuclear power plants.

And so far as I know, such a view would still be an excellent first-order approximation if we were going to carry on into the future by steady technological progress:  Economic growth = good.

But suppose my main-line projection is correct and the "probability of an OK outcome" / "astronomical benefit" scenario essentially comes down to a race between Friendly AI and unFriendly AI.  So far as I can tell, the most likely reason we wouldn't get Friendly AI is the total serial research depth required to develop and implement a strong-enough theory of stable self-improvement with a possible side order of failing to solve the goal transfer problem.  Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces.  This means that UFAI parallelizes better than FAI.  UFAI also probably benefits from brute-force computing power more than FAI.  Both of these imply, so far as I can tell, that slower economic growth is good news for FAI; it lengthens the deadline to UFAI and gives us more time to get the job done.  I have sometimes thought half-jokingly and half-anthropically that I ought to try to find investment scenarios based on a continued Great Stagnation and an indefinite Great Recession where the whole developed world slowly goes the way of Spain, because these scenarios would account for a majority of surviving Everett branches.

Roughly, it seems to me like higher economic growth speeds up time and this is not a good thing.  I wish I had more time, not less, in which to work on FAI; I would prefer worlds in which this research can proceed at a relatively less frenzied pace and still succeed, worlds in which the default timelines to UFAI terminate in 2055 instead of 2035.

I have various cute ideas for things which could improve a country's economic growth.  The chance of these things eventuating seems small, the chance that they eventuate because I write about them seems tiny, and they would be good mainly for entertainment, links from econblogs, and possibly marginally impressing some people.  I was thinking about collecting them into a post called "The Nice Things We Can't Have" based on my prediction that various forces will block, e.g., the all-robotic all-electric car grid which could be relatively trivial to build using present-day technology - that we are too far into the Great Stagnation and the bureaucratic maturity of developed countries to get nice things anymore.  However I have a certain inhibition against trying things that would make everyone worse off if they actually succeeded, even if the probability of success is tiny.  And it's not completely impossible that we'll see some actual experiments with small nation-states in the next few decades, that some of the people doing those experiments will have read Less Wrong, or that successful experiments will spread (if the US ever legalizes robotic cars or tries a city with an all-robotic fleet, it'll be because China or Dubai or New Zealand tried it first).  Other EAs (effective altruists) care much more strongly about economic growth directly and are trying to increase it directly.  (An extremely understandable position which would typically be taken by good and virtuous people).

Throwing out remote, contrived scenarios where something accomplishes the opposite of its intended effect is cheap and meaningless (vide "But what if MIRI accomplishes the opposite of its purpose due to blah") but in this case I feel impelled to ask because my mainline visualization has the Great Stagnation being good news.  I certainly wish that economic growth would align with FAI because then my virtues would align and my optimal policies have fewer downsides, but I am also aware that wishing does not make something more likely (or less likely) in reality.

To head off some obvious types of bad reasoning in advance:  Yes, higher economic growth frees up resources for effective altruism and thereby increases resources going to FAI, but it also increases resources going to the AI field generally which is mostly pushing UFAI, and the problem arguendo is that UFAI parallelizes more easily.

Similarly, a planet with generally higher economic growth might develop intelligence amplification (IA) technology earlier.  But this general advancement of science will also accelerate UFAI, so you might just be decreasing the amount of FAI research that gets done before IA and decreasing the amount of time available after IA before UFAI.  Similarly to the more mundane idea that increased economic growth will produce more geniuses some of whom can work on FAI; there'd also be more geniuses working on UFAI, and UFAI probably parallelizes better and requires less serial depth of research.  If you concentrate on some single good effect on blah and neglect the corresponding speeding-up of UFAI timelines, you will obviously be able to generate spurious arguments for economic growth having a positive effect on the balance.

So I pose the question:  "Is slower economic growth good news?" or "Do you think Everett branches with 4% or 1% RGDP growth have a better chance of getting FAI before UFAI?"  So far as I can tell, my current mainline guesses imply:  "Everett branches with slower economic growth contain more serial depth of cognitive causality and have more effective time left on the clock before they end due to UFAI, which favors FAI research over UFAI research."

This seems like a good parameter to have a grasp on for any number of reasons, and I can't recall it previously being debated in the x-risk / EA community.

EDIT:  To be clear, the idea is not that trying to deliberately slow world economic growth would be a maximally effective use of EA resources and better than current top targets; this seems likely to have very small marginal effects, and many such courses are risky.  The question is whether a good and virtuous person ought to avoid, or alternatively seize, any opportunities which come their way to help out on world economic growth.

EDIT 2:  Carl Shulman's opinion can be found on the Facebook discussion here.

Robust Cooperation in the Prisoner's Dilemma

69 orthonormal 07 June 2013 08:30AM

I'm proud to announce the preprint of Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic, a joint paper with Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire (me), and Eliezer Yudkowsky.

This paper was one of three projects to come out of the 2nd MIRI Workshop on Probability and Reflection in April 2013, and had its genesis in ideas about formalizations of decision theory that have appeared on LessWrong. (At the end of this post, I'll include links for further reading.)

Below, I'll briefly outline the problem we considered, the results we proved, and the (many) open questions that remain. Thanks in advance for your thoughts and suggestions!

Background: Writing programs to play the PD with source code swap

(If you're not familiar with the Prisoner's Dilemma, see here.)

The paper concerns the following setup, which has come up in academic research on game theory: say that you have the chance to write a computer program X, which takes in one input and returns either Cooperate or Defect. This program will face off against some other computer program Y, but with a twist: X will receive the source code of Y as input, and Y will receive the source code of X as input. And you will be given your program's winnings, so you should think carefully about what sort of program you'd write!

Of course, you could simply write a program that defects regardless of its input; we call this program DefectBot, and call the program that cooperates on all inputs CooperateBot. But with the wealth of information afforded by the setup, you might wonder if there's some program that might be able to achieve mutual cooperation in situations where DefectBot achieves mutual defection, without thereby risking a sucker's payoff. (Douglas Hofstadter would call this a perfect opportunity for superrationality...)
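The two trivial strategies can be sketched in a few lines (a toy illustration, not the paper's formalism; the names `play`, `DEFECTBOT_SRC`, etc. are my own). Each "program" is a function from the opponent's source code (a string) to a move; we pass sources around explicitly rather than via quining, to keep the sketch short:

```python
# Toy version of the source-swap game. Each player is a function from
# the opponent's source code (a string) to "C" or "D". Illustrative
# sketch only; names are not from the paper.

DEFECTBOT_SRC = 'lambda opponent_source: "D"'
COOPERATEBOT_SRC = 'lambda opponent_source: "C"'

DefectBot = eval(DEFECTBOT_SRC)        # defects on every input
CooperateBot = eval(COOPERATEBOT_SRC)  # cooperates on every input

def play(x, x_src, y, y_src):
    # Each program moves after seeing the other's source code.
    return x(y_src), y(x_src)
```

For example, `play(DefectBot, DEFECTBOT_SRC, CooperateBot, COOPERATEBOT_SRC)` returns `("D", "C")`: DefectBot takes the sucker's payoff from CooperateBot, and neither one actually looks at its input.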

Previously known: CliqueBot and FairBot

And indeed, there's a way to do this that's been known since at least the 1980s. You can write a computer program that knows its own source code, compares it to the input, and returns C if and only if the two are identical (and D otherwise). Thus it achieves mutual cooperation in one important case where it intuitively ought to: when playing against itself! We call this program CliqueBot, since it cooperates only with the "clique" of agents identical to itself.
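A sketch of CliqueBot in the same toy setting (a real CliqueBot would obtain its own source by quining; here we just hand it in, and `make_cliquebot` is my own illustrative name):

```python
# CliqueBot sketch: cooperate iff the opponent's source is exactly,
# byte-for-byte, our own source; defect otherwise.

def make_cliquebot(own_src):
    return lambda opponent_src: "C" if opponent_src == own_src else "D"

CLIQUEBOT_SRC = "<cliquebot source>"   # stand-in for the quined source
CliqueBot = make_cliquebot(CLIQUEBOT_SRC)
```

Against its own source it returns `"C"`; against anything else, even its own source with one extra space, it returns `"D"` — which is exactly the fragility discussed next.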

There's one particularly irksome issue with CliqueBot, and that's the fragility of its cooperation. If two people write functionally analogous but syntactically different versions of it, those programs will defect against one another! This problem can be patched somewhat, but not fully fixed. Moreover, mutual cooperation might be the best strategy against some agents that are not even functionally identical, and extending this approach requires you to explicitly delineate the list of programs that you're willing to cooperate with. Is there a more flexible and robust kind of program you could write instead?

As it turns out, there is: in a 2010 post on LessWrong, cousin_it introduced an algorithm that we now call FairBot. Given the source code of Y, FairBot searches for a proof (of less than some large fixed length) that Y returns C when given the source code of FairBot, and then returns C if and only if it discovers such a proof (otherwise it returns D). Clearly, if our proof system is consistent, FairBot only cooperates when that cooperation will be mutual. But the really fascinating thing is what happens when you play two versions of FairBot against each other. Intuitively, it seems that either mutual cooperation or mutual defection would be stable outcomes, but it turns out that if their limits on proof lengths are sufficiently high, they will achieve mutual cooperation!

The proof that they mutually cooperate follows from a bounded version of Löb's Theorem from mathematical logic. (If you're not familiar with this result, you might enjoy Eliezer's Cartoon Guide to Löb's Theorem, which is a correct formal proof written in much more intuitive notation.) Essentially, the asymmetry comes from the fact that both programs are searching for the same outcome, so that a short proof that one of them cooperates leads to a short proof that the other cooperates, and vice versa. (The opposite is not true, because the formal system can't know it won't find a contradiction. This is a subtle but essential feature of mathematical logic!)
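For reference (the standard statement, writing □ for "provable in the system"), Löb's Theorem says that if a system can prove "if P were provable, then P", then it can prove P outright; the mutual-cooperation proof uses a bounded analogue with proof-length limits in place of plain provability:

```latex
% Löb's Theorem: from a proof of (□P → P), the system obtains a proof of P.
\vdash \Box(\Box P \rightarrow P) \;\rightarrow\; \Box P
```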

Generalization: Modal Agents

Unfortunately, FairBot isn't what I'd consider an ideal program to write: it happily cooperates with CooperateBot, when it could do better by defecting. This is problematic because in real life, the world isn't separated into agents and non-agents, and any natural phenomenon that doesn't predict your actions can be thought of as a CooperateBot (or a DefectBot). You don't want your agent to be making concessions to rocks that happened not to fall on it. (There's an important caveat: some things have utility functions that you care about, but don't have sufficient ability to predicate their actions on yours. In that case, though, it wouldn't be a true Prisoner's Dilemma if your values actually prefer the outcome (C,C) to (D,C).)

However, FairBot belongs to a promising class of algorithms: those that decide on their action by looking for short proofs of logical statements that concern their opponent's actions. In fact, there's a really convenient mathematical structure that's analogous to the class of such algorithms: the modal logic of provability (known as GL, for Gödel-Löb).

So that's the subject of this preprint: what can we achieve in decision theory by considering agents defined by formulas of provability logic?
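To give a flavor of why this is convenient (a sketch of the general idea, not code from the paper): the behavior of fully-modalized agents can be computed mechanically by rank iteration, where □φ counts as holding at rank n iff φ holds at every rank below n, and the truth values of the fixed points stabilize at a finite rank. All names below (`make_evaluator`, `outcome`, the tuple encoding) are my own illustrative choices:

```python
# Sketch: evaluating "modal agents" by rank iteration. Box(phi) holds
# at rank n iff phi holds at every rank m < n; for fully-modalized
# agents the values stabilize, and the stabilized value is the actual
# outcome of the match.

from functools import lru_cache

def make_evaluator(agents):
    # `agents` maps a name to the modal formula deciding whether that
    # agent cooperates. Formulas are nested tuples:
    #   ("const", True/False), ("coop", name), ("box", phi),
    #   ("not", phi), ("and", phi, psi)
    # Every ("coop", ...) must sit under a "box" (fully modalized);
    # that is what makes the recursion terminate.
    @lru_cache(maxsize=None)
    def holds(formula, rank):
        kind = formula[0]
        if kind == "const":
            return formula[1]
        if kind == "coop":
            return holds(agents[formula[1]], rank)
        if kind == "box":
            return all(holds(formula[1], m) for m in range(rank))
        if kind == "not":
            return not holds(formula[1], rank)
        if kind == "and":
            return holds(formula[1], rank) and holds(formula[2], rank)
        raise ValueError(f"unknown formula kind: {kind}")
    return holds

def outcome(agents, name, depth=10):
    # depth=10 is more than enough rank for these tiny agents.
    return "C" if make_evaluator(agents)(agents[name], depth) else "D"
```

With FairBot written as `("box", ("coop", "opp"))`, this evaluator reproduces the behavior above: two FairBots cooperate (the Löbian outcome — at rank 0 the box is vacuously true, and the truth propagates upward), while FairBot defects against `("const", False)` (DefectBot) and cooperates with `("const", True)` (CooperateBot).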


Tiling Agents for Self-Modifying AI (OPFAI #2)

55 Eliezer_Yudkowsky 06 June 2013 08:24PM

An early draft of publication #2 in the Open Problems in Friendly AI series is now available:  Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.  ~20,000 words, aimed at mathematicians or the highly mathematically literate.  The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.

Abstract:

We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals).  Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the Löbian obstacle.  By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed.  We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.

Commenting here is the preferred venue for discussion of the paper.  This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.

The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions.  This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak and Schmidhuber's swapping criterion is underspecified."  The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip.  This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).  Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.

Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose) then you must be very far off from being able to build a stable self-modifying AI.  Being able to say how to build a theoretical device that would play perfect chess given infinite computing power, is very far off from the ability to build Deep Blue.  However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player.  Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that.  We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.

Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.

Being Half-Rational About Pascal's Wager is Even Worse

18 Eliezer_Yudkowsky 18 April 2013 05:20AM

For so long as I can remember, I have rejected Pascal's Wager in all its forms on sheerly practical grounds: anyone who tries to plan out their life by chasing a 1 in 10,000 chance of a huge payoff is almost certainly doomed in practice.  This kind of clever reasoning never pays off in real life...

...unless you have also underestimated the allegedly tiny chance of the large impact.

For example.  At one critical junction in history, Leo Szilard, the first physicist to see the possibility of fission chain reactions and hence practical nuclear weapons, was trying to persuade Enrico Fermi to take the issue seriously, in the company of a more prestigious friend, Isidor Rabi:

I said to him:  "Did you talk to Fermi?"  Rabi said, "Yes, I did."  I said, "What did Fermi say?"  Rabi said, "Fermi said 'Nuts!'"  So I said, "Why did he say 'Nuts!'?" and Rabi said, "Well, I don't know, but he is in and we can ask him."  So we went over to Fermi's office, and Rabi said to Fermi, "Look, Fermi, I told you what Szilard thought and you said 'Nuts!' and Szilard wants to know why you said 'Nuts!'"  So Fermi said, "Well… there is the remote possibility that neutrons may be emitted in the fission of uranium and then of course perhaps a chain reaction can be made."  Rabi said, "What do you mean by 'remote possibility'?" and Fermi said, "Well, ten per cent."  Rabi said, "Ten per cent is not a remote possibility if it means that we may die of it.  If I have pneumonia and the doctor tells me that there is a remote possibility that I might die, and it's ten percent, I get excited about it."  (Quoted in 'The Making of the Atomic Bomb' by Richard Rhodes.)

This might look at first like a successful application of "multiplying a low probability by a high impact", but I would reject that this was really going on.  Where the heck did Fermi get that 10% figure for his 'remote possibility', especially considering that fission chain reactions did in fact turn out to be possible?  If some sort of reasoning had told us that a fission chain reaction was improbable, then after it turned out to be reality, good procedure would have us go back and check our reasoning to see what went wrong, and figure out how to adjust our way of thinking so as to not make the same mistake again.  So far as I know, there was no physical reason whatsoever to think a fission chain reaction was only a ten percent probability.  They had not been demonstrated experimentally, to be sure; but they were still the default projection from what was already known.  If you'd been told in the 1930s that fission chain reactions were impossible, you would've been told something that implied new physical facts unknown to current science (and indeed, no such facts existed).  After reading enough historical instances of famous scientists dismissing things as impossible when there was no physical logic to say that it was even improbable, one cynically suspects that some prestigious scientists perhaps came to conceive of themselves as senior people who ought to be skeptical about things, and that Fermi was just reacting emotionally.  The lesson I draw from this historical case is not that it's a good idea to go around multiplying ten percent probabilities by large impacts, but that Fermi should not have pulled out a number as low as ten percent.

Having seen enough conversations involving made-up probabilities to become cynical, I also strongly suspect that if Fermi had foreseen how Rabi would reply, Fermi would've said "One percent".  If Fermi had expected Rabi to say "One percent is not small if..." then Fermi would've said "One in ten thousand" or "Too small to consider" - whatever he thought would get him off the hook.  Perhaps I am being too unkind to Fermi, who was a famously great estimator; Fermi may well have performed some sort of lawful probability estimate on the spot.  But Fermi is also the one who said that nuclear energy was fifty years off in the unlikely event it could be done at all, two years (IIRC) before Fermi himself oversaw the construction of the first nuclear pile.  Where did Fermi get that fifty-year number from?  This sort of thing does make me more likely to believe that Fermi, in playing the role of the solemn doubter, was just Making Things Up; and this is no less a sin when you make up skeptical things.  And if this cynicism is right, then we cannot learn the lesson that it is wise to multiply small probabilities by large impacts because this is what saved Fermi - if Fermi had known the rule, if he had seen it coming, he would have just Made Up an even smaller probability to get himself off the hook.  It would have been so very easy and convenient to say, "One in ten thousand, there's no experimental proof and most ideas like that are wrong!  Think of all the conjunctive probabilities that have to be true before we actually get nuclear weapons and our own efforts actually made a difference in that!" followed shortly by "But it's not practical to be worried about such tiny probabilities!"  Or maybe Fermi would've known better, but even so I have never been a fan of trying to have two mistakes cancel each other out.

I mention all this because it is dangerous to be half a rationalist, and only stop making one of the two mistakes.  If you are going to reject impractical 'clever arguments' that would never work in real life, and henceforth not try to multiply tiny probabilities by huge payoffs, then you had also better reject all the clever arguments that would've led Fermi or Szilard to assign probabilities much smaller than ten percent.  (Listing out a group of conjunctive probabilities leading up to taking an important action, and not listing any disjunctive probabilities, is one widely popular way of driving down the apparent probability of just about anything.)  Or if you would've tried to put fission chain reactions into a reference class of 'amazing new energy sources' and then assigned it a tiny probability, or put Szilard into the reference class of 'people who think the fate of the world depends on them', or pontificated about the lack of any positive experimental evidence proving that a chain reaction was possible, blah blah blah etcetera - then your error here can perhaps be compensated for by the opposite error of then trying to multiply the resulting tiny probability by a large impact.  I don't like making clever mistakes that cancel each other out - I consider that idea to also be clever - but making clever mistakes that don't cancel out is worse.

On the other hand, if you want a general heuristic that could've led Fermi to do better, I would suggest reasoning that previous-historical experimental proof of a chain reaction would not strongly be expected even in worlds where it was possible, and that to discover a chain reaction to be impossible would imply learning some new fact of physical science which was not already known.  And this is not just 20-20 hindsight; Szilard and Rabi saw the logic in advance of the fact, not just afterward - though not in those exact terms; they just saw the physical logic, and then didn't adjust it downward for 'absurdity' or with more complicated rationalizations.  But then if you are going to take this sort of reasoning at face value, without adjusting it downward, then it's probably not a good idea to panic every time you assign a 0.01% probability to something big - you'll probably run into dozens of things like that, at least, and panicking over them would leave no room to wait until you found something whose face-value probability was large.

I don't believe in multiplying tiny probabilities by huge impacts.  But I also believe that Fermi could have done better than saying ten percent, and that it wasn't just random luck mixed with overconfidence that led Szilard and Rabi to assign higher probabilities than that.  Or to name a modern issue which is still open, Michael Shermer should not have dismissed the possibility of molecular nanotechnology, and Eric Drexler will not have been randomly lucky when it turns out to work: taking current physical models at face value implies that molecular nanotechnology ought to work, and if it doesn't work we've learned some new fact unknown to present physics, etcetera.  Taking the physical logic at face value is fine, and there's no need to adjust it downward for any particular reason; if you say that Eric Drexler should 'adjust' this probability downward for whatever reason, then I think you're giving him rules that predictably give him the wrong answer.  Sometimes surface appearances are misleading, but most of the time they're not.

A key test I apply to any supposed rule of reasoning about high-impact scenarios is, "Does this rule screw over the planet if Reality actually hands us a high-impact scenario?" and if the answer is yes, I discard it and move on.  The point of rationality is to figure out which world we actually live in and adapt accordingly, not to rule out certain sorts of worlds in advance.

There's a doubly-clever form of the argument wherein everyone in a plausibly high-impact position modestly attributes only a tiny potential possibility that their face-value view of the world is sane, and then they multiply this tiny probability by the large impact, and so they act anyway and on average worlds in trouble are saved.  I don't think this works in real life - I don't think I would have wanted Leo Szilard to think like that.  I think that if your brain really actually thinks that fission chain reactions have only a tiny probability of being important, you will go off and try to invent better refrigerators or something else that might make you money.  And if your brain does not really feel that fission chain reactions have a tiny probability, then your beliefs and aliefs are out of sync and that is not something I want to see in people trying to handle the delicate issue of nuclear weapons.  But in any case, I deny the original premise:  I do not think the world's niches for heroism must be populated by heroes who are incapable in principle of reasonably distinguishing themselves from a population of crackpots, all of whom have no choice but to continue on the tiny off-chance that they are not crackpots.

I haven't written enough about what I've begun thinking of as 'heroic epistemology' - why, how can you possibly be so overconfident as to dare even try to have a huge positive impact when most people in that reference class blah blah blah - but on reflection, it seems to me that an awful lot of my answer boils down to not trying to be clever about it.  I don't multiply tiny probabilities by huge impacts.  I also don't get tiny probabilities by putting myself into inescapable reference classes, for this is the sort of reasoning that would screw over planets that actually were in trouble if everyone thought like that.  In the course of any workday, on the now very rare occasions I find myself thinking about such meta-level junk instead of the math at hand, I remind myself that it is a wasted motion - where a 'wasted motion' is any thought which will, in retrospect if the problem is in fact solved, not have contributed to having solved the problem.  If someday Friendly AI is built, will it have been terribly important that someone have spent a month fretting about what reference class they're in?  No.  Will it, in retrospect, have been an important step along the pathway to understanding stable self-modification, if we spend time trying to solve the Löbian obstacle?  Possibly.  So one of these cognitive avenues is predictably a wasted motion in retrospect, and one of them is not.  The same would hold if I spent a lot of time trying to convince myself that I was allowed to believe that I could affect anything large, or any other form of angsting about meta.  It is predictable that in retrospect I will think this was a waste of time compared to working on a trust criterion between a probability distribution and an improved probability distribution.  (Apologies, this is a technical thingy I'm currently working on which has no good English description.)

But if you must apply clever adjustments to things, then for Belldandy's sake don't be one-sidedly clever and have all your cleverness be on the side of arguments for inaction.  I think you're better off without all the complicated fretting - but you're definitely not better off eliminating only half of it.

And finally, I once again state that I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk.  We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises.  If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead.  You cannot justifiably trade off tiny probabilities of x-risk improvement against efforts that do not effectuate a happy intergalactic civilization, but there is nonetheless no need to go on tracking tiny probabilities when you'd expect there to be medium-sized probabilities of x-risk reduction.  Nonetheless I try to avoid coming up with clever reasons to do stupid things, and one example of a stupid thing would be not working on Friendly AI when it's in blatant need of work.  Elaborate complicated reasoning which says we should let the Friendly AI issue just stay on fire and burn merrily away, well, any complicated reasoning which returns an output this silly is automatically suspect.

If, however, you are unlucky enough to have been cleverly argued into obeying rules that make it a priori unreachable-in-practice for anyone to end up in an epistemic state where they try to do something about a planet which appears to be on fire - so that there are no more plausible x-risk reduction efforts to fall back on, because you're adjusting all the high-impact probabilities downward from what the surface state of the world suggests...

Well, that would only be a good idea if Reality were not allowed to hand you a planet that was in fact on fire.  Or if, given a planet on fire, Reality was prohibited from handing you a chance to put it out.  There is no reason to think that Reality must a priori obey such a constraint.

EDIT:  To clarify, "Don't multiply tiny probabilities by large impacts" is something that I apply to large-scale projects and lines of historical probability.  On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody's dumping effort into it then you should dump more effort than currently into it.  On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but "probability of an ok outcome", i.e., the probability of ending up with a happy intergalactic civilization).  In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal's Wager.  Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort.  It would only be Pascal's Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.

Reflection in Probabilistic Logic

63 Eliezer_Yudkowsky 24 March 2013 04:37PM

Paul Christiano has devised a new fundamental approach to the "Löb Problem" wherein Löb's Theorem seems to pose an obstacle to AIs building successor AIs, or adopting successor versions of their own code, that trust the same amount of mathematics as the original.  (I am currently writing up a more thorough description of the question this preliminary technical report is working on answering.  For now the main online description is in a quick Summit talk I gave.  See also Benja Fallenstein's description of the problem in the course of presenting a different angle of attack.  Roughly the problem is that mathematical systems can only prove the soundness of, aka 'trust', weaker mathematical systems.  If you try to write out an exact description of how AIs would build their successors or successor versions of their code in the most obvious way, it looks like the mathematical strength of the proof system would tend to be stepped down each time, which is undesirable.)

Paul Christiano's approach is inspired by the idea that whereof one cannot prove or disprove, thereof one must assign probabilities: and that although no mathematical system can contain its own truth predicate, a mathematical system might be able to contain a reflectively consistent probability predicate.  In particular, it looks like we can have:

∀a, b: (a < P(φ) < b)          ⇒  P(a < P('φ') < b) = 1
∀a, b: P(a ≤ P('φ') ≤ b) > 0  ⇒  a ≤ P(φ) ≤ b

Suppose I present you with the human and probabilistic version of a Gödel sentence, the Whitely sentence "You assign this statement a probability less than 30%."  If you disbelieve this statement, it is true.  If you believe it, it is false.  If you assign 30% probability to it, it is false.  If you assign 29% probability to it, it is true.

Paul's approach resolves this problem by restricting your belief about your own probability assignment to within epsilon of 30% for any epsilon.  So Paul's approach replies, "Well, I assign almost exactly 30% probability to that statement - maybe a little more, maybe a little less - in fact I think there's about a 30% chance that I'm a tiny bit under 0.3 probability and a 70% chance that I'm a tiny bit over 0.3 probability."  A standard fixed-point theorem then implies that a consistent assignment like this should exist.  If asked if the probability is over 0.2999 or under 0.30001 you will reply with a definite yes.
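A numeric illustration of the fixed point just described (my own toy example, not taken from the technical report): with a definite point-valued self-assignment there is no consistent answer, because the truth value jumps over the diagonal at 0.3; but once the agent is uncertain about its own assignment, splitting probability q on "just under 0.3" and 1−q on "just over", consistency forces q = 0.3, exactly the "30% under, 70% over" split.

```python
# Toy check for the Whitely sentence
# "You assign this statement a probability less than 30%."

# Naive case: if the agent's assignment were exactly p, the statement's
# truth value would be 1 if p < 0.3 else 0, so no p satisfies
# p == truth(p): the map jumps over the diagonal at 0.3.
def naive_truth(p):
    return 1.0 if p < 0.3 else 0.0

assert all(abs(naive_truth(p) - p) > 1e-9
           for p in [i / 1000 for i in range(1001)])

# Paul-style resolution: the agent is uncertain about its *own*
# assignment, believing it lies just below 0.3 with probability q and
# just above with probability 1 - q. The statement is true exactly in
# the first case, so P(statement) = q, and the reflective fixed point
# pins this at q = 0.3.
q = 0.3
p_statement = q * 1.0 + (1 - q) * 0.0   # probability the statement is true
assert abs(p_statement - q) < 1e-12
```

The first loop verifies that no definite self-assignment is consistent; the last two lines verify that the epsilon-smoothed belief at q = 0.3 is.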

