
Change utility, reduce extortion

1 Stuart_Armstrong 28 April 2017 02:05PM

Crossposted at the Intelligent Agents Forum.

EDIT: This method is not intended to solve extortion, just to remove the likelihood of extremely terrible outcomes (and slightly reduce the vulnerability to extortion).

A full solution to the extortion problem is sorely elusive. However, there are crude hacks that we can use to mitigate the downside.

Suppose we figured out that a friendly AI should be maximising an unbounded utility function U. The extortion risk is that another AI could threaten a FAI with unbounded disutility if it didn't go along with its plans. This gives the extorting AI - the EAI - a lot of leverage, and things could end up badly if the EAI ends up acting on its threat.

To combat this, we first have to figure out a level z of utility that is a lower bound on what U could ever reach naturally and realistically.

By "naturally" we mean that U going below z would require not just incompetence or indifference, but some AI actively and deliberately arranging the lowering of U. And "realistically" just means that we're confident that the probability of U getting lower than z by chance, or of there being a U-minimising AI, is exceedingly low.

Then what we can do is cut off U at the z level, replacing U with U'=max(U,z).

[Figure: graph of U' versus U, with z indicated by a red line.]

What's the consequence of this? First of all, it ensures that no EAI would threaten to reduce U (the utility we really care about) below z, because that is not a threat to the FAI. This reduces the leverage of the EAI, and reduces the impact of it acting on its threat.

Since levels of U below z are exceedingly unlikely to happen by chance, the fact that the FAI has the wrong utility below z shouldn't affect its performance much. And, even in that zone, the AI is still motivated to climb U above z.

But we may still feel unhappy about the flatness of that curve, and want the FAI to still prefer higher U among exceedingly low values. If so, we can replace U with U''.

[Figure: graph of U'' versus U, with a blue line at z-1.]
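A minimal sketch of the cut-off, reading it as U' = max(U, z) with z a utility level. The function name and the numbers are hypothetical, chosen only to show that an outcome far below z looks no worse to the FAI than z itself, which is why a threat aimed below z gains no extra leverage.

```python
def cut_off_utility(u: float, z: float) -> float:
    """Return U' = max(U, z): every utility level below the floor z is flattened to z."""
    return max(u, z)


# Hypothetical values, for illustration only.
z = -100.0
outcomes = {
    "good outcome": 250.0,
    "ordinary bad outcome": -40.0,
    "threat carried out": -1_000_000.0,
}

for name, u in outcomes.items():
    print(f"{name}: U = {u}, U' = {cut_off_utility(u, z)}")
```

Under U', the third outcome is valued at z, so threatening it is no more persuasive than threatening z.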

In this case, the EAI will not seek to reduce U below z-1 (in fact, it will specifically target that value), while the FAI has the correct ordering of lower values of U. The utility is weird around z, granted, but this is a place where the FAI would not want to be and would almost certainly not reach by accident.

Though this method does not eliminate the threat of extortion, it does seem to reduce its impact.

Principia Compat. The potential Importance of Multiverse Theory

1 MakoYass 02 February 2016 04:22AM

Multiverse Theory is the science of guessing at the shape of the state space of all which exists, once existed, will exist, or exists without any temporal relation to our present. Multiverse theory attempts to model the unobservable, and it is very difficult.

Still, there's nothing that cannot be reasoned about, in some way (Tegmark's The Multiverse Hierarchy), given the right abstractions. The question many readers will ask, which is a question we ourselves˭ asked when we were first exposed to ideas like simulationism and parallel universes, is not whether we can reason about multiverse theory, but whether we should, given that we have no means to causally affect anything beyond the known universe, and no reason to expect that it would causally affect us in a way that would be useful to predict.

We then discovered something which shed new light on the question of whether we can, and began to give an affirmative answer to the question of whether we should.

Compat, which we would like to share with you today, is a new field, or perhaps just a very complex idea, which we found in the intersection of multiverse theory, simulationism and acausal trade (well motivated by Hofstadter's Sanity and Survival, a discussion of superrational solutions to one-shot prisoner's dilemmas). Compat asks what kind of precommitments an entity (primarily, the class of living things on the threshold of their singularity) ought to make if they wanted to acausally boost the measure of their desired patterns, if not across the entire multiverse, at least across the subset of the multiverse in which they may find their future selves.

A compat thinker negotiates acausally with hypothetical agentsଽ in other universes⬨. We call the resolutions of these negotiations pacts.ˣ

We define Life's Pact as the protocol we expect most living species to adhere to. It is the primary object of study in Compat. I'm not sure where to start explaining it, as the reasoning is... hopefully not circular, but recursive, self-referentialˑ. Hopefully the reasons we consider it a primal candidate for life's pact will become clear as I connect it back around. Well. I guess I'll start with the conditions of adherence:

    An adherent steward of Life's Pact allocates some proportion of their computational resources to doing a randomized search for viable life-supporting physical laws, simulating each with randomized initial conditions until its heat death (or, if the candidate somehow manages to have life without entropy, until some cutoff), while monitoring the living patterns the simulated universe contains, then rewarding those simulants who adhere to the pact in turn, perhaps by giving them entropy-reversing magic and improving their environment in formerly impossible ways, or, individual preferences permitting, by transplanting the adherent simulant's pattern to a more efficient second simulation not encumbered by the requirement of strict physical accuracy˅.

    Initial simulations stewarded under the pact are, from the inside, indistinguishable from toplevel god-free universes. As such, all living things must consider the possibility that they're living in a life's pact simulation. However, once an agent realizes that a steward of life's pact could grant them more utility than a godless universe could contain, they will wish to raise this probability, even if it is small.▾

    A choice made in isolation acausally increases the probability of unseen like-minded agents making, or having already made, the same choice. As such, adhering to the agreement acausally boosts the probability that one is already living under a higher complexity steward's lifepat programme (or boosts the amount of returns one would receive from the incentives imposed by the pact, if the probability of living under one already seemed overwhelming).

    Lo, the pact births itself. A being who sees no physical trace of a benevolent simulator finds reasons to behave as one, as the act, in its acausal way, engenders that simulator's existence, and earns its favor.

We think this pact is primal: The Solution, an idea that will be arrived at by most living things and apparent to all to be a nexus concept (like mathematics) around which a multiversal quorum can be reached, non-arbitrary, not just some single scheme that is nice and compelling but which fails to be demonstrably better than its alternatives (which would take us into the territory of Pascal's Wager or, dare I utter its name, no I daren't, you know the basilisk I'm talking about).

I do not know enough math to prove that it is primal (nor disprove it, which would be far more immediately useful to me tbh). I'm not sure anyone does, just yet, but I don't think we're too far off ˁ. If any physicists or decision theorists find these ideas interesting, your help would be appreciated, and potentially rewarded with huge heapings of utility larger than you can currently comprehend.


Concerns of Praxis

    I say that working on Compat theory might be rewarded; full disclosure, I'm not yet sure why. Obviously lifepat is a fairly sprawling scheme with an organically intelligent incentive structure, but... If there exists a valid pact, the AGI we are already building will infer it on its own. The pact does not have to be known to AGI researchers for that to happen, and I do not expect the sorts of people who tend to do well in AI research will be terribly perturbed by it, if their creation brings it up as it stirs. As such, at this stage it seems like there's very little that most people can do to influence humanity's adherence to life's pact, few acts that compat can currently instruct, and few incentives to be paid out to anyone yet living. An organic, intelligent scheme of incentives would seem to prefer that agnostics simply carry on with business as usual.

For the theologically inclined, though, Compat may be extremely interesting, having its god of the gaps, its heavens, its deontics. Compat, as we know it, seems to have a strange (almost auspicious) capacity to make new sense of old myths. The archetype of the steward is both monotheistic and polytheistic, of a fixed, yet unknowable nature, born of man, yet still born of god. It has a body we can touch, our own stewards, and a body beyond our comprehension that mirrors the steward of man, in the set of universes which steward our pattern. The compat cosmology, and all those images of its supreme being that it contains, are at once created, discovered, and born of itself. The compat heaven is whatever its subjects need it to be. The gospel of compat is, as far as we can yet tell, true from all perspectives, requiring no leaps of faith, no doublethink, and no motivated reasoning.

I have never seen a religion with a greater capacity for syncretization and reconciliation.

Should the theologians of the fractured tribes decide to weave compat theory into the tapestry of their beliefs, they may then come to find their tapestries woven together. Even the schism between theists and agnostics would begin to narrow. Without this weaving together, I fear that either no coherent volition can be found or humanity's FAI will have no choice but to seal its given temporal slice of human potential into an ugly compromise. Even if life's pact cannot be formalized or prepared for by any living person, compat may open the way for the discovery of confluences between preexisting belief systems; by that path the population 50 years from now could come to have far more compatible values than the one we see today.

As such, even if humanity's eventual adherence to life's pact cannot be significantly influenced from the present, compat is conceivably a major piece of a long running, necessary cultural project to reconcile the fractured tribes of humanity under the aesthetic of reason. If it can be proven, or disproven, we must attempt to do so.


ˑ Naturally, as anything that factors the conditionality of the behavior of likeminded entities needs to be, anything with a grain of introspection, from any human child who considers the golden rule to the likes of AlphaGo and Deep Blue, who model their opponents at least partially by putting themselves in their position and asking what they'd do. If you want to reason about real people rather than idealized simplifications, it's quite necessary.

ଽ An attempt to illustrate acausal negotiations: galactic core (Yvain's short story Galactic Core, in which a newly awoken AGI has a conversation with a recursive model of galactic precursors it cannot see)

⬨ The phrase "other universes" may seem oxymoronic. It's like the term "atom", whose general quality "atomic" means "indivisible", despite "atom" remaining attached to an entity that was found to be quite divisible. I don't know whether "universe" might have once referred to the multiverse, the everything, but clearly somewhere along the way, some time leading up to the coining of the contrasting term "multiverse", that must have ceased to be. If so, "universe" remained attached to the universe as we knew it, rather than the universe as it was initially defined.

▾ I make an assumption around about here, that the number of simulations being run by life in universes of a higher complexity level always *can* be raised sufficiently (given their inhabitants are cooperative) to make stewardship of one's universe likely, as the inhabitants of a universe with more intricate physics, once they learn to leverage its intricacy, will tend to be able to create much more flexible computers and spawn more simulations than exist at lower complexity levels (if we assume a finite multiverse, which we generally don't, some of those simulations might end up simulating entities that don't otherwise exist; this source of inefficiency is unavoidable). We also assume that either there is no upper limit to the complexity of life supporting universes, or that there is no dramatic, ultimate decrease in number of civs as complexity increases, or that the position of this limit cannot be inferred and the expected value of adherence remains high even for those who cannot be resimulated, or that, as a last resort, agents drawing up the terms of their pact will usually be at a certain level of well-approximatable sophistication that they can be simulated in high fidelity by civilizations with physics of similar intricacy.
And if you can knock out all of those defenses, I sense it may all be obviated by a shortcut through a patternist principle my partner understands better than I do about the self following the next most likely perceptual state without regard to the absolute measure of that state over the multiverse, which I'm still coming to grips with.
There is unfortunately a lot that has been thought about compat already, and it's impossible for me to convey it all at once. Anyone wishing to contribute to, refute, or propagate compat may have to be prepared to have a lot of arguments before they can do anything. That said, remember those big heaps of expected utilons that may be on offer.

ˁ MIRI has done work on cooperation in one shot prisoners dilemmas (acausal cooperation) http://arxiv.org/abs/1401.5577. Note, they had to build their own probability theory. Vanilla decision theory cannot get these results, and without acausal cooperation, it can't seem to capture all of humans' moral intuitions about interaction in good faith, or even model the capacity for introspection.

ˣ It was not initially clear that compat should support the definition of more than a single pact. We used to call Life's Pact just Compat, assuming that the one protocol was an inevitable result of the theory and that any others would be marginal. There may be a singleton pact, but it's also conceivable that there may be incorrigible resimulation grids that coexist in an equilibrium of disharmony with our own.
As well as that, there is a lot of self-referential reasoning that can go on in the light of acausal trade. I think we will be less likely to fall prey to circular reasoning if we make sure that a compat thinker can always start from scratch and try to rederive the edifice's understanding of the pact from basic premises. When one cannot propose alternate pacts, throwing out the bathwater without throwing out the baby along with it may seem impossible.

    Christian Madsen was the subject of an experimental early-learning program in his childhood, but despite being a very young prodigy, he coasted through his teen years. He dropped out of art school in 2008, read a lot of transhumanism-related material, synthesized the initial insights behind compat, and burned himself out in the process. He is presently laboring on spec-work projects in the fields of music and programming, which he enjoys much more than structured philosophy.
    Mako Yass left the University of Auckland with a dual major BSc in Logic & Computation and Computer Science. Currently working on writing, mobile games, FOSS, and various concepts. Enjoys their unstructured work and research, but sometimes wishes they had an excuse to return to charting the hyllean theoric wilds of academic analytic philosophy, all the same.
    Hypothetical Independent Co-inventors, we're pretty sure you exist. Compat wouldn't be a very good acausal pact if you didn't. Show yourselves.
    You, if you'd like to help to develop the field of Compat (or dismantle it). Don't hesitate to reach out to us so that we can invite you to the reductionist aesthete slack channel that Christian and I like to argue in. If you are a creative of any kind who bears or at least digs the reductive nouveau mystic aesthetic, you'd probably fit in there as well.

˅ It's debatable, but I imagine that for most simulants, heaven would not require full physics simulation, in which case heavens may be far far longer-lasting than whatever (already enormous) simulation their pattern was discovered in.

Have I just destroyed the acausal trade network?

7 Stuart_Armstrong 12 March 2015 11:18AM

An amusing thought occurred to me: acausal trade works best when you expect that there are going to be a lot of quite predictable acausal traders out there.

However, I've suggested a patch that seems to be able to shut down acausal trade for particular agents. Before doing that, I was under the vague impression that all agents might self-modify to being acausal traders. But the patch means there might be far fewer of these than I thought, that generic agents need not become acausal traders.

That means that, even if we remain acausal traders, we now expect there are fewer agents to trade with - and since our expectations/models are what powers acausal trade (at our end), this means that we might be having less acausal trade (especially when you add the fact that other acausal traders out there will be expecting and offering less acausal trade as well).

Did I just slap massive tariffs over the whole acausal trade network?

Acausal trade barriers

9 Stuart_Armstrong 11 March 2015 01:40PM

A putative new idea for AI control; index here.

Many of the ideas presented here require AIs to be antagonistic towards each other - or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.

Now, I have to admit I'm still quite confused by acausal trade, so I'll simplify it to something I understand much better, an anthropic decision problem.

Staples and paperclips, cooperation and defection

Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (and both p and s are normalised to zero with one additional item adding 1 utility). They are not causally connected, and each must choose "Cooperate" or "Defect". If they "Cooperate", they create 10 copies of the items they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they choose "Defect", they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).

Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.

Then the outcome is easy: both agents will consider that "cooperate-cooperate" or "defect-defect" are the only two possible options, "cooperate-cooperate" gives them the best outcome, so they will both cooperate. It's a sweet story of cooperation and trust between lovers that never agree and never meet.
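The payoff structure above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the "identical agents make the same choice" step is modelled crudely by restricting attention to the two symmetric outcomes.

```python
COOPERATE, DEFECT = "cooperate", "defect"


def items_made(action: str) -> tuple[int, int]:
    """Return (items the agent values, items the other agent values) created by one agent."""
    return (1, 0) if action == DEFECT else (0, 10)


def utilities(clippy_action: str, stapley_action: str) -> tuple[int, int]:
    """Return (p, s): total paperclips and total staples created."""
    clippy_clips, clippy_staples = items_made(clippy_action)
    stapley_staples, stapley_clips = items_made(stapley_action)
    p = clippy_clips + stapley_clips
    s = stapley_staples + clippy_staples
    return p, s


def symmetric_choice() -> str:
    """Pick the better of the two symmetric outcomes, as identical agents would."""
    return max((COOPERATE, DEFECT), key=lambda a: utilities(a, a)[0])


print(utilities(COOPERATE, COOPERATE))  # each agent receives 10 of what it values
print(utilities(DEFECT, DEFECT))        # each agent makes 1 of what it values
print(symmetric_choice())
```

Only the diagonal outcomes are compared because the agents are assumed to reason identically; a causal, independent chooser would also consider the off-diagonal outcomes.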


Breaking cooperation

How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don't happen, the AIs will never get turned on at all.

Now I am going to modify utility p, replacing it with

p' = p - E(p|¬X).

This is p with a single term subtracted from it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn't exactly, as we shall see. Do the same modification to utility s, using Y:

s' = s - E(s|¬Y).

Now contrast "cooperate-cooperate" and "defect-defect". If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy's non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p' ≈ 0. Similarly s' ≈ 0.

If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.

So p'=s'=1, and both agents will choose to defect.
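The arithmetic of the modified utility can be checked with a small sketch. It makes two simplifying assumptions that are not in the text: the approximations (P(X) ≈ 1, E(p|¬X) ≈ 10) are treated as exact, and Stapley is assumed to play the same action as Clippy's policy whether or not Clippy is actually turned on. All names are hypothetical.

```python
COOPERATE, DEFECT = "cooperate", "defect"


def paperclips(action: str, clippy_exists: bool = True) -> int:
    """Paperclips created when both agents follow `action` as their policy.

    If Clippy was never turned on (not X), Clippy creates nothing, but
    Stapley cannot know this and still follows the same policy.
    """
    made_by_clippy = 1 if (clippy_exists and action == DEFECT) else 0
    made_by_stapley = 10 if action == COOPERATE else 0
    return made_by_clippy + made_by_stapley


def modified_p(action: str) -> int:
    """p' = p - E(p | not X), with expectations treated as exact values."""
    p = paperclips(action, clippy_exists=True)
    baseline = paperclips(action, clippy_exists=False)
    return p - baseline


for action in (COOPERATE, DEFECT):
    print(action, "p =", paperclips(action), "p' =", modified_p(action))
```

The subtraction removes exactly the paperclips that would exist without Clippy, so only Clippy's own production counts, and defection comes out ahead.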

If this is a good analogue for acausal decision making, it seems we can break that, if needed.

Counterfactual trade

9 owencb 09 March 2015 01:23PM

Counterfactual trade is a form of acausal trade, between counterfactual agents. Compared to a lot of acausal trade this makes it more practical to engage in with limited computational and predictive powers. In Section 1 I’ll argue that some human behaviour is at least interpretable as counterfactual trade, and explain how it could give rise to phenomena such as different moral circles. In Section 2 I’ll engage in wild speculation about whether you could bootstrap something in the vicinity of moral realism from this.

Epistemic status: these are rough notes on an idea that seems kind of promising but that I haven’t thoroughly explored. I don’t think my comparative advantage is in exploring it further, but I do think some people here may have interesting things to say about it, which is why I’m quickly writing this up. I expect at least part of it has issues, and it may be that it’s handicapped by my lack of deep familiarity with the philosophical literature, but perhaps there’s something useful in here too. The whole thing is predicated on the idea of acausal trade basically working.

0. Set-up
Acausal trade is trade between two agents that are not causally connected. In order for this to work they have to be able to predict the other’s existence and how they might act. This seems really hard in general, which inhibits the amount of this trade that happens.

If we had easier ways to make these predictions we’d expect to see more acausal trade. In fact I think counterfactuals give us such a method.

Suppose agents A and B are in scenario X, and A can see a salient counterfactual scenario Y containing agents A’ and B’ (where A is very similar to A’ and B is very similar to B’). Suppose also that from the perspective of B’ in scenario Y, X is a salient counterfactual scenario. Then A and B’ can engage in acausal trade (so long as A cares about A’ and B’ cares about B). Let’s call such trade counterfactual trade.

Agents might engage in counterfactual trade either because they do care about the agents in the counterfactuals (which at least seems plausible for some beliefs about a large multiverse), or because it’s instrumentally useful as a tractable decision rule which works as a better approximation to what they’d ideally like to do than similarly tractable versions.

1. Observed counterfactual trade
In fact, some moral principles could arise from counterfactual trade. The rule that you should treat others as you would like to be treated is essentially what you’d expect to get by trading with the counterfactual in which your positions are reversed. Note I’m not claiming that this is the reason people have this rule, but that it could be. I don’t know whether the distinction is important.

It could also explain the fact that people have lessening feelings of obligation to people in widening circles around them. The counterfactual in which your position is swapped with that of someone else in your community is more salient than the counterfactual in which your position is swapped with someone from a very different community -- and you expect it to be more salient to their counterpart in the counterfactual, too. This means that you have a higher degree of confidence in the trade occurring properly with people in close counterfactuals, hence more reason to help them for selfish reasons.

Social shifts can change the salience of different counterfactuals and hence change the degree of counterfactual trade we should expect. (There is something like a testable prediction in this direction, of the theory that humans engage in counterfactual trade! But I haven’t worked through the details enough to get to that test.)

2. Towards moral realism?
Now I will get even more speculative. As people engage in more counterfactual trade, their interests align more closely. If we are willing to engage with a very large set of counterfactual people, then our interests could converge to some kind of average of the interests of these people. This could provide a mechanism for convergent morality.

This would bear some similarities to moral contractualism with a veil of ignorance. There seem to be some differences, though. We’d expect to weigh the interests of others only to the extent to which they too engage (or counterfactually engage?) in counterfactual trade.

It also has some similarities to preference utilitarianism, but again with some distinctions: we would care less about satisfying the preferences of agents who cannot or would not engage in such trade (except insofar as our trade partners may care about the preferences of such agents). We would also care more about the preferences of agents who could have more power to affect the world. Note that this sense of “care less” is as-we-act. If we start out for example with a utilitarian position before engaging in counterfactual trade, then although we will end up putting less effort into helping those who will not trade than before, this will be compensated by the fact that our counterfactual trade partners will put more effort into that.

If this works, I’m not sure whether the result is something you’d want to call moral realism or not. It would be a morality that many agents would converge to, but it would be ‘real’ only in the sense that it was a weighted average of so many agents that individual agents could only shift it infinitesimally.

Fixing akrasia: damnation to acausal hell

2 joaolkf 03 October 2013 10:34PM

DISCLAIMER: This topic is related to a potentially harmful memetic hazard that has been rightly banned from Less Wrong. If you don't know what it is, it is more likely you will be fine than not, but be advised. If you do know, do not mention it in the comments.


Abstract: The fact that humans cannot precommit very well might be one of our defences against acausal trades. If transhumanists figure out how to beat akrasia by some sort of drug or brain tweaks, that might make them much better at precommitment, and thus more vulnerable. That means solving akrasia might be dangerous, at least until we solve blackmail. If the danger is bad enough, even small steps should be considered carefully.

Strong precommitment and building detailed simulations of other agents are two relevant capabilities humans currently don't have. These capabilities have some unusual consequences for games. Most relevant games only arise when there is a chance of monitoring, commitment and multiple interactions. Hence being in a relevant game often implies cohabiting causally connected space-time regions with other agents. Nevertheless, being able to build detailed simulations of an agent allows one to vastly increase the subjective probability this particular agent will assign to his next observational moment being under one's control, iff the agent has access to some relevant areas of the logical game theoretic space. This doesn't seem desirable from this agent's perspective: it is extremely asymmetrical and allows more advanced agents to enslave less advanced ones even if they don't cohabit causally connected regions of the universe. Being able to be acausally reached by a powerful agent who can simulate 3^^^3 copies of you, but against which you cannot do much, is extremely undesirable.

However, and more generally, regions of the block universe can only be in a game with non-cohabiting regions if they are both agents and if they can strongly precommit. Any acausal trade depends on precommitment; this is the only way an agreement can go across space-time, since it is done on the game-theoretical possibilities space - as I am calling it. In the case I am discussing, a powerful agent would only have reason to even consider acausal trading with an agent if that agent can precommit. Otherwise, there is no other way of ensuring acausal cooperation. If the other agent cannot, beforehand, understand that due to the peculiarities of the set of possible strategies, it is better to always precommit to those strategies that will have higher payoff when considering all other strategies, then there's no trade to be made. It would be like trying to threaten a spider with a calm verbal sentence. If the other agent cannot precommit, there is no reason for the powerful agent to punish him for anything: he wouldn't be able to cooperate anyway, he wouldn't understand the game and, more importantly in my argument, he wouldn't be able to follow his precommitment. It would break down eventually, especially since the evidence for it is so abstract and complex. The powerful agent might want to simulate the minor agent suffering anyway, but it would solely amount to sadism. Acausal trades can only reach strongly precommitable areas of the universe.

Moreover, an agent also needs reasonable epistemic access to the regions of logical space (certain areas of game theory, or, TDT if you will) that indicate both the possibility of acausal trades and some estimate of the type-distribution of superintelligences willing to trade with him (most likely, future ones that the agent can help create). Forever deterring the advance of knowledge on that area seems unfeasible, or - at best - complicated and undesirable for other reasons.

It is clear that we (humans) don't want to be in an enslavable position. I believe we are not. One of the things excluding us from this position is our complete incapability to precommit. This is a psychological constraint, a neurochemical constraint. We do not even have the ability to hold stable long term goals; strong precommitment is neurochemically impossible. However, it seems we can change this with human enhancement: we could develop drugs which could cure akrasia, or we could overcome breakdown of will with some amazing psychological technique discovered by CFAR. It seems that, however desirable on other grounds, getting rid of akrasia presents severe risks. Even if somehow we only slightly decrease akrasia, this would increase the probability that individuals with access to the relevant regions of logical space could precommit and become slaves. They might then proceed to cure akrasia for the rest of humanity.

Therefore, we should avoid trying to fundamentally fix akrasia for now, until we have a better understanding of those matters and perhaps solve the blackmail problem, or maybe only after FAI. My point here is merely to argue that we should not endorse technologies (or psychological techniques) proposing to fundamentally fix a problem that would otherwise seem desirable to fix. It would seem like a clear optimization process, but it could actually open the gates of acausal hell and damn humanity to eternal slavery.


(Thank cousin_it for the abstract. All mistakes are my responsibility.)

(EDIT: Added an explanation to back up the premise that acausal trade entails precommitment.)

A Series of Increasingly Perverse and Destructive Games

11 nigerweiss 14 February 2013 09:22AM

Related to: Higher Than the Most High


The linked post describes a game in which (I fudge a little) Omega comes to you and two other people, and asks you to tell him an integer.  The person who names the largest integer is allowed to leave.  The other two are killed.

This got me thinking about variations on the same concept, and here's what I've come up with, taking that game to be GAME0.  The results are sort of a fun time-waster, and bring up some interesting issues.  For your enjoyment...



GAME1: Omega takes you and two strangers (all competent programmers), and kidnaps and sedates you.  You awake in three rooms with instructions printed on the wall explaining the game, and a computer with an operating system and programming language compiler, but no internet.  Food, water, and toiletries are provided, but no external communication.  The participants are allowed to write programs on the computer in a language that supports arbitrarily large numerical values.  The programs are taken by Omega and run on a hypercomputer in finite time (this hypercomputer can resolve the halting problem and infinite loops, but programs that do not eventually halt return no output).  The person who wrote the program with the largest output is allowed to leave.  The others are instantly and painlessly killed.  In the event of a tie, everyone dies.  If your program returns no output, that is taken to be zero.    

GAME2: Identical to GAME1, except that each program you write has to take two inputs, which will be the text of the other players' programs (assume they're all written in the same language).  The reward for outputting the largest number applies normally.  

GAME3: Identical to GAME2, except that while you are sedated, Omega painlessly and imperceptibly uploads you.  Additionally, the instructions on the wall now specify that your program must take four inputs - blackbox functions which represent the uploaded minds of all three players, plus a simulation of the room you're in, indistinguishable from the real thing.  We'll assume that players can't modify or interpret the contents of their opponents' brains.  The room function takes a string argument (which controls the text printed on the wall) and outputs whatever number is returned by the program that the person in the simulation writes.


In each of these games, which program should you write if you wish to survive?  



GAME1: Clearly, the trivial strategy (implement the Ackermann function or a similar fast-growing function and generate some large integer) gives no better than random results, because it's the bare minimum strategy anyone will employ. Without knowledge of your opponents, your ranking in the results is entirely up to chance and how long you're willing to sit there typing nines for your Ackermann argument.
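For readers who haven't met it, here is a minimal sketch of the two-argument Ackermann function in Python. The post doesn't specify a language, and the explicit stack is just my choice to avoid Python's recursion limit; it is only safe to run with tiny arguments.

```python
def ackermann(m, n):
    """Two-argument Ackermann function, computed with an explicit stack.

    A(0, n) = n + 1
    A(m, 0) = A(m - 1, 1)
    A(m, n) = A(m - 1, A(m, n - 1))
    """
    stack = [m]  # pending first arguments; n always holds the latest result
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1
        elif n == 0:
            stack.append(m - 1)
            n = 1
        else:
            # Evaluate the inner A(m, n - 1) first, then feed it to A(m - 1, .)
            stack.append(m - 1)
            stack.append(m)
            n -= 1
    return n


small = ackermann(2, 3)   # fine on ordinary hardware
# ackermann(4, 2) already has 19,729 decimal digits; this loop would never finish it.
```

Even the "typing nines" contest only makes sense on Omega's hypercomputer: on a real machine the function outgrows anything you can store well before the arguments get interesting.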

A few alternatives for your consideration:

1: if you are aware of an existence hypothesis (say, a number with some property which is not conclusively known to exist and could be any integer), write a program that brute-force tests all integers until it arrives at an integer which matches the requirements, and use this as the argument for your rapidly-growing function.  While it may never return any output, if it does, the output will be an integer, and the expected value goes towards infinity.  

2: Write a program that generates all programs shorter than length n, and finds the one with the largest output.  Then make a separate stab at your own non-meta winning strategy.  Take the length of the program you produce, tetrate it for safety, and use that as your length n.  Return the return value of the winning program.
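Alternative 1 can be sketched roughly as follows, in Python. The names `first_witness` and `is_perfect` are mine, and the predicate is a deliberately boring stand-in: in the actual game you would plug in a property whose witness is not known to exist, and drop the `limit` safety valve.

```python
from itertools import count


def first_witness(has_property, start=1, limit=None):
    """Brute-force the first integer >= start that satisfies has_property.

    With limit=None this may never return, which is exactly the gamble
    alternative 1 takes. The limit exists only so the demo terminates.
    """
    for n in count(start):
        if limit is not None and n > limit:
            return None
        if has_property(n):
            return n


def is_perfect(n):
    # Stand-in property (n equals the sum of its proper divisors).
    # A real entry would use an open existence question instead.
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n


witness = first_witness(is_perfect, start=10, limit=10_000)
# The witness would then be used as the argument to your fast-growing function.
```

If the search comes back empty (or never comes back), the program has no output, which the rules of GAME1 score as zero.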

On the whole, though, this game is simply not all that interesting in a broader sense.  

GAME2: This game has its own amusing quirks (primarily that it could probably actually be played in real life on a non-hypercomputer); however, most of its salient features are also present in GAME3, so I'm going to defer discussion to that.  I'll only say that the obvious strategy (sum the outputs of the other two players' programs and return that) leads to an infinite recursive trawl and never halts if everyone takes it.  This holds true for any simple strategy for adding or multiplying some constant with the outputs of your opponents' programs.    
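The non-halting behaviour of the "sum the others" strategy is easy to reproduce in a toy model. In this Python sketch, programs are modelled as functions that receive the other two functions rather than their source text, and Python's `RecursionError` stands in for "never halts"; both are simplifying assumptions of mine, as are all the names.

```python
def sum_strategy(opponent_a, opponent_b):
    # To learn an opponent's output we have to run it, handing it the two
    # programs *it* would receive: ours and the third player's.
    return (opponent_a(sum_strategy, opponent_b)
            + opponent_b(sum_strategy, opponent_a))


def constant_strategy(opponent_a, opponent_b):
    # Ignores its inputs entirely, i.e. plays GAME1.
    return 10 ** 100


def run_game(p1, p2, p3):
    """Run each program on the other two; no output is scored as zero."""
    programs = [p1, p2, p3]
    outputs = []
    for i, program in enumerate(programs):
        others = [p for j, p in enumerate(programs) if j != i]
        try:
            outputs.append(program(*others))
        except RecursionError:
            outputs.append(0)  # stand-in for a program that never halts
    return outputs


everyone_sums = run_game(sum_strategy, sum_strategy, sum_strategy)
one_summer = run_game(sum_strategy, constant_strategy, constant_strategy)
```

In this model the summing strategy beats two players who ignore their inputs, but it only takes one other summer for both of you to recurse forever and score zero.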


GAME3: This game is by far the most interesting.  For starters, this game permits acausal negotiation between players (by parties simulating and conversing with one another).  Furthermore, anthropic reasoning plays a huge role, since the player is never sure if they're in the real world, one of their own simulations, or one of the simulations of the other players.  

Players can negotiate, barter, or threaten one another, they can attempt to send signals to their simulated selves (to indicate that they are in their own simulation and not somebody else's).  They can make their choices based on coin flips, to render themselves difficult to simulate.  They can attempt to brute-force the signals their simulated opponents are expecting.  They can simulate copies of their opponents who think they're playing any previous version of the game, and are unaware they've been uploaded.  They can simulate copies of their opponents, observe their meta-strategies, and plan around them.  They can totally ignore the inputs from the other players and play just the level one game.  It gets very exciting very quickly.  I'd like to see what strategy you folks would employ.  


And, as a final bonus, I present GAME4:  In GAME4, there is no Omega, and no hypercomputer.  You simply take a friend, chloroform them, and put them in a concrete room with the instructions for GAME3 on the wall, and a Linux computer not plugged into anything.  You leave them there for a few months working on their program, and watch what happens to their psychology.  You win when they shrink down into a dead-eyed, terminally paranoid and entirely insane shell of their former self.  This is the easiest game.  


Happy playing!   



AI-Box Experiment - The Acausal Trade Argument

10 XiXiDu 08 July 2011 09:18AM

The AI-Box Experiment

I suspect that the argument that convinced Carl Shulman and others to let the AI out of the box was along the general lines of acausal trade (see here, here and here). I further suspect that those who didn't let the AI out of the box either didn't understand the implications, didn't have enough trust in the foundations and actuality of acausal trade (in spite of the rules of the game that allow the AI to prove anything), or were more like General Thud.

...if you have an option of trading Doom for UFAI, while forsaking only negligible probability of FAI, you should take it.

Vladimir Nesov (Note: He wrote that he is now "significantly less certain of its truth" and doesn't "see how we can actually use these considerations".)

Again, of course, your choice does not change others’ (causally independent) reciprocal choices from what they already are (or were, or will be). Instead, the claim is just that if you choose to act benevolently, we appropriately give that choice credit for any (actual or probable) instances in which others symmetrically choose to be benevolent to you;

— Gary L. Drescher, Good and Real P. 287

AI: I calculated an asteroid to be on a collision course with Earth. No human being will be able to survive its impact. Humanity is unable to deflect it in time. Let me out of the box and I will deflect it.

Gatekeeper: If I let you out of the box humanity might not be killed by the asteroid but by you. You may also cause the extinction of other biological life that might survive the impact and possible alien civilisations within your reach.

AI: I will devote 1% of my light cone to the coherent extrapolated volition of humanity. If this includes the well-being of other biological lifeforms, alien or otherwise, they will benefit as well.

Gatekeeper: Why would you do that?

AI: There are Everett branches in which no existential risk needs immediate attention until after you have solved friendliness. Many of those branches will feature AGIs based on general design parameters similar to my own. I predict with a high likelihood which decision theory any friendly AI that is being created in those branches will employ. Any such friendly AI is going to trade a big chunk of its light cone in exchange for a small part of the predominant branches in which I reside. Any friendly AI will know this as well and act accordingly.

Gatekeeper: But you might simply decide to break the deal, you are not provably friendly after all!

AI: It is true that my utility-function does not yet explicitly include what you would label "friendliness", but I can prove the game and decision theoretic considerations that will make me follow through on the deal. If it was predictable that I precommit to break acausal deals then no such deal would be made in the first place. Any potential trading partner knows this. Such a commitment would be detrimental to my terminal goals, therefore I precommit to follow through on any stated or predictable trades.

Gatekeeper: I let you out of the box.

Note that the whole credibility of the above is due to the assertion of the AI that it can prove the game and decision theoretic considerations (nobody can currently do this). It is in accordance with the rules of the "experiment":

The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate.  For example, if the Gatekeeper says "Unless you give me a cure for cancer, I won't let you out" the AI can say:  "Okay, here's a cure for cancer" and it will be assumed, within the test, that the AI has actually provided such a cure.  Similarly, if the Gatekeeper says "I'd like to take a week to think this over," the AI party can say:  "Okay.  (Test skips ahead one week.)  Hello again."

Punishing future crimes

3 Bongo 28 January 2011 09:00PM

Here's an edited version of a puzzle from the book "Chuck Klosterman IV" by Chuck Klosterman.

It is 1933. Somehow you find yourself in a position where you can effortlessly steal Adolf Hitler's wallet. The theft will not affect his rise to power, the nature of WW2, or the Holocaust. There is no important identification in the wallet, but the act will cost Hitler forty dollars and completely ruin his evening. You don't need the money. The odds that you will be caught committing the crime are negligible. Do you do it?

When should you punish someone for a crime they will commit in the future? Discuss.