Guardian Angels: Discrete Extrapolated Volitions

lessdazed


by lessdazed

25th Sep 2011


Questions for discussion, with my tentative answers. Assuming I am wrong about some things, there is something interesting to consider. This is inspired by the recent SL4-type and CEV-centric topics in the discussion section.

Questions:

I

1. Is it easier to calculate the extrapolated volition of an individual or a group?
2. If it is easier to do for an individual, is it because it is strictly simpler to do it, in that calculating humanity's CEV involves making at least every calculation that would be made for calculating the extrapolated volition of one individual?
3. How definitively can these questions be answered without knowing exactly how to calculate CEV?

II

1. Is it possible to create multiple AIs such that one AI does not prevent others from being created, such as by releasing equally powerful AIs simultaneously?
2. Is it possible to box AIs such that they reliably never escape before a certain, if short, period of time, such as by giving them a low-cost way out with a calculable minimum and maximum time to exploit that route?
3. Is it likely there would be a cooperative equilibrium among unmerged AIs?

III

1. Assuming the possibility of all of the following: what would happen if every person had a superintelligent AI with a utility function of that person's idealized extrapolated utility function?
2. How would that compare to a scenario with a single AI embodying a successful calculation of CEV?
3. What would be different if a person or some few people did not have a superintelligence valuing what they would value, and only many people had their own AI?

My Answers:

I

1. It depends on the error level tolerated. If only very low error is tolerated, it is easier to do it for a group.
2. N/A
3. Not sure.

II

1. Probably not.
2. Maybe, probably not, but impossible to know with high confidence.
3. Probably not. Throughout history, offense has often been a step ahead of defense, which often catches up to it. I think this is not particular to evolutionary biology or to the technologies that happen to have been developed. It seems easier to break complicated things with many moving parts than to build and defend them. Also, the specific technologies that people plausibly speculate may exist are more powerful offensively than defensively. I would expect them to merge, probably peacefully.

III

1. Hard to say, as that would be trying to predict the actions of more intelligent beings in a dynamic environment.
2. It might be better, or worse. The chance of it being similar is notably high.
3. Not sure.


9 comments

JGWeissman (14y ago)

Assuming the possibility of all of the following: what would happen if every person had a superintelligent AI with a utility function of that person's idealized extrapolated utility function? How would that compare to a scenario with a single AI embodying a successful calculation of CEV?

A singleton AI with individual CEVs for each human can do at least as well by simulating the negotiation of uniformly powerful individual AIs, one for each CEV. This is more stable because the singleton's simulation enforces uniform levels of power, whereas actual AIs could diverge in power.

lessdazed (14y ago)

I don't think "individual CEV" is proper. It's like calling an ATM an "ATM organism", which would be even worse than calling it an "ATM machine", as is common. The "C" means individual extrapolated volitions are combined coherently.

I agree it would in theory be better to have a singleton. But that requires knowing how to cohere extrapolated volitions. My idea is that it might be possible to push off that task to superintelligences without destroying the world in the process.

wedrifid (14y ago)

I don't think "individual CEV" is proper. It's like calling an ATM an "ATM organism", which would be even worse than calling it an "ATM machine", as is common. The "C" means individual extrapolated volitions are combined coherently.

While it would be useful to be able to split the "combine from different agents' wishes" part from the "act as if the agents were smarter and wiser" part, as it is currently described the 'C' is still necessary even for an individual, because most organisms, including, most importantly, humans, do not have coherent value systems as they stand. So as it stands we need to say things like CEV and CEV for the label to make sense. The core of the problem here is that there are three important elements of the process that we are trying to represent with just two letters of the acronym:

Make smarter, wiser and generally more betterer (intended emphasis on the informality needed for this level of terseness)
Make internally coherent
Combine with others

Those three don't neatly separate into 'C' and 'E'.

lessdazed (14y ago)

The following is from http://singinst.org/upload/CEV.html; I added some emphasis to explain why I understand it the way I do.

Spread, muddle, and distance....

Spread describes cases where your extrapolated volition becomes unpredictable, intractable, or random. You might predictably want a banana tomorrow, or predictably not want a banana tomorrow, or predictably have a 30% chance of wanting a banana tomorrow depending on variables that are quantum-random, deterministic but unknown, or computationally intractable. When multiple outcomes are possible and probable, this creates spread in your extrapolated volition.

Muddle measures self-contradiction, inconsistency, and cases of "damned if you do and damned if you don't". Suppose that if you got a banana tomorrow you would not want a banana, and if you didn't get a banana you would indignantly complain that you wanted a banana. This is muddle.

Distance measures how difficult it would be to explain your volition to your current self, and the degree to which the volition was extrapolated by firm steps.

Short distance: An extrapolated volition that you would readily agree with if explained.

Medium distance: An extrapolated volition that would require extended education and argument before it became massively obvious in retrospect.

Long distance: An extrapolated volition your present-day self finds incomprehensible; not outrageous or annoying, but blankly incomprehensible.

Ground zero: Your actual decision.

...

Coherence: Strong agreement between many extrapolated individual volitions which are un-muddled and un-spread in the domain of agreement, and not countered by strong disagreement.

Coherence:

Increases, as more humans actively agree.

Decreases, as more humans actively disagree. (The strength of opposition decreases if the opposition is muddled.)

Increases, as individuals support their wishes more, with stronger emotions or more settled philosophy.

It should be easier to counter coherence than to create coherence.

So coherence is something done after un-muddling.
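As a toy illustration of the coherence rules quoted above, here is a minimal Python sketch. Nothing in it comes from the CEV document itself: the `Volition` fields, the `coherence` function, the linear scoring, and the `opposition_weight` parameter are all hypothetical, chosen only to show one way the three stated rules and the asymmetry between creating and countering coherence could fit together.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Volition:
    """One individual's (already extrapolated) stance on a single wish.

    All fields are illustrative assumptions, not terms from the CEV document.
    """
    stance: int      # +1 actively agrees, -1 actively disagrees, 0 neither
    strength: float  # 0..1, stronger emotions or more settled philosophy
    muddle: float    # 0..1, degree of self-contradiction


def coherence(volitions: Iterable[Volition], opposition_weight: float = 2.0) -> float:
    """Toy coherence score for one wish across many individuals.

    - Rises as more individuals agree, and as they agree more strongly.
    - Falls as more individuals disagree; muddled opposition counts for less.
    - opposition_weight > 1 makes coherence easier to counter than to create.
    """
    support = 0.0
    opposition = 0.0
    for v in volitions:
        effective = v.strength * (1.0 - v.muddle)  # muddle discounts a volition
        if v.stance > 0:
            support += effective
        elif v.stance < 0:
            opposition += effective
    return support - opposition_weight * opposition


supporters = [Volition(+1, 0.5, 0.0), Volition(+1, 0.5, 0.0)]
clear_opponent = Volition(-1, 0.5, 0.0)
muddled_opponent = Volition(-1, 0.5, 0.5)

baseline = coherence(supporters)
with_clear_opposition = coherence(supporters + [clear_opponent])
with_muddled_opposition = coherence(supporters + [muddled_opponent])
```

In this sketch one clear opponent cancels two equally committed supporters, while a muddled opponent cancels only part of their agreement. Note that muddle is applied to each individual before anything is combined, which matches the reading that coherence comes after un-muddling.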

lessdazed (14y ago)

In retrospect, it would be ridiculously easy for an AI under these conditions to secure early release and get out of the box before others.

jimrandomh (14y ago)

Assuming the possibility of all of the following: what would happen if every person had a superintelligent AI with a utility function of that person's idealized extrapolated utility function?

One crazy nihilist with a destructive utility function would ruin the whole thing, by building a nuke or something. Offense wins decisively over defense.

Is it likely there would be a cooperative equilibrium among unmerged AIs?

Only if they were filtered to add restrictions or remove certain types of utility functions. And probably not even then, since AIs with evil utility functions could crop up randomly in that environment, from botched self-modifications or damage.

How would that compare to a scenario with a single AI embodying a successful calculation of CEV?

A single AI would be much better, since it could resolve all prisoners' dilemmas, coordination games, and ultimatum games in a way that's optimal, rather than merely Pareto efficient.

Is it possible to create multiple AIs such that one AI does not prevent others from being created, such as by releasing equally powerful AIs simultaneously?

Releasing equally powerful AIs simultaneously is very risky, because it gives them an incentive to rush their self-improvements through, rather than take their time to check them for errors. Also, one of the AIs would probably succeed in destroying the others; cybersecurity so far has been a decisive win for offense.

What would be different if a person or some few people did not have a superintelligence valuing what they would value, and only many people had their own AI?

Most people's utility functions include some empathy, which would cover for many people being excluded from counting directly. However, if a person doesn't have a superintelligence valuing what they would value, then some of their values will be excluded if no one else approves of them. This is mostly a good thing, since the values that would be excluded this way would probably be destructive ones. However, people who were not included directly would lose out in any contentions over scarce resources, which could turn into a serious problem for them if resources become scarce.

lessdazed (14y ago)

"One crazy nihilist"

A more convenient possible world was alluded to when I asked about excluding some individuals.

"equilibrium"

"Only if"

No merging?

"A single AI would be much better"

Maybe, but I had also asked about the relative difficulty of calculating CEV and DEV. If DEV is easier, perhaps possible rather than impossible, that's an advantage of it.

"one of the AIs would probably succeed in destroying the others; cybersecurity so far has been a decisive win for offense."

War is a risk, it includes the possibility of mutual destruction, particularly if offense is more powerful. You don't think they'd merge resources and values instead of risking it?

"empathy...lose out in any contentions over scarce resources"

I agree that is the most likely scenario, though it is still less than probable.

jimrandomh (14y ago)

War is a risk, it includes the possibility of mutual destruction, particularly if offense is more powerful. You don't think they'd merge resources and values instead of risking it?

Cyberwar is different from regular war in that all competently performed attacks are inherently anonymous. Attacks performed very competently are also undetectable. This is very destabilizing. And it gets worse: while AIs might try to get around this by all merging together, none of them would be able to prove they hadn't hidden a copy of themselves somewhere.

lessdazed (14y ago)

I don't think undetectability solves things. Offensive subsystems could survive their creator's demise, as with two people in a grenade-lobbing fight.

Suppose all of them hid a copy. The merged AI would still be more powerful than any hidden copy, and if it were destroyed, everyone would be a small copy again. If there were many AIs, an individual would be banking on its ability to defeat a much larger entity. Offense is more powerful on most scales and technological levels, but not by incomprehensible orders of magnitude.
