Would a FAI reward us for helping create it?

TwistingFingers

by TwistingFingers

30th Dec 2011

We expect that post-singularity there will still be limited resources in the form of available computational resources until heat death.

Those resources do not necessarily need to be allocated fairly. In fact, I would guess that if they were allocated unfairly, the most likely beneficiaries would be those people who helped contribute to the creation of a friendly AI.

Now for some open questions:

What probability distribution of extra resources do you expect with respect to various possible contributions to the creation of friendly AI?

Would donating to the SIAI suffice for acquiring these extra resources?

[-]wedrifid14y50

Would a FAI reward us for helping create it?

Iff we program it to.

Trite but true. This isn't a question about fundamental behavior of AIs. It's a question of what preferences the GAI creators wanted to impart to their AI and how well they managed to implement them. An AI that rewards to some degree could qualify as friendly, but it doesn't seem to be a requirement.

Here's another question: If a group of people cooperated to save you and your species from near certain death and gave you and those dear to you an unbounded life of general awesomeness, would you reward them? If so, then an FAI may well reward them too. If most people would reward in that circumstance, then an FAI could plausibly also reward. But I don't pretend to know what people's extrapolated volition looks like or how the most likely FAI would be implemented.

[-]Vladimir_Nesov14y100

Compare:

Q: Would a calculator answer "59" if asked "7*8="?
A: Iff we program it to. Trite but true. This isn't a question about fundamental behavior of calculators.

"FAI" is a rather specific kind of program, and it won't do any given thing just because its programmers wanted it to. Its behavior isn't controlled by what its programmers want, not in any reasonably direct way, just as the correct answer to what 7*8 is isn't controlled by what the calculator's designers want. If it does answer "7*8=59", it's not a calculator.

[-]wedrifid14y-30

Compare:

Q: Would a calculator answer "59" if asked "7*8="?
A: Iff we program it to. Trite but true. This isn't a question about fundamental behavior of calculators.

Comparison result: NOT EQUAL. (For multiple reasons, come to think of it. Those being multiple results, parameterisation, and currently being ambiguously specified.)

"FAI" is a rather specific kind of program

My comment rather clearly assumed that and further asserted that:

An AI that rewards to some degree could qualify as friendly but it doesn't seem to be a requirement.

That is, there is a class of artificial intelligence algorithms which can be considered 'friendly' and within that class there are algorithms that would reward and other algorithms which would not reward. This is in stark contrast to other behaviors which could be output by algorithms which would necessarily exclude them from being in the class 'friendly' - such as torturing or killing anyone I cared about.

[-]Vladimir_Nesov14y20

My point is that specific behaviors are not the kind of thing that we can make decisions about in programming a FAI, so I don't see how "iff we program it to" applies to a question of plausibility of a specific behavior. Rather, we can talk of which behaviors seem more or less plausible given what abstract properties the idea of "FAI" assumes, and depending on other parameters that influence a particular variant of its implementation (such as whether it optimizes human or chimp values). So on that level, it's not plausible that FAI would start torturing people or maximizing paperclips, and these properties are not within variation of what the concept includes.

there is a class of artificial intelligence algorithms which can be considered 'friendly' and within that class there are algorithms that would reward and other algorithms which would not reward.

"Things that are mostly Friendly" is a huge class in which humanly constructible FAIs are a tiny dot (I expect we can either do a perfect job or none at all, while it's theoretically but not humanly possible to create almost-perfect-but-not-quite FAIs). I'm talking about that dot, and I expect within that dot, the answer to this question is determined one way or the other, and we don't know which. Is it actually the correct decision to "reward FAI's creators"? If it is, FAI does it, if it's not, FAI doesn't do it. Whether programmers want it to be done doesn't plausibly influence whether it's the correct thing to do, and FAI does the correct thing, or it's not a FAI.

(More carefully, it's not even clear what the question means, since it compares counterfactuals, and there is still no reliable theory of counterfactual reasoning. Like, "What do you mean, if we did that other thing? Look at what actually happened." More usefully, the question is probably wrong in the sense that it poses a false dilemma, assumes things some of which will likely break.)

[-]wedrifid14y10

My point is that specific behaviors are not the kind of thing that we can make decisions about in programming a FAI, so I don't see how "iff we program it to" applies to a question of plausibility of a specific behavior.

There is more than one way to program an FAI - see for example CEV which is currently ambiguous. There are also different individuals or groups of individuals which an AI can be friendly to and still qualify as "Friendly Enough" to warrant the label. It is likely that the actual (and coherently extrapolatable) preferences of humans differ with respect to whether rewarding AI-encouragers is a good thing.

"Things that are mostly Friendly" is a huge class in which humanly constructible FAIs are a tiny dot (I expect we can either do a perfect job or none at all, while it's theoretically but not humanly possible to create almost-perfect-but-not-quite FAIs). I'm talking about that dot, and I expect within that dot, the answer to this question is determined one way or the other, and we don't know which.

I'm pleasantly surprised. It seems that we disagree with respect to actual predictions about the universe rather than the expected, and more common, "just miscommunication/responding to a straw man". Within that dot the answer is not determined!

Whether programmers want it to be done doesn't plausibly influence whether it's the correct thing to do, and FAI does the correct thing, or it's not a FAI.

I'm familiar with the point - and make it myself rather frequently. It does not apply here - due to the aforementioned rejection of the "determined within the dot" premise.

[-]Will_Newsome14y00

How likely do you think it is that all humanly-buildable AGIs converge on whatever FAI converges on in less time than it takes for a typical black hole to evaporate? (Eghggh. Time breaks down around singularities (at least from a human perspective) so I can't phrase this right, but maybe you get my gist.)

[-][anonymous]14y00

(Just finished updating my reply, hopefully resolving some ambiguities present in its original form.)

[This comment is no longer endorsed by its author]

[-]timtyler14y00

Would a FAI reward us for helping create it?

Iff we program it to.

Trite but true. [...]

This question is perhaps best reimagined by asking whether this is a universal instrumental value.

Then the answer is, fairly clearly, yes.

[-]ArisKatsaris14y30

So you decided to bring it up yourself, seriously what the hell? Isn't that like pushing someone over a cliff while saying "Careful, you were getting near the edge"?

[-]Manfred14y30

Nah, I don't think there's any percentage in acting like that for the AI. You'll just have to do it for the not-getting-turned-into-smileyfaces :P

[-][anonymous]14y20

You're getting into basilisk territory here. It probably makes no difference, at least at this point in time, when nobody has any clue what the most probable superintelligences will look like, and what their exact policies will be on this sort of thing. Although I guess we can assume that a superintelligence will use whatever policy has the highest likelihood of making itself come into existence, which as far as I can tell, is not necessarily the same as whatever policy has the highest likelihood of getting you to contribute to its creation.

Be thankful for that and pre-commit to always refuse extortion, before anyone does figure this out.

[-]Shmi14y20

We expect that post-singularity there will still be limited resources in the form of available computational resources until heat death.

Personally, I expect the FAI to give a baby universe to everyone who wants one, so the question is moot.

If not, I do not expect the FAI to care about past contributions, since its goal would be to maximize something like integral of (fun*population) over time, so the people with the highest fun/resource ratio would be rewarded, most likely those with the lowest IQ, as they would be happy to be injected with the fun drug and kept in suspended animation for as long as possible.

[-]orthonormal14y50

its goal would be to maximize something like integral of (fun*population) over time

That's not what LW refers to as an FAI, but instead a failed FAI. See posts like this one and this one, and this wiki entry.

[-]Shmi14y-10

I mean it in this sense.

[-]orthonormal14y60

I would bet US$100 that, if asked, Eliezer would say that

the people with the highest fun/resource ratio would be rewarded, most likely those with the lowest IQ, as they would be happy to be injected with the fun drug and kept in suspended animation for as long as possible.

shows a complete misinterpretation of Fun Theory.

I'm not dismissing the possibility of your scenario, just pointing out that SIAI is explicitly excluding that type of outcome from their definition of "Friendly".

[-]Shmi14y-10

I'm not dismissing the possibility of your scenario, just pointing out that SIAI is explicitly excluding that type of outcome from their definition of "Friendly".

Only under the unlimited resources assumption, which is not the case here.

[-]Eliezer Yudkowsky13y10

I am explicitly calling that unFriendly given bounded resources.

[-]wedrifid14y30

Personally, I expect the FAI to give a baby universe to everyone who wants one, so the question is moot.

I don't know what I expect but that is certainly what I want it to do.

[-]James_Miller14y10

If they were allocated unfairly, the most likely beneficiaries would be those people who helped contribute to the creation of a friendly AI.

No, fairly.

[-]endoself14y00

I think they meant 'equally'.

[-]Estarlio14y00

You made a certain prediction about the AI's likely behaviour, and you either did or didn't contribute to the AI's creation based on that. However, whether or not it rewards you won't change that prediction, nor will it change whether you're the sort of thing that will act in a certain manner based on that prediction.

[-]Morendil14y00

I would guess that if they were allocated unfairly, the most likely beneficiaries would be those people who helped contribute

Why?
