Stuart_Armstrong comments on Domesticating reduced impact AIs - Less Wrong

9 Post author: Stuart_Armstrong 14 February 2013 04:59PM

Comments (104)

Comment author: Stuart_Armstrong 15 February 2013 06:44:00PM 0 points

Let's add some actions to the mix. Let a1 be the action: program the disciple to not take over, a2: program the disciple to take over discreetly, a3: program the disciple to take over blatantly. Let's assume the disciple is going to be successful at what it attempts.

Then all the following probabilities are 1: P(w1|a1,X=1), P(w2|a2,X=1), P(w3|a3,X=1), P(w1|X=0)

And all the following are zero: P(wi|aj,X=1) for i not equal to j, and P(wi|X=0) for i = 2 or 3.

w2 and w3 are not distinguished in any way.
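
The simplified model above can be written out as a small lookup, as a rough sketch. The function name p_world and the string labels are hypothetical, introduced only for illustration, and the sketch assumes that P(w1|X=0) = 1 holds whichever action is taken.

```python
# Illustrative encoding of the simplified model; all names are hypothetical.
WORLDS = ("w1", "w2", "w3")    # no takeover, discreet takeover, blatant takeover
ACTIONS = ("a1", "a2", "a3")   # program the disciple for each of the above

def p_world(world, action, x):
    """Return P(world | action, X=x) under the simplified model."""
    if x == 0:
        # Assumption: with X=0 the outcome is w1 whatever the action.
        return 1.0 if world == "w1" else 0.0
    # With X=1 the disciple succeeds at whatever it was programmed to attempt.
    return 1.0 if WORLDS.index(world) == ACTIONS.index(action) else 0.0

# Print the full table and check that each conditional distribution sums to 1.
for a in ACTIONS:
    for x in (0, 1):
        row = [p_world(w, a, x) for w in WORLDS]
        assert sum(row) == 1.0
        print(a, "X=%d" % x, row)
```

In this encoding, a2 and a3 give mirror-image rows: each puts all its mass on its own world when X=1 and on w1 when X=0.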

Comment author: Vladimir_Nesov 15 February 2013 06:52:24PM 0 points

In your notation in the post, I take w (and wf) as referring to elementary events: very specific descriptions of the world that the AI probably won't be able to predict individually, with R(w) measuring the effect of X on these elementary events. In this discussion, if "w2" is "AI takes over discreetly", then w2 is not an elementary event; it's either a probability distribution or a proper (big) event. As a result, I don't see how the discussion of P(wi|aj,X=1) (where the wi are big events, not elementary events from the definition of R(w)) is relevant to the value of the penalty function.

Comment author: Stuart_Armstrong 15 February 2013 06:54:30PM 0 points

No, they're worlds (or partial descriptions of worlds) - we could define them as possible full space-time manifolds.

Comment author: Vladimir_Nesov 15 February 2013 07:10:56PM * 0 points

OK, so in the post R(w) can talk about big events w, not just particular worlds (your talk of "integrating across all w" in the post confused me; it now turns out that the possible w are not mutually exclusive). But this doesn't clarify for me the relevance of your point in the grandparent: what is the relevance of P(wi|aj,X=1) for the estimate of the total penalty?

(If w2 and w3 are particular worlds, then it's incorrect that P(w1|a1,X=1), P(w2|a2,X=1), P(w3|a3,X=1), P(w1|X=0) are about 1, because the AI won't be able to predict so accurately what happens if it takes a1, a2, etc. If w2 and w3 are partial descriptions of worlds, that is the same thing as them being big events, which is what I've been assuming throughout the thread.)

Comment author: Stuart_Armstrong 16 February 2013 09:21:15AM 0 points

I don't need P(w2|a2,X=1) and P(w3|a3,X=1) to be about one (that was a simplified model) - I need them to be about equal, i.e., the disciple is a really smart AI and can take over the world if motivated to do so.