You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

pengvado comments on Singularity FAQ - Less Wrong Discussion

16 Post author: lukeprog 19 April 2011 05:27PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (34)

You are viewing a single comment's thread. Show more comments above.

Comment author: Alexandros 22 April 2011 11:22:58AM 1 point [-]

(A) would be the case if the utility function was 'create a world where human desires don't need to be thwarted'. (and even then, depends on the definition of human). But the constraint is 'don't thwart human desires'.

I don't understand (B). If I desire to be able to change my mind, (which I do), wouldn't not being allowed to do so thwart said desire?

I also don't really understand how the result of (C) comes about.

Comment author: pengvado 23 April 2011 01:33:31AM *  1 point [-]

(B): If the static utility function is not based on object-level desires, and instead only on your desire to be able to change your mind and then get whatever you end up deciding on, but you haven't yet decided what to change it to, then that makes the scenario more like (A). The AI has every incentive to find some method of changing your mind to something easy to satisfy, that doesn't violate the desire of not having your head messed with. Maybe it uses extraordinarily convincing ordinary conversation? Maybe it manipulates which philosophers you meet? Maybe it uses some method you don't even understand well enough to have a preference about? I don't know, but you've pitted a restriction against the AI's ingenuity.

(C): Consider two utility functions, U1 based on the desires of humans at time t1, and U2 based on the desires of humans at time t2. U1 and U2 are similar in some ways (depending on how much one set of humans resembles the other), but not identical, and in particular they will tend to have maxima in somewhat different places. At t1, there is an AI with utility function U1, and also with a module that repeatedly scans its environment and overwrites the utility function with the new observed human desires. This AI considers two possible actions: it can self-improve while discarding the utility-updating module and thus keep U1, or it can self-improve in such a way as to preserve the module. The first action leads to a future containing an AI with utility function U1, which will then optimize the world into a maximum of U1. The second action leads to a future containing an AI with utility function U2, which will then optimize the world into a maximum of U2, which is not a maximum of U1. Since at t1 the AI decides by the criteria of U1 and not U2, it chooses the first action.