Mitchell_Porter comments on Imagine a world where minds run on physics - Less Wrong

12 Post author: cousin_it 31 October 2010 07:09PM


Comments (29)


Comment author: Mitchell_Porter 01 November 2010 02:40:20AM 2 points

Some cognitive architectures intrinsically exhibit instability of values (e.g. those where goals compete stochastically for priority), but Omohundro's drive to protect the utility function from modification should prevent a self-modifying AI with a stable architecture from adopting an architecture that is knowably or even possibly unstable.

However, the human cognitive architecture certainly appears to have unstable values, so this will be a problem for any attempt to codify a fixed human-friendly utility function by renormalizing the existing unstable architecture. Omohundro's drive won't automatically work here, since the starting point isn't stable. It's also quite possible that more than one reflectively stable equilibrium can be reached from the human decision architecture, because of its stochastic or context-dependent aspects.

Comment author: cousin_it 01 November 2010 10:09:26AM * 0 points

"Omohundro's drive to protect the utility function from modification"

The machines in my post have no such drive coded in, and this isn't a problem. Just having a utility function over universes works out fine: if there's an action that makes the universe end up in the desired state, the computer will find it and do it. If there's uncertainty about possible interference, it will be taken into account.
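As a toy illustration of that claim (a sketch under made-up assumptions, not the construction from the post): the utility function below is defined only over final world states, and uncertainty about interference is just part of each action's outcome distribution. The action names, world labels, and probabilities are all hypothetical.

```python
# Toy model only: the actions, world labels, and probabilities are
# hypothetical and chosen purely for illustration.

def utility(world: str) -> float:
    """Utility is defined over final world states, nothing else."""
    return 1.0 if world == "desired" else 0.0

# For each action, a probability distribution over final world states.
# Possible interference is folded into these probabilities.
OUTCOMES = {
    "act_openly":   {"desired": 0.6, "other": 0.4},  # easier to interfere with
    "act_robustly": {"desired": 0.9, "other": 0.1},  # harder to interfere with
    "do_nothing":   {"desired": 0.0, "other": 1.0},
}

def expected_utility(action: str) -> float:
    """Probability-weighted utility of the worlds this action may lead to."""
    return sum(p * utility(world) for world, p in OUTCOMES[action].items())

def best_action() -> str:
    """Pick the action with the highest expected utility."""
    return max(OUTCOMES, key=expected_utility)
```

Nothing here codes in a drive to resist interference. The more robust action wins only because it leads to the desired world with higher probability.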

Comment author: Mitchell_Porter 02 November 2010 01:36:50AM 2 points

Omohundro's drives are emergent behaviors expected in any sufficiently advanced intelligence, not something that gets coded in at the beginning.

Comment author: cousin_it 02 November 2010 09:33:36AM 0 points

Oh. Thanks.

Comment author: Vladimir_Nesov 02 November 2010 09:55:18AM 1 point

There is also nothing to say that the eventual stable preference will have anything to do with the initial one, whereas the post was arguing about the initial utility function. In this sense, Omohundro's argument is not relevant.