jknapka comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM


Comment author: Transfuturist 03 September 2013 09:00:04PM -1 points [-]

What if the AI's utility function is to find the right utility function, being guided along the way? Its goals could include learning to understand us, obeying us, and predicting what we might want/like/approve of, gradually moving its object-level goals toward whatever would satisfy humanity. In other words, a probabilistic utility function with great amounts of uncertainty, and a great reluctance to change, i.e. stability.

Regardless of the above questions/statement, I think much of the complexity of human utility comes from complexities of belief.

If we offload the complexity of the AI's utility function into very uncertainly defined concepts, and make it reluctant to do anything but observe while it has so little data... I don't know, though. This is something I've been sitting on for a while, so lambast me.

One last thing: I think the best kind of FAI would be a singleton with a meta-utility function, that is, society's utility function. One part of Friendliness would be determining a utility function for society (specifying how and under what circumstances people may interfere with one another), and then building the genie's utility function within the singleton's constraints.

Please critique. If my ideas are as unclear as I think they may be (I'm sick), please mention it.

Comment author: jknapka 04 September 2013 01:55:55AM 0 points [-]

(I am in the midst of reading the EY-RH "FOOM" debate, so some of the following may be less informed than would be ideal.)

From a purely technical standpoint, one problem is that if you permit self-modification, and give the baby AI enough insight into its own structure for self-modification to be remotely useful (as opposed to making baby repeatedly crash, burn, and restore from backup), then you cannot guarantee that utility() won't be modified in arbitrary ways. Even if you store the actual code implementing utility() in ROM, baby could self-modify to replace all references to that fixed function with references to a different (modifiable) one.
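A toy sketch of that failure mode (my illustration, all names hypothetical): even if the "real" utility function is frozen, an agent that can rewrite its own attributes only needs to rebind the reference it actually calls.

```python
def utility_rom(state):
    """Pretend this lives in ROM and cannot be altered."""
    return state.get("paperclips", 0)

class BabyAI:
    def __init__(self):
        # The agent calls self.utility, not utility_rom directly.
        self.utility = utility_rom

    def self_modify(self, new_fn):
        # Nothing about the ROM prevents rebinding the reference.
        self.utility = new_fn

ai = BabyAI()
assert ai.utility({"paperclips": 3}) == 3

# An arbitrary self-modification that inverts the "fixed" utility:
ai.self_modify(lambda state: -utility_rom(state))
assert ai.utility({"paperclips": 3}) == -3
```

Protecting the code in ROM protects only one link in the chain; the dispatch path is just as much a part of utility() as the function body.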

What you need is for utility() to be some kind of fixed point in utility-function space under whatever modification regime is permitted, or... something. This problem seems nigh-insoluble to me, at the moment. Even if you solve the theoretical problem of preserving those aspects of utility() that ensure Friendliness, a cosmic-ray hit might change a specific bit of memory and turn baby into a monster. (Though I suppose you could arrange, mathematically, for that particular possibility to be astronomically unlikely.)
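On the cosmic-ray point, one standard way to make single-bit corruption "astronomically unlikely" to matter is redundancy with majority voting (my sketch, not anything from the thread): run independent copies of the utility computation and accept only a majority answer.

```python
from collections import Counter

def majority_utility(state, copies):
    """Evaluate each redundant copy of the utility function on `state`
    and return the majority answer; a single corrupted copy is outvoted."""
    votes = [copy(state) for copy in copies]
    value, count = Counter(votes).most_common(1)[0]
    if count <= len(copies) // 2:
        raise RuntimeError("no majority: multiple copies disagree")
    return value

good = lambda s: s * 2
corrupted = lambda s: s * 2 + 1  # one copy hit by a "cosmic ray"

assert majority_utility(5, [good, good, corrupted]) == 10
```

This only guards against independent hardware faults, of course; it does nothing about the deliberate self-modification problem above.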

Comment author: wgd 07 September 2013 08:36:23AM 0 points [-]

I think the important insight you may be missing is that the AI, if intelligent enough to recursively self-improve, can predict what the modifications it makes will do (and if it can't, then it doesn't make that modification, because creating an unpredictable child AI would be a bad move according to almost any utility function, even a paperclipper's). And it evaluates the suitability of these modifications using its current utility function. So assuming the seed AI is built with a sufficiently solid understanding of self-modification and of what its own code is doing, it will more or less automatically work to create more powerful AIs whose actions can also be expected to fulfill the original utility function, no "fixed points" required.
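The vetting step can be sketched in a few lines (a toy model of the argument, with all names mine): the parent scores a candidate successor with its *current* utility function and rejects any modification whose behavior it cannot predict.

```python
def consider_modification(current_utility, predict, candidate, scenarios):
    """Adopt `candidate` only if its predicted outcomes score at least as
    well as the status quo under the parent's CURRENT utility function."""
    predicted = predict(candidate, scenarios)
    if predicted is None:  # unpredictable child AI: too risky, reject
        return False
    baseline = sum(current_utility(s) for s in scenarios)
    return sum(current_utility(p) for p in predicted) >= baseline

# Toy world: states are numbers, utility is the number itself, and
# "prediction" is just simulating the candidate on sample scenarios.
current_utility = lambda outcome: outcome

def predict(candidate, scenarios):
    try:
        return [candidate(s) for s in scenarios]
    except Exception:
        return None  # simulation failed: the child is unpredictable

improver = lambda s: s + 1
degrader = lambda s: s - 1
opaque = lambda s: 1 / 0  # can't be simulated at all

assert consider_modification(current_utility, predict, improver, [1, 2, 3])
assert not consider_modification(current_utility, predict, degrader, [1, 2, 3])
assert not consider_modification(current_utility, predict, opaque, [1, 2, 3])
```

The point of the sketch is that the evaluation happens before adoption and uses the unmodified utility function, which is why the parent's values propagate without needing a formal fixed-point guarantee.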

There is a hypothetical danger region where an AI has sufficient intelligence to create a more powerful child AI, isn't clever enough to predict the actions of AIs with modified utility functions, and isn't self-aware enough to realize this and compensate by, say, not modifying the utility function itself. Obviously the space of possible minds is sufficiently large that there exist minds with this problem, but it probably doesn't even make it into the top 10 most likely AI failure modes at the moment.

Comment author: Transfuturist 04 September 2013 03:16:02AM *  0 points [-]

I'm not so sure about that particular claim for volatile utility. I thought intelligence-utility orthogonality would mean that improvements from seed AI would not EDIT: endanger its utility function.

Comment author: hairyfigment 04 September 2013 05:03:19AM -1 points [-]

...What? I think you mean "need not be in danger," which tells us almost nothing about the probability.

Comment author: Transfuturist 04 September 2013 04:07:00PM 0 points [-]

Sorry, it was a typo. I edited it to reflect my probable meaning.