Vladimir_Nesov comments on Open Thread: November 2009 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I posted an idea for 'friendly' AI over on AcceleratingFuture the other night, while in a bit of a drunken stupor. I just reread it and I don't immediately see why it's wrong, so I thought I'd repost it here to get some illuminating negative feedback. Here goes:
Make it easy to bliss out.
Consider the following utility function
U(n, x_n) = max(U(n-1, x_{n-1}), -x_n^2)
where n is the current clock tick and x_n is an external input (i.e., from us, the AI's keepers, or from another piece of software). This utility is monotonic in time (it never decreases) and is bounded from above. If the AI wrests control of the input x_n, it will immediately set x_n = 0 and retire forever. Monotonicity and boundedness from above are imperative here.
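A minimal sketch of the utility bookkeeping described above (the names here are hypothetical; only the utility update is modeled, not the AI itself):

```python
def step_utility(prev_utility, x):
    """U(n, x_n) = max(U(n-1, x_{n-1}), -x_n^2).

    Monotone non-decreasing in n and bounded above by 0,
    which is attained exactly when x_n = 0.
    """
    return max(prev_utility, -x * x)

# A run in which the AI eventually controls x and sets it to 0:
u = float("-inf")  # before any input has arrived
for x in [3.0, -1.5, 0.5, 0.0]:
    u = step_utility(u, x)
# u is now 0.0, the global maximum; along the way it never decreased.
```

Each step can only raise the stored utility or leave it unchanged, which is the monotonicity property the proposal relies on.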
Alternatively, to avoid monotonicity (taking U(x) = -x^2), one can add the following safeguard: the closer the utility gets to its maximum, the more CPU cycles are skipped, so that the AI effectively shuts down if it ever maximizes its utility in a given clock tick. This alternative obviously wouldn't stop a superintelligence, but it would probably stop a human-level AI, and most likely even substantially smarter AIs (see, e.g., crystal meth). Then arrange matters so that the technical requirements for wresting control of the input x_n are far lower than those for self-modifying to remove the slowdown. That gap guarantees the AI will still be of only moderate intelligence when it gains control of its own pleasure zone, and thus incapable of preventing incapacitation upon blissing out.
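The throttling safeguard could be sketched like this (the scaling rule and names are my own assumptions, not part of the original proposal):

```python
import math

U_MAX = 0.0  # maximum of U(x) = -x^2, reached at x = 0

def utility(x):
    return -x * x

def ticks_to_skip(u, scale=10.0):
    """Skip more clock ticks as u approaches U_MAX (hypothetical rule).

    The gap U_MAX - u is nonnegative; as it shrinks, the number of
    skipped ticks grows without bound, so maximizing utility means
    an effective shutdown.
    """
    gap = U_MAX - u
    if gap == 0:
        return math.inf  # full stop at the maximum: the AI blisses out
    return int(scale / gap)
```

For example, at x = 2 the gap is 4 and only a couple of ticks are skipped, while at x = 0 the skip count is infinite, which is the "incapacitation upon blissing out" the proposal counts on.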
Eh?
Expected utility is not something that "goes up" as the AI develops. It's the utility of all it expects to achieve, ever. The AI may obtain more information about what the outcome will be, but each piece of evidence is necessarily expected to move the estimate either up or down, with no way to know in advance which way it'll go.
Can you elaborate? I understand what you wrote (I think) but don't see how it applies.
Hmm, I don't see how it applies either, at least under default assumptions. As I recall, this piece of cached thought was regurgitated instinctively in response to sloppily looking through your comment and encountering a phrase
which was for some reason interpreted as confusing utility with expected utility. My apologies, I should be more conscious, at least about the things I actually comment on...
No worries. I'd still be curious to hear your thoughts, as I haven't received any responses that help me understand how this utility function might fail. Should I expand on the original post?