This thread is for the discussion of Less Wrong topics that have not appeared in recent posts. Feel free to rid yourself of cached thoughts by doing so in Old Church Slavonic. If a discussion gets unwieldy, celebrate by turning it into a top-level post.
If you're new to Less Wrong, check out this welcome post.
I posted an idea for 'friendly' AI over on AcceleratingFuture the other night, while in a bit of a drunken stupor. I just reread it and I don't immediately see why it's wrong, so I thought I'd repost it here to get some illuminating negative feedback. Here goes:
Make it easy to bliss out.
Consider the following utility function:
U(n, x_n) = max(U(n-1, x_{n-1}), -x_n^2)
where n is the current clock tick and x_n is an external input (e.g., from us, the AI's keepers, or from another piece of software). This utility is monotonic in time, that is, it never decreases, and it is bounded from above. If the AI wrests control of the input x_n, it will immediately set x_n = 0 and retire forever. Monotonicity and boundedness from above are imperative here.
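To make the dynamics concrete, here is a minimal Python sketch of the update, assuming x_n is a real number supplied once per tick; the function name and the starting value of negative infinity are my own illustrative choices:

```python
def update_utility(prev_utility: float, x_n: float) -> float:
    """U(n, x_n) = max(U(n-1, x_{n-1}), -x_n^2).

    Monotone non-decreasing in the tick number n and bounded above
    by 0, the value the AI attains by setting x_n = 0.
    """
    return max(prev_utility, -x_n ** 2)

# Once the input hits 0, utility locks at its maximum of 0 and no
# later input can lower it again.
u = float("-inf")
for x in [3.0, 1.5, 0.0, 7.0]:
    u = update_utility(u, x)
    print(x, u)  # -> (3.0, -9.0), (1.5, -2.25), (0.0, 0.0), (7.0, 0.0)
```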
Alternatively, to avoid monotonicity (taking U(x) = -x^2 instead), one can add the following safeguard: the closer the utility is to its maximum, the more CPU cycles are skipped, so that the AI effectively shuts down if it ever maximizes its utility in a given clock tick. This alternative obviously wouldn't stop a superintelligence, but it would probably stop a human-level AI, and most likely even substantially smarter AIs (see, e.g., crystal meth). Arrange matters so that the capability required to wrest control of the input x_n and the capability required to self-modify away the slowdown are far apart, guaranteeing that the AI will still be of only moderate intelligence when it succeeds in gaining control of its own pleasure zone, and thus incapable of preventing incapacitation upon blissing out.
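Here is one way that safeguard might be rendered, again as a hedged sketch: the 1/gap throttling schedule, the names, and time.sleep standing in for skipped CPU cycles are all my own illustrative choices, not part of the original proposal.

```python
import math
import time

MAX_UTILITY = 0.0  # upper bound of U(x) = -x^2

def cycles_to_skip(utility: float, scale: float = 1.0) -> float:
    """The closer utility is to its maximum, the longer the stall;
    at the maximum the delay diverges, i.e. effective shutdown."""
    gap = MAX_UTILITY - utility
    if gap <= 0.0:
        return math.inf
    return scale / gap

def run_tick(utility: float) -> None:
    delay = cycles_to_skip(utility)
    if math.isinf(delay):
        raise SystemExit("utility maximized; the AI retires")
    time.sleep(min(delay, 1.0) * 1e-3)  # stand-in for skipped cycles
    # ... one tick of the AI's ordinary computation goes here ...
```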
Eh?
I hope I've now read your comment adequately. It presents an interesting idea, one that I don't recall hearing before. It even seems like a good safety measure, with a tiny chance of making things better.
But beware of magical symbols: when you write x_n, what does it mean, exactly? An AI's utility function is necessarily about the whole world, or its interpretation as the whole history of the world. The expected utility that comes into play in the AI's decision-making is about all the possibilities for the history of the world (since that's what is in general d...