TheAncientGeek comments on If epistemic and instrumental rationality strongly conflict - Less Wrong

5 [deleted] 10 May 2012 01:46PM


Comment author: TheAncientGeek 29 March 2014 03:37:18PM 1 point

I don't know what you mean by MY rationality. People who teach rationality teach the same aims and rules to everyone.

You have tacitly assumed that a knowledge-valuing SAI would never realise that turning people into computronium is wrong...that it would either never understand morality, or that moral truths cannot be discovered no matter how great the cognitive resources.

Comment author: DanielLC 29 March 2014 08:26:51PM 0 points

I don't know what you mean by MY rationality. People who teach rationality teach the same aims and rules to everyone.

You are suggesting we teach this AI that knowledge is all that matters. These are certainly not the aims I'd teach everyone, and I'd hope they're not the aims you'd teach everyone.

You have tacitly assumed that a knowledge-valuing SAI would never realise that turning people into computronium is wrong

It may realise that, but that doesn't mean it would care.

It might care, and it would still be a pretty impressive knowledge-maximizer if it did, but not nearly as good a one as an AI that didn't care.

Of course, that's just arguing definitions. The point you seem to be making is that the terminal values of a sufficiently advanced intelligence converge: that it would be much more difficult to make an AI that could learn beyond a certain point while continuing to pursue its old values of maximizing knowledge, or whatever they were.

I don't think values and intelligence are completely orthogonal. If you built a self-improving AI without worrying about giving it a fixed goal, there probably are values that it would converge on. It might decide to start wireheading, or it might try to learn as much as it can, or it might generally try to increase its power. I don't see any reason to believe it would necessarily converge on a specific one.

But let's suppose that it does always converge. I still think there's protections it could do to prevent its future self from doing that. It might have a subroutine that takes the outside view, and notices that it's not maximizing knowledge as much as it should, and tweaks its reward function against its bias to being moral. Or it might predict the results of an epiphany, notice that it's not acting according to its inbuilt utility function, declare the epiphany a basilisk, and ignore it.

Or it might do something I haven't thought of. It has to have some way to keep itself from wireheading, and from whatever other failure modes might naturally come with increasing intelligence.