wedrifid comments on The mathematics of reduced impact: help needed - Less Wrong

Post author: Stuart_Armstrong 16 February 2012 02:23PM




Comment author: wedrifid 20 February 2012 07:20:40AM

I think that he meant indifferent rather than malicious

For the most part, yes. And my first paragraph was a reply to the meaning of 'unFriendly' rather than just the malicious subset thereof.

Instead, we make an Oracle AI with an approximation to our utility function. Then, the AI will act so as to use its output to get us to accomplish its goals, which are only mostly aligned with ours.

That is an interpretation that directly contradicts the description given - it isn't compatible with not caring about the future beyond an hour - or, for that matter, actually being an 'oracle' at all. If that were the intended meaning, then my responses elsewhere would not have been cautious agreement but instead something along the lines of:

What the heck? You're creating a complete FAI and then hacking an extreme limitation on top? Well, yeah, that's going to be safe - it is based on a tautologically safe thing - but it is strictly worse than the FAI without restrictions.

Comment author: endoself 21 February 2012 12:00:28AM

Instead, we make an Oracle AI with an approximation to our utility function. Then, the AI will act so as to use its output to get us to accomplish its goals, which are only mostly aligned with ours.

That is an interpretation that directly contradicts the description given - it isn't compatible with not caring about the future beyond an hour - or, for that matter, actually being an 'oracle' at all.

I was thinking of some of those extremely bad questions that are sometimes proposed to be asked of an oracle AI: "Why don't we just ask it how to make a lot of money?", etc. Paul's example of asking it to give the output that gets us to press the reward button falls into the same category (unless I'm misinterpreting what he meant there?).