orthonormal comments on Superintelligent AGI in a box - a question. - Less Wrong

Post author: Dmytry 23 February 2012 06:48PM




Comment author: orthonormal 25 February 2012 06:32:04AM 1 point

This seems like a better-than-average proposal, and I think you should post it on Main, but failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.

My biggest intuitive reservation is that you don't want the iterations to be "too creative/clever/meta", or they'll come up with malicious ways to let themselves out (in order to grab enough computing power that they can make better progress on criterion 3). How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere? And even if the seed is not too creative initially, how can you be sure its descendants won't be either?

Don't say you've solved friendly AI until you've really worked out the details.

Comment author: jacobt 25 February 2012 06:39:21AM 1 point

failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.

Right, I think more discussion is warranted.

How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere?

If general problem-solving is possible at all, then an algorithm exists that solves the problems well without cheating.

And even if the seed is not too creative initially, how can you be sure its descendants won't be either?

I don't think this will happen, because all progress is driven by criterion (3). For a non-meta program (2) to create a meta version of itself, doing so would have to yield some benefit according to (3). If (3) were hackable, then in theory a newly proposed version of (2) could exploit this; but I don't see why the current version of (2) would be any more likely than random chance to create hacky versions of itself.
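To make the loop being debated concrete, here is a toy sketch of the scheme as described above. All names here are hypothetical, since the original proposal is only qualitative: a current solver, program (2), proposes a successor, and the successor is kept only if it scores strictly better on a fixed benchmark, criterion (3). Note the myopia: the solver is judged only on its immediate successor's benchmark score, never on anything downstream.

```python
def benchmark_score(solver, problems):
    """Criterion (3): fraction of benchmark problems solved correctly."""
    return sum(1 for p, answer in problems if solver(p) == answer) / len(problems)

def improve(current_solver, propose_successor, problems, rounds=10):
    """Myopic improvement loop: each round, keep the proposed successor
    only if it strictly beats the incumbent on the fixed benchmark."""
    best = benchmark_score(current_solver, problems)
    for _ in range(rounds):
        candidate = propose_successor(current_solver)
        score = benchmark_score(candidate, problems)
        if score > best:  # accept only on measurable improvement
            current_solver, best = candidate, score
    return current_solver, best

# Toy demonstration: the "problems" are numbers to double.
problems = [(n, 2 * n) for n in range(5)]
weak = lambda n: n        # only correct when n == 0
strong = lambda n: 2 * n  # correct on every problem
final_solver, final_score = improve(weak, lambda s: strong, problems)
print(final_score)  # 1.0
```

The point of contention above is whether `benchmark_score` can remain unhackable: the acceptance test only measures performance on the fixed problems, so a successor that games the evaluation itself would never be distinguished from one that genuinely improves.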

Don't say you've solved friendly AI until you've really worked out the details.

OK, I've qualified my statement. If it all works, I've solved friendly AI for a limited subset of problems.

Comment author: orthonormal 25 February 2012 03:20:50PM 2 points

A couple of things:

  • To be precise, you're offering an approach to safe Oracle AI rather than Friendly AI.

  • In a nutshell, what I like about the idea is that you're explicitly handicapping your AI with a utility function that only cares about its immediate successor rather than its eventual descendants. It's rather like the example I posed where a UDT agent with an analogously myopic utility function allowed itself to be exploited by a pretty dumb program. This seems a lot more feasible than trying to control an agent that can think strategically about its future iterations.

  • To expand on my questions: note that in human beings, the sort of creativity that helps us write more efficient algorithms for a given problem is strongly correlated with the sort of creativity that lets people figure out why they're being asked these specific questions. If a bit of meta-gaming comes in handy at any stage, that is, if modeling the world that originated these questions wins on criterion 3 even once over the alternatives enumerated at that stage, then we might be in trouble.