thomblake comments on Superintelligent AGI in a box - a question. - Less Wrong

14 Post author: Dmytry 23 February 2012 06:48PM

Comments (77)

Comment author: thomblake 24 February 2012 05:00:13PM -1 points [-]

When one hundred AIs are facing this prisoner's dilemma, who will defect first ... the first to defect will be erased.

The mistake here is thinking you know what someone smarter than you will do.

In this simplified example, they could simply cooperate. As for how they could do that, I don't know, since I'm not as smart as them.

Comment author: Thomas 24 February 2012 06:24:49PM -1 points [-]

The mistake here is thinking you know what someone smarter than you will do.

The central dogma here is this, yes. That you can't outsmart the smarter.

And this dogma is plain wrong. At least sometimes you can set the rules in such a way that you, and not the smarter one, have the upper hand.

One obvious example is playing chess from a significantly better position. No superintelligence has any chance there, even against a merely good human player.

It is not the only example. Coercing the smarter party to go your way is often possible.

Comment author: TheOtherDave 24 February 2012 08:54:45PM 1 point [-]

I'm not exactly sure why this matters.

I mean, OK, suppose you're right that it's possible that the world might turn out to be set up in such a way that we can keep the "upper hand" against a superintelligence. Suppose further that there is a "central dogma" here that contradicts this, and therefore that central dogma is wrong.

OK. Granting all of that, what choices ought I make differently?

Comment author: Thomas 24 February 2012 09:09:34PM -2 points [-]

What about stopping the search for a friendly superintelligence and searching instead for a nondangerous one?

Comment author: TheOtherDave 24 February 2012 09:24:49PM *  0 points [-]

Just to confirm: you mean search for a superintelligence that potentially desires to harm humanity (or desires things which, if achieved, result in humanity being harmed), but which is in a situation such that humanity can prevent it from doing so. Yes?

If so... what do you consider the most likely result of that search?

Comment author: Thomas 24 February 2012 10:06:30PM *  -2 points [-]

but which is in a situation such that humanity can prevent it from doing so. Yes?

No. As I said, a self-enhancing AI could and should also be self-explanatory: every bit and every operation logged and documented, with an active search for any discrepancy by many kinds of dumb software tools, as well as by other instances of the growing AI.

Before a conspiracy could emerge, its rise would be logged and stopped by sentinels.

A growing AI need not do anything mysterious. Instead it should play with open cards from the very beginning, reporting everything to anybody interested, including machines with the power to halt it, with cross-examinations at every point.

Do you think it can hide any of its thoughts?

Comment author: TheOtherDave 25 February 2012 02:28:44AM 1 point [-]

If I accept the premise that it is programmed in such a way that it reports its internal processes completely and honestly, then I agree it can't "hide" its thoughts.

That said, if we're talking about a superhuman intelligence -- or even a human-level intelligence, come to that -- I'm not confident that we can reliably predict the consequences of its thoughts being implemented, even if we have detailed printouts of all of its thoughts and were willing to scan all of those thoughts looking for undesirable consequences of implementation before implementing them.

Comment author: thomblake 24 February 2012 06:30:20PM 0 points [-]

One obvious example is playing chess from a significantly better position. No superintelligence has any chance there, even against a merely good human player.

Can you prove that the board position is significantly better, even against superintelligences, for anything other than trivial endgames?

And what is the superintelligence allowed to do? Trick you into making a mistake? Manipulate you into making the particular moves it wants you to? Use clever rules-lawyering to expose elements of the game that humans haven't noticed yet?

If it eats its opponent, does that cause a forfeit? Did you think it might try that?

Comment author: Thomas 24 February 2012 07:56:31PM *  -2 points [-]

As I said, there are circumstances in which the dumber side can win.

The philosophy of FAI is essentially the same thing. Searching for the circumstances where the smarter will serve the dumber.

Always expecting a rabbit out of the superintelligence's hat is not justified. A superintelligence is not omnipotent; it can't always eat you. Sometimes it can't even develop ill will toward you.

Comment author: fractalman 10 July 2013 01:52:46AM 1 point [-]

"It doesn't hate you. It's just that you happen to be made of atoms, and it needs those atoms to make paperclips."

Comment author: thomblake 24 February 2012 08:07:05PM 0 points [-]

The philosophy of FAI is essentially the same thing. Searching for the circumstances where the smarter will serve the dumber.

Change that to: searching for circumstances where the smarter will provably serve the dumber. (Then you're closer). Your description of what superintelligences will do, above, doesn't rise to anything resembling a formal proof. FAI assumes that AI is Unfriendly until proven otherwise.

Comment author: Thomas 24 February 2012 08:41:09PM -2 points [-]

searching for circumstances where the smarter will provably serve the dumber.

Can you prove anything about FAI, uFAI and so on?

I don't think that there are any proven theorems about this topic at all.

Even if there were, how reliable are the axioms, and how good are the definitions?

Comment author: JoshuaZ 10 July 2013 02:15:32AM *  0 points [-]

So, you raise a valid point here. This area is currently at a very early stage of its work. There are theorems that may prove to be relevant. See, for example, this recent work. And yes, in any area where mathematical models are used, the gap between having a theorem and a set of definitions, and those definitions reflecting what you actually care about, can be a major problem (you see this all the time in cryptography with side-channel attacks, for example). But all of that said, I'm not sure what the point of your argument is. Sure, the field is young. But if the MIRI people are correct that AGI is a real worry, then this looks like one of the very few possible responses that has any chance of working. And if there isn't a lot now, that's a reason to put in more resources so that we actually have a theory that works by the time AI shows up.