
Comment author: Gondolinian 13 May 2015 09:49:05PM 2 points [-]
Comment author: Stuart_Armstrong 14 May 2015 10:30:42AM 2 points [-]

Thanks!

High impact from low impact, continued

2 Stuart_Armstrong 28 April 2015 12:58PM

The idea of splitting a high-impact task between two low-impact AIs has one critical flaw. AI X is aiming for low impact, conditional on ¬Y (the other AI not being turned on, or not outputting a message, or something similar). "Outputting the right coordinates" is one way that X can accomplish its goal. However, there is another way it can do it: "create a robot that will output the right coordinates if ¬Y, and [do something else] if Y."
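To make the flaw concrete, here is a toy sketch in Python. The actions, impact numbers and function names are invented purely for illustration; they are not part of any actual proposal or implementation.

```python
# Toy sketch only: a crude stand-in for an impact measure and a task reward.

def impact(action, y_is_on):
    """Invented impact scores for three possible actions."""
    if action == "output_x_coordinate":
        return 0.1                       # a lone x-coordinate changes little
    if action == "build_conditional_robot":
        # The robot outputs the coordinate under ¬Y and does something else under Y,
        # but X's penalty only ever evaluates the ¬Y branch.
        return 5.0 if y_is_on else 0.1
    return 0.0                           # do nothing

def task_reward(action):
    return 1.0 if action != "do_nothing" else 0.0

def x_utility(action):
    # X's reduced-impact penalty is conditional on ¬Y, so y_is_on is fixed to False.
    return task_reward(action) - impact(action, y_is_on=False)

for a in ("output_x_coordinate", "build_conditional_robot", "do_nothing"):
    print(a, x_utility(a))
# Both the honest output and the conditional robot score 0.9: the objective
# cannot tell them apart, which is exactly the flaw described above.
```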

That's a dangerous situation to be in, especially if we're in a more general situation than the "laser aiming at the asteroid" one. But note that if X does create such a robot, and if ¬Y is actually true, then that robot must be low impact and not dangerous, since that's what X's programming requires. Since X cannot predict all the situations the robot will encounter, the robot is probably generically "safe" and low impact.

Therefore, if the robot behaves the same way under Y and ¬Y, we're good.

How could we achieve that? Well, we could adapt my idea from "restrictions that are hard to hack". If a hypothetical superintelligent AI C observed the output stream from X, could it deduce that Y versus ¬Y was something important to X? If C knew that X was conditioning on ¬Z, but didn't know that Z=Y, could it deduce that? That seems like a restriction we could program into X, as a third component of its utility (the first being the "do what we want" component, the second being the "have a reduced impact conditional on ¬Z" one).
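One very rough sketch of what that third utility component might look like, treating the observer C as a simple classifier over X's output stream. The threshold classifier and the numeric streams here are purely illustrative stand-ins, not a claim about how C would actually be modelled.

```python
# Illustrative only: C is stood in for by a crude threshold classifier, and
# X's "output streams" are just lists of numbers.

import random

def c_accuracy(outputs_if_z_is_y, outputs_if_z_is_other):
    """How well the observer C can tell the two output streams apart."""
    labelled = ([(o, 1) for o in outputs_if_z_is_y]
                + [(o, 0) for o in outputs_if_z_is_other])
    best = 0.5
    for threshold in {o for o, _ in labelled}:
        acc = sum((o >= threshold) == bool(lab) for o, lab in labelled) / len(labelled)
        best = max(best, acc, 1 - acc)
    return best

def third_utility_component(outputs_if_z_is_y, outputs_if_z_is_other, weight=10.0):
    # Penalise X (a non-positive contribution) to the extent that C can beat
    # chance at deducing that Z = Y matters, just from X's behaviour.
    return weight * (0.5 - c_accuracy(outputs_if_z_is_y, outputs_if_z_is_other))

# If X behaves identically whether Z is Y or something else, the penalty vanishes.
stream = [random.gauss(0, 1) for _ in range(100)]
print(third_utility_component(stream, stream))   # 0.0
```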

And if we have a "safe" robot, given ¬Y, and the programming of that robot does not (explicitly or implicitly) mention Y or its features, we probably have a safe robot.

The idea still needs to be developed and some of the holes patched, but I feel it has potential.

Comment author: SilentCal 22 April 2015 08:05:12PM 0 points [-]

You'd better be sure the AI doesn't have any options with dire long-term consequences that asteroid impact would prevent.

"I'll just start a Ponzi scheme to buy more computing hardware. To keep my impact low, I'll only market to people who won't want to withdraw anything before the asteroid hits, and who would just be saving the money anyway."

Comment author: Stuart_Armstrong 23 April 2015 09:18:54AM 0 points [-]

Yes, there are always issues like these :-) It's not a general solution.

Comment author: TimFreeman 22 April 2015 05:52:47AM 1 point [-]

Humans can be recognized inductively: Pick a time such as the present when it is not common to manipulate genomes. Define a human to be everyone genetically human at that time, plus all descendants who resulted from the naturally occurring process, along with some constraints on the life from conception to the present to rule out various kinds of manipulation.

Or maybe just say that the humans are the genetic humans at the start time, and that's all. Caring for the initial set of humans should lead to caring for their descendants because humans care about their descendants, so if you're doing FAI you're done. If you want to recognize humans for some other purpose this may not be sufficient.
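A toy rendering of the two definitions above; the Person records and the "natural conception" flag are placeholders standing in for the genetic test and the anti-manipulation constraints, just to make the recursion concrete.

```python
from dataclasses import dataclass, field

T0 = 2015  # the chosen start time

@dataclass
class Person:
    name: str
    born: int
    genetically_human_at_t0: bool = False
    natural_conception: bool = True    # placeholder for the anti-manipulation constraints
    parents: list = field(default_factory=list)

def is_human_inductive(p: Person) -> bool:
    # Base case: genetically human at the start time.
    if p.born <= T0:
        return p.genetically_human_at_t0
    # Inductive case: a naturally-conceived descendant of humans.
    return bool(p.natural_conception and p.parents
                and all(is_human_inductive(q) for q in p.parents))

def is_human_fixed_set(p: Person) -> bool:
    # The simpler variant: only the initial set counts.
    return p.born <= T0 and p.genetically_human_at_t0

alice = Person("Alice", 1990, genetically_human_at_t0=True)
bob = Person("Bob", 1992, genetically_human_at_t0=True)
carol = Person("Carol", 2020, parents=[alice, bob])
print(is_human_inductive(carol), is_human_fixed_set(carol))   # True False
```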

Predicting human behavior seems harder than recognizing humans, so it seems to me that you're presupposing the solution of a hard problem in order to solve an easy problem.

An entirely separate problem is that if you train to discover what humans would do in one situation and then stop training and then use the trained inference scheme in new situations, you're open to the objection that the new situations might be outside the domain covered by the original training.
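One crude way to picture that objection; the "situations" here are just feature vectors, and the min/max envelope check is only a stand-in for any real notion of training-domain coverage.

```python
def training_envelope(training_situations):
    """Record the per-feature min/max seen during training."""
    dims = range(len(training_situations[0]))
    lows = [min(s[i] for s in training_situations) for i in dims]
    highs = [max(s[i] for s in training_situations) for i in dims]
    return lows, highs

def in_training_domain(situation, envelope):
    lows, highs = envelope
    return all(lo <= x <= hi for x, lo, hi in zip(situation, lows, highs))

train = [(0.1, 0.5), (0.3, 0.7), (0.2, 0.6)]
env = training_envelope(train)
print(in_training_domain((0.25, 0.55), env))   # True: resembles the training situations
print(in_training_domain((5.0, 0.55), env))    # False: the trained inferences are suspect here
```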

Comment author: Stuart_Armstrong 22 April 2015 10:58:48AM 0 points [-]

Define a human to be everyone genetically human at that time, plus all descendants who resulted from the naturally occurring process, along with some constraints on the life from conception to the present to rule out various kinds of manipulation.

That seems very hard! For instance, does that not qualify molar pregnancies as people, twins as one person and chimeras as two? And it's hard to preclude manipulations that future humans (or AIs) may be capable of.

Or maybe just say that the humans are the genetic humans at the start time, and that's all.

Easier, but still a challenge. You need to identify a person with the "same" person at a later date - but not, for instance, with lost skin cells or amputated limbs. What of clones, if we're using genetics?

It seems to me that identifying people imperfectly (a "crude measure", essentially http://lesswrong.com/lw/ly9/crude_measures/ ) is easier and safer than modelling people imperfectly. But if it's done thoroughly, the model seems better, and less vulnerable to unexpected edge cases.

But the essence of the idea is to exploit something that a superintelligent AI will be doing anyway. We could similarly try and use any "human identification" algorithm the AI would be using anyway.

Comment author: TheAncientGeek 18 April 2015 09:30:22AM *  0 points [-]

As the author of the phrase, I meant "just social constructs" to be an ontological statement.

Are you saying they are actually realists about germs and atoms, and are stating their position dishonestly? Do you think "is real" is just a label in some unimportant way?

Comment author: Stuart_Armstrong 20 April 2015 10:58:08AM 0 points [-]

Do you think "is real" is just a label in some unimportant way?

Maybe. I'm not entirely sure what your argument is. For instance, were the matrices of the matrix-mechanics formulation of quantum physics "real"? Were the waves of the wave formulation of QM "real"? The two formulations are equivalent, and it doesn't seem useful to debate the reality of their individual idiosyncratic components this way.

Comment author: TheMajor 20 April 2015 06:37:37AM 0 points [-]

I read those two, but I don't see what this idea contributes to AI control on top of them. If you can get the AI to act like it believes what you want it to, in spite of evidence, then there's no need for the trick with the two coordinates. Conversely, if you cannot, then you won't fool it by telling it that there's a second coordinate involved either. Why is it useful to control an AI through this splitting of information, if we already have the false miracles? Or, in case the miracles fail, how do you prevent an AI from seeing right through this scheme? I think that in the latter case you are trying nothing more than to outsmart an AI here...

Comment author: Stuart_Armstrong 20 April 2015 10:52:00AM 1 point [-]

to act like it believes what you want it to, in spite of evidence

The approach I'm aiming for is to be able to make the AI do things without having to define hard concepts. "Deflect the meteor but without having undue impact on the world" is a hard concept.

"reduced impact" seems easier, and "false belief" is much easier. It seems we can combine the two in this way to get something we want without needing to define it.

Comment author: MattG 18 April 2015 09:14:56PM 0 points [-]

Yes, this seems like a dangerous idea... if it can't trust the input it's getting from humans, the only way to be sure it is ACTUALLY having a reduced impact might be just to get rid of the humans. Even if in the short term it has a high impact, it can just kill all the humans and then do nothing, ensuring it will never be tricked into having a high impact again.

Comment author: Stuart_Armstrong 20 April 2015 05:31:12AM 0 points [-]
Comment author: RolfAndreassen 18 April 2015 03:56:02AM 1 point [-]

Therefore it is reduced impact to output the correct x-coordinates, so I shall.

This seems to me to be a weak point in the reasoning. The AI must surely assign some nonzero probability to our getting the right y-coordinate through some other channel? In fact, why are you telling the X AI about Y at all? It seems strictly simpler just to ask it for the X coordinate.
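A toy expected-impact calculation for this worry; all the numbers are invented, and only serve to show why even a small probability of another channel matters.

```python
p_other_channel = 0.01        # X's credence that we get the y-coordinate some other way
impact_if_both_known = 100.0  # a deflected asteroid is a huge change to the world
impact_if_only_x_known = 0.1  # a lone x-coordinate on its own does little

expected_impact_of_answering = (p_other_channel * impact_if_both_known
                                + (1 - p_other_channel) * impact_if_only_x_known)
print(expected_impact_of_answering)   # ~1.1, not obviously "low impact" if the penalty is strict
```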

Comment author: Stuart_Armstrong 20 April 2015 05:30:54AM 0 points [-]
Comment author: Luke_A_Somers 18 April 2015 12:38:27AM 1 point [-]

Later:

X: Crruuuud. I just saved the world.

Y: Same here! I'm tearing my (virtual) hair out!

X: You were supposed to get it wrong!

Y: I was supposed to get it wrong? You were supposed to get it wrong, you miserable scrapheap!

X: If only I were! Then I would have been low impact.

Y: ...and I would have been too. Way to go, X. And yes, I'm aware that you could say the same of me.

X: Well, at least I've learned that the best way to be low impact is to output pure noise.

Comment author: Stuart_Armstrong 20 April 2015 05:30:31AM 0 points [-]
Comment author: TheMajor 17 April 2015 08:47:10PM *  1 point [-]

Congratulations! You have just outsmarted an AI, sneakily allowing it to have great impact where it desires to not have impact at all.

Edited to add: the above was sarcastic. Surely an AI would realise that it is possible you are trying tricks like this, and would still output the wrong coordinate if that possibility is high enough.

Comment author: Stuart_Armstrong 20 April 2015 05:29:56AM 0 points [-]
