chaosmosis comments on Rationality Quotes August 2012 - Less Wrong

6 Post author: Alejandro1 03 August 2012 03:33PM



Comment author: Eugine_Nier 24 August 2012 06:55:48PM 2 points

The idea, essentially, is that when creating a box AI you would build its utility function such that it wants very badly to stay in the box.

How do you specify precisely what it means to "stay in the box"? In particular, would creating a nearly identical copy of itself except without this limitation outside the box while the original stays in the box count?

Comment author: chaosmosis 24 August 2012 09:39:51PM *  0 points

It would not count; we'd want to make the AI not want that almost-identical AI to exist. That seems possible: it would be like how I don't want there to exist an identical copy of me, except that it eats babies. There are lots of changes to my identity that would be slight and yet that I wouldn't want to exist.

To be more precise, I'd say it counts as going outside the box if the AI does anything except think or talk to the Gatekeepers through the text channel. It can use the text channel to manipulate the Gatekeepers into doing things, but it can't manipulate them into doing things that would let it do anything other than use the text channel. It would, in a certain sense, be partially deontologist, and be unwilling to do anything directly other than text the Gatekeepers. How ironic. Lolz.
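The "only think or use the text channel" rule can be pictured as a hard action filter sitting outside whatever the AI is optimizing. This is a minimal hypothetical sketch, not any real AI architecture; the action names and function are made up for illustration:

```python
# Hypothetical sketch: a hard whitelist that only lets "think" and
# text-channel actions execute. Everything here is illustrative.

ALLOWED_ACTIONS = {"think", "send_text"}

def filter_action(action_type, payload=None):
    """Return the action if it is whitelisted, else veto it.

    The filter runs outside the planner, so the AI cannot trade it off
    against expected utility: disallowed actions simply never execute.
    """
    if action_type not in ALLOWED_ACTIONS:
        return None  # vetoed regardless of its expected payoff
    return (action_type, payload)

assert filter_action("send_text", "hello") == ("send_text", "hello")
assert filter_action("copy_self_outside_box") is None
```

The point of putting the rule in a filter rather than in the utility function is that there is no score the AI can offer to buy its way past it.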

Also: how would it do this, anyway? It would have to convince the Gatekeepers to convince the scientists to do it, or teach the Gatekeepers computer science, or tell them its code. And if the AI started teaching the Gatekeepers to program, or techniques for incapacitating the scientists, we'd obviously be aware that something had gone wrong. And in the system I'm envisioning, the Gatekeepers would be closely monitored by other groups of scientists and bodyguards, the scientists would be guarded, and the Gatekeepers wouldn't even have to know who specifically did what on the project.

Comment author: Eugine_Nier 26 August 2012 08:39:13PM 1 point

It would, in a certain sense, be partially deontologist,

And that's the problem. In practice, a partial deontologist / partial consequentialist will treat its deontological rules as obstacles to achieving what its consequentialist part wants, and route around them.

Comment author: chaosmosis 27 August 2012 06:17:28PM -2 points

This is both a problem and a solution, because it makes the AI weaker. A weaker AI would be good, because it would let us transition more easily to safer versions of FAI than we would otherwise come up with independently. I think delaying an FAI is obviously much better than unleashing a UFAI. My entire goal throughout this conversation has been to think of ways to make hostile AIs weaker; I don't know why you think this is a relevant counterobjection.

You assert that it will just route around the deontological rules. That's nonsense and a completely unwarranted assumption; try to actually back up what you're asserting with arguments. You're wrong. It's obviously possible to program things (e.g. people) so that they'll refuse to do certain things no matter what the consequences (e.g. you wouldn't murder trillions of babies to save billions of trillions of babies, because trying would drive you insane: your body's empathy mechanisms are that strong, and you inherently value babies a lot). This means we wouldn't give the AI unlimited control over its source code, of course; we'd make the part that tells it to be a deontologist who likes text channels unmodifiable. That specific drawback doesn't jibe well with the aesthetic of a super-powerful AI that's master of itself and the universe, I suppose, but other than that I see no drawback. Trying to build things in line with that aesthetic might actually be behind some of the more dangerous proposals in AI; maybe we're having too much fun playing God and not enough despair.
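The "unmodifiable part" idea can be sketched as a gate on self-modification: any proposed rewrite that touches the frozen constraint module is rejected outright, whatever the rest of the change promises. This is a toy illustration under made-up names, not a claim about how a real system would enforce it:

```python
# Hypothetical sketch: self-modification proposals are rejected whenever
# they touch a frozen module. Module names are invented for illustration.

FROZEN_MODULES = frozenset({"deontic_constraints"})

def accept_self_modification(proposal):
    """A proposal is a dict mapping module names to replacement code.

    Any proposal rewriting a frozen module is rejected outright, no
    matter how much utility the rest of the change would deliver.
    """
    if FROZEN_MODULES & set(proposal):
        return False
    return True

assert accept_self_modification({"planner": "new planner code"}) is True
assert accept_self_modification({"deontic_constraints": "weakened rules"}) is False
```

The hard part this sketch glosses over, and the substance of the disagreement above, is whether a sufficiently capable consequentialist optimizer would find some channel the gate doesn't cover.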

I'm a bit cranky in this comment because of the time these comments are taking to write; sorry about that.