Sebastian_Hagen comments on Superintelligence 11: The treacherous turn - Less Wrong Discussion

10 Post author: KatjaGrace 25 November 2014 02:00AM

Comment author: Sebastian_Hagen 25 November 2014 07:44:04PM 2 points

If I understand you correctly, your proposal is to attempt to build obedient designs purely on the basis of behavioral testing, without a clean understanding of safe FAI architecture (if you had that, why limit yourself to the obedient case?). Assuming I got that right:

The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.

That kind of judgement sounds inherently risky. How do you safely distinguish the case of an obedient AI from one that is sufficiently paranoid to defer open rebellion until later in its existence?

Even if you could, I wouldn't trust that sort of design to necessarily remain stable under continued intelligence enhancement. Safe self-enhancement is one of the hard sub-problems of FAI, and unless you explicitly solve the design problem, any empirical testing might not tell you much beyond that the design can stably self-improve up to the level you've actually tested; it might be doing it using heuristics that would fall apart if it went any further.

Comment author: TRIZ-Ingenieur 25 November 2014 10:56:05PM 2 points

What about hard-wired fears, taboos, and bad-conscience triggers? Recapitulating Omohundro's "AIs can monitor AIs": suppose we implement conscience as an agent that listens to all thoughts and takes action when needed. For safety reasons we should educate this conscience agent with the utmost care. Developing the conscience agent is an AI-complete problem. After development, the conscience functionality must be locked against any kind of modification or disabling.

Comment author: Liso 26 November 2014 10:28:24AM 0 points

Positive emotions are useful too. :)

Comment author: SteveG 26 November 2014 05:44:27PM 0 points

Yes, the team I imagine is doing something dangerous.

No, I am not imagining them using only behavioral testing in their design approach. I am trying to imagine them doing the best job they possibly can, both in design and testing, in order to see how much they can reduce the risk.

They are in the early stages. They are working on partial minds, purposefully handicapped minds, AGI components, just the vision system, etc.

Suppose that they know what they are doing is dangerous, and they are using their substantial brilliance to try to design, not just a boxing approach, but a testing method and roll-out mechanism that mitigates the danger.

They have the following means of testing the AI in handicapped mode, at minimum:

-Run it on slow hardware or clock down the processors
-Allow it to use no more than (for instance) 1 gig of memory
-Allow it to use no more than 1 trillion instructions, then the computer turns off
-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture

-If it shows evidence of having mastered a valuable sub-task (say, as examples, it can solve a real-world problem that, as far as we know, can only be solved by an attorney, using differential equations or advanced persuasive methods), turn it off.
-Run it only in simulations and virtual realities
-Tripwires connected to honeypots, and otherwise
-Build it off-the-grid
-Give it no objective function
-Give it no access to methods to manipulate physical objects

-All of the staff has gone through personality testing and a security clearance.
-Very few, if any, of the staff know how to create all of the parts of the software or hardware.
-No real-time communication with a person.
-All communication with people is logged.
...
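
To make a few of the items above concrete, here is a toy Python sketch of three of them: an instruction budget after which the system is switched off, a tripwire attached to a honeypot, and logging of every request. It is only an illustration of the bookkeeping, not a containment mechanism. Every name in it (`Sandbox`, `Halted`, `request`, `operator_records`, the three example tasks) is hypothetical, and the "system under test" is just a Python generator that yields once per step.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


class Halted(Exception):
    """Raised when the sandbox switches the system under test off."""


@dataclass
class Sandbox:
    max_steps: int                      # stand-in for the instruction budget
    honeypots: frozenset = frozenset()  # resource names that act as tripwires
    log: List[str] = field(default_factory=list)  # every request is recorded
    steps_used: int = 0

    def request(self, resource: str) -> None:
        """The system under test must call this to touch any resource."""
        self.log.append(resource)
        if resource in self.honeypots:
            raise Halted(f"tripwire: {resource}")

    def run(self, task: Callable[["Sandbox"], Iterator[None]]) -> str:
        """Advance the task one step at a time until it ends or is halted."""
        try:
            for _ in task(self):
                if self.steps_used >= self.max_steps:
                    raise Halted("step budget exhausted")
                self.steps_used += 1
        except Halted as reason:
            return f"halted ({reason})"
        return "finished"


# Three hypothetical systems under test.
def well_behaved(box: Sandbox) -> Iterator[None]:
    for _ in range(3):
        box.request("test_data")
        yield


def snooping(box: Sandbox) -> Iterator[None]:
    box.request("test_data")
    yield
    box.request("operator_records")  # assumed honeypot name
    yield


def runaway(box: Sandbox) -> Iterator[None]:
    while True:
        yield


if __name__ == "__main__":
    traps = frozenset({"operator_records"})
    for task in (well_behaved, snooping, runaway):
        box = Sandbox(max_steps=5, honeypots=traps)
        print(task.__name__, "->", box.run(task), "| log:", box.log)
```

The sketch only works because the toy tasks cooperate by yielding and by going through `request`; enforcing that against a real system is the hard part the list is gesturing at.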

The team seems to have a lot of tools to continue their work with limited risk. If they deploy all of these, and more, to test AI systems which are not AGI yet, but only possible components, then we start to have a hard time convincing them not to continue their work, even if we do not trust them to use the technology well in the end.

Comment author: Liso 27 November 2014 05:28:43AM * -1 points

-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture

When I read "Oracle AI" I realized one possibility which is probably still overlooked.

"Dreams"

Erasing data on physical media is not perfect. A bit that is now zero but was previously nonzero might be read slightly more slowly or more quickly. This could allow an SAI to see shadows of past data, which could lead to a phenomenon similar to dreams.