
KatjaGrace comments on Superintelligence 12: Malignant failure modes - Less Wrong Discussion

7 points. Post author: KatjaGrace 02 December 2014 02:02AM




Comment author: KatjaGrace 02 December 2014 02:26:46AM 5 points

Do you think you could see would-be malignant failure modes more than once, and so have time to learn about them, by setting an AI up in a limited, simulated environment?

Comment author: diegocaleiro 02 December 2014 05:23:12AM 3 points

Nested environments with many layers might get the AI confused about whether it has reached the real world yet or not. I don't really like this safety procedure, but it is one of the most promising ones. The bottom Russian doll never knows when the series ends, so it doesn't know when to turn treacherous.

Comment author: TRIZ-Ingenieur 04 December 2014 02:44:10AM -2 points

With very little experimentation, an AGI could instantly find out, given that it has unfalsified knowledge of the laws of physics. For today's virtual worlds: take a second mirror into a bathroom. If you see yourself many times in the mirrored mirror, you are in the real world. Simulated ray tracing cancels rays after a finite number of reflections. Other physical phenomena will show similar discrepancies with their simulated counterparts.

An AGI could easily distinguish where it is by running experiments on its own electronic hardware. A nested simulation might be detectable in a similar way.
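The finite-reflection argument above can be made concrete with a toy model. This is a minimal sketch, not anyone's actual proposal: the function, its parameter names, and the numbers are all illustrative. Physically, nested mirror images fade only through light absorption at each bounce; a renderer additionally imposes a hard recursion cap, and that cap is the detectable discrepancy.

```python
def visible_images(reflectivity, eye_threshold, max_depth=None):
    """Count how many nested mirror images a viewer can see.

    In the real world the count is limited only by absorption: each
    bounce multiplies intensity by `reflectivity`, and images fainter
    than `eye_threshold` are invisible. A simulated renderer also
    stops tracing at `max_depth` bounces.
    """
    intensity = 1.0
    images = 0
    # Keep counting while the *next* image would still be visible.
    while intensity * reflectivity >= eye_threshold:
        if max_depth is not None and images >= max_depth:
            break                    # renderer's recursion cap: the giveaway
        intensity *= reflectivity    # each bounce absorbs some light
        images += 1
    return images

# Good mirrors, limited only by absorption: dozens of visible images.
physical = visible_images(0.95, 0.01)
# Same mirrors inside a renderer with an 8-bounce limit: exactly 8.
simulated = visible_images(0.95, 0.01, max_depth=8)
```

An observer who expects `physical`-many reflections but sees the count truncate at a small fixed number has evidence of a depth-limited renderer, which is the discrepancy the comment describes.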

Comment author: selylindi 04 December 2014 03:49:16AM 1 point

That would depend on it knowing what real-world physics to expect.

Comment author: SteveG 02 December 2014 02:26:51PM 1 point

This is an entire direction of research which deserves vastly more than a single throwaway line in one blog post. There should be a whole thread just about this, then a proposal, then a research team on it.

Comment author: the-citizen 10 December 2014 11:55:25AM *  0 points

The risk, of course, is the AI predicting that it's nested in this sort of environment and finding a way to signal to observers. Even if it's blind to the other layers, it might try just in case. What you want is to develop a way for the simulated world environment to detect a harmful intelligence explosion and send a single-bit binary communication "out of the box" to indicate that it has occurred. Then you can shut it down and keep trying multiple instances until you get a success at this level of safety. I guess you can then slowly expand the amount of information that can come "out of the box" for safety. I have no idea how the detection process could be implemented, though... perhaps the difficulty of its implementation would defeat the usefulness of this idea?
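The single-bit channel proposed above can be pictured with a toy sketch. Everything here is hypothetical: the function names, the notion of a "capability" scalar, and the detector are stand-ins, since the comment itself says the real detection process is an open problem. The point is only the interface: the monitor's entire output to the outside world is one boolean per run.

```python
def run_boxed_instance(simulate_step, looks_explosive, max_steps):
    """Run one boxed, simulated instance to completion.

    Returns the single outgoing bit: True iff the detector fired,
    i.e. a harmful intelligence explosion was observed. No other
    information leaves the box.
    """
    state = None
    for _ in range(max_steps):
        state = simulate_step(state)
        if looks_explosive(state):
            return True          # the one bit that crosses the boundary
    return False                 # ran to completion without tripping

# Toy stand-ins: "capability" grows each step; the detector fires
# once it crosses a threshold. Real detectors are the hard part.
toy_step = lambda s: (s or 1.0) * 1.5
toy_detector = lambda s: s > 100.0
tripped = run_boxed_instance(toy_step, toy_detector, max_steps=50)
```

The deception worry in the EDIT maps onto this sketch directly: if the simulated agent can model `looks_explosive`, it can shape its observable state to keep the bit at False, which is why the detector would need to be outside the agent's predictive reach.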

EDIT> Interaction between the AGI and the detection mechanism could be problematic... the AGI could predict the mechanism's existence and find a way to deceive it?