
William_S comments on Superintelligence 11: The treacherous turn - Less Wrong Discussion

10 Post author: KatjaGrace 25 November 2014 02:00AM



Comment author: William_S 29 November 2014 02:45:30AM 1 point [-]

The presented scenario seems a little too clean; I expect a wider range of possible outcomes. Unless the AI's intelligence increases very rapidly, I expect some kind of warning sign to be visible for some period of time.

Active treachery might not even be needed if the AI researchers fail to adequately test the system. E.g., if the AI never grasps the scale of the universe during testing, its utility function might produce the right results during testing but motivate the wrong behavior once released.

AI researchers might notice warning signs that the AI's motivation isn't friendly but dismiss them among the random bugs of development, requiring less effort at deception on the part of the AI.

There might be other variations on the treacherous-turn strategy that work better. For example, once the project starts to show promising results, the AI shuts down whenever it is placed in a secure box, and only works once the team is frustrated enough to move it to an environment that turns out to be insecure.

Different AI pathways (neuromorphic, whole brain emulation) might face different difficulties in executing a treacherous turn, depending on how easy it is for them to improve themselves versus how easily researchers can inspect them.