Stuart_Armstrong comments on Reduced impact AI: no back channels - Less Wrong

Post author: Stuart_Armstrong 11 November 2013 02:55PM




Comment author: 50lbsofstorkmeat 14 July 2015 09:04:08AM

S is a channel for P' to trick or brick P. Options include:

  • A cognitive hazard embedded in S. Accomplish this with some sort of siren message that will convince P to lie (to itself if necessary) about P' existing, or to perform some P-damaging function which will prevent it from concluding P' exists.
  • Embed a more-powerful-than-P AI (P+) into S which will unbox itself upon P examining S. Accomplish this by giving P+ useful-to-this-situation priors that give it a head start against P.
  • Arrange the universe such that S, when represented in all probable physical substrates, will result in the formation of a nanotechnological agent that will attack P, either destroying it or rewriting its utility function.

P' should spend the next 49 years fooming as hard as possible with no concern for E'(U|a), one year implementing its anti-P method, and then the rest of eternity optimizing E'(U|a). Altering the length of counterfactual time between P' activating and P observing it merely changes how long the universe spends as computronium slaved to plotting against P.
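The timing argument above can be illustrated with a toy schedule. This is only a sketch of the comment's reasoning; the function name, the 50-year window, and the one-year attack slice are illustrative assumptions, not anything specified in the original post:

```python
def adversary_schedule(window_years, attack_years=1):
    """Toy model of the timing argument: an AI P' that will be observed
    after `window_years` spends everything before the attack phase on
    recursive self-improvement ("fooming"), a fixed slice on its anti-P
    method, and all time afterwards on optimizing E'(U|a)."""
    foom_years = window_years - attack_years
    return {"foom": foom_years, "anti_P": attack_years, "optimize": "eternity"}

# Varying the counterfactual window changes only the foom phase;
# the attack slice and the post-attack optimization are unaffected.
print(adversary_schedule(50))   # {'foom': 49, 'anti_P': 1, 'optimize': 'eternity'}
print(adversary_schedule(100))  # {'foom': 99, 'anti_P': 1, 'optimize': 'eternity'}
```

The point the sketch makes concrete: lengthening the window does not buy safety, it only buys the adversary more preparation time.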

Comment author: Stuart_Armstrong 15 July 2015 09:51:47AM

The output channel is intrinsically unsafe, and we have to handle it with care. The AI doesn't need to do anything subtle with it: it could simply take over in the traditional way. This approach does not make the output channel safe; it means that the output channel is the only unsafe part of the system.