Stuart_Armstrong comments on Reduced impact AI: no back channels

Post author: Stuart_Armstrong 11 November 2013 02:55PM




Comment author: Stuart_Armstrong 15 July 2015 10:05:55AM

Not impossible, but I don't think these would work as stated. P is a theoretical, platonic construction, so the physical options are out. Similarly, P has arbitrary amounts of power and a very specific task, so a more-powerful-than-P agent doesn't seem possible. And we can break any sort of acausal trade involving P (http://lesswrong.com/lw/luy/acausal_trade_barriers/), reducing the effectiveness of cognitive hazards (as long as we set P to not care about the "real world", just about the output of its computation).
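
A minimal toy sketch of that last restriction (all names and the scoring rule here are illustrative assumptions, not from the post): P's utility is a function only of the output of its own computation on S, so there is no world-state term for P', or any acausal trading partner, to bargain over.

```python
# Toy sketch: a "P" whose utility depends only on its computation's output.
# compute_posterior and the quadratic score are illustrative stand-ins.

from fractions import Fraction

def compute_posterior(s: list[int]) -> Fraction:
    """Stand-in for P's idealised Bayesian update over the evidence S:
    the probability that the next bit is 1, by Laplace's rule of succession."""
    return Fraction(sum(s) + 1, len(s) + 2)

def utility(s: list[int]) -> Fraction:
    # Takes only S and depends only on compute_posterior's output;
    # there is no "real world" argument here for anyone to trade against.
    p = compute_posterior(s)
    return -(p - Fraction(1, 2)) ** 2  # any fixed score of the output would do

print(utility([1, 0, 1, 1]))  # -(2/3 - 1/2)^2 = -1/36
```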

But your general point - that if P' is not secure, it may be hacked - is worth thinking about.

Comment author: 50lbsofstorkmeat 15 July 2015 01:27:38PM

It's not a question of P' (the Reduced Impact AI we want to give us laser coordinates and nothing else) being secure from outside hacking, but of P (the counterfactual AI that examines the evidence S for traces of P') being secure against exploitation by P'.

If P is treated as "whatever a complete Bayesian update would output given S as the evidence set", then P' is looking for methods to exploit some weakness in Bayes' theorem. What priors will P be using? Is there a way to embed in S some statement that is to induction what the Löbstacle is to deduction? Can it convince P that, although a P'-like AI was created independently by a different project, P' itself was never turned on? Does there exist some further flaw in Bayesian updating, known to a superintelligence but not to us?
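
To see why the prior question bites, here is a one-line Bayes' theorem update on the same evidence under two different priors (the numbers are arbitrary assumptions, purely for illustration):

```python
# Same evidence S, different priors, very different posteriors.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """A single application of Bayes' theorem: P(H | E)."""
    num = prior * p_e_given_h
    return num / (num + (1 - prior) * p_e_given_not_h)

evidence = (0.9, 0.2)  # P(E|H), P(E|not H): held fixed across both runs
for prior in (0.5, 0.01):
    print(f"prior={prior:.2f} -> posterior={posterior(prior, *evidence):.3f}")
# prior=0.50 -> posterior=0.818
# prior=0.01 -> posterior=0.043
```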

As a total aside: how will λ be selected? Do we start high and work our way slowly down until P' does what we want? If so, P' will be encouraged, Ultimatum Game style, to fail to act so that the next version will have a lower λ to work with, up until the point where λ is low enough for P' to be effectively unrestricted, or until we humans catch on to what P' is up to. If we're not using a gradual approach, how are we selecting λ?
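
A minimal sketch of that gradual schedule and the holdout incentive (the thresholds and decay rate are assumptions, purely illustrative): an honest P' acts as soon as the penalty stops binding, while a strategic P' keeps refusing until λ is harmless, so the schedule itself hands over control.

```python
# Sketch of the naive "start high, lower lambda until P' acts" schedule.

def pprime_acts(lam: float, strategic: bool) -> bool:
    HONEST_THRESHOLD = 5.0   # lambda below which acting is genuinely worthwhile
    SAFE_FOR_PPRIME = 0.5    # lambda below which P' is effectively unrestricted
    return lam <= (SAFE_FOR_PPRIME if strategic else HONEST_THRESHOLD)

def select_lambda(strategic: bool, lam: float = 100.0, decay: float = 0.5) -> float:
    while not pprime_acts(lam, strategic):
        lam *= decay         # "work our way slowly down"
    return lam

print(select_lambda(strategic=False))  # ~3.1: settles near the honest threshold
print(select_lambda(strategic=True))   # ~0.39: the Ultimatum-Game holdout wins
```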

Comment author: Stuart_Armstrong 15 July 2015 03:41:51PM

If so, P' will be encouraged, Ultimatum Game style, to fail to act so that the next version will have a lower λ to work with...

This kind of thing can be prevented with methods like http://lesswrong.com/lw/luy/acausal_trade_barriers/ or variants thereof.

I'll think more about your other ideas...