# Wei_Dai comments on Formalizing Value Extrapolation - Less Wrong Discussion

14 26 April 2012 12:51AM



Comment author: 26 April 2012 09:55:23PM 3 points

What's the difference between the simulated humans outputting a utility function U' which the outer AGI will then try to maximize, and the simulated humans just running U' and the outer AGI trying to maximize the value returned by the whole simulation (and hence U')? In the latter case, you're "letting the AGI in" by including its definition (explicitly, or implicitly via something like the universal prior) in the definition of U'.

Comment author: 26 April 2012 10:23:40PM * 2 points

OK, I see what Paul probably meant. Let's say "utility value", not "utility function", since that's what we mean. I don't think we should be talking about "running utility value", because utility might be something given by an abstract definition, not the state of execution of any program. As I discussed in the grandparent, the distinction I'm making is between the outer AGI controlling the utility value (which it does) and the outer AGI controlling the simulated researchers that prepare the definition of the utility value (which it shouldn't be allowed to do, for AI safety reasons). There is a map/territory distinction between the definition of the utility value prepared by the initial program and the utility value itself, which the outer AGI optimizes.

Comment author: 26 April 2012 11:38:44PM 3 points

(Also, "utility function" might be confusing, especially for outsiders who are used to "utility function" meaning a mapping from world states to utility values, whereas Paul is using it to mean a parameterless computation that returns a utility value.)
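To make the two senses concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration, not anything from Paul's proposal: the `WorldState` type, the field names, and the toy scoring rule are assumptions chosen only to show the difference in signature.

```python
from typing import Dict

# Hypothetical stand-in for a world state; the real notion is far richer.
WorldState = Dict[str, float]


def utility_of_state(state: WorldState) -> float:
    """Sense 1: a mapping from world states to utility values."""
    # Toy scoring rule, assumed purely for illustration.
    return state.get("good", 0.0) - state.get("bad", 0.0)


def utility_computation() -> float:
    """Sense 2: a parameterless computation that returns a utility value.

    Whatever the value depends on is fixed inside the definition itself,
    so there is nothing left for a caller to pass in.
    """
    fixed_world: WorldState = {"good": 3.0, "bad": 1.0}
    return utility_of_state(fixed_world)
```

In the first sense you can ask for the utility of any state you like; in the second there is only a single value to ask for, which is why "utility value" may be the less confusing term.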

> I don't think we should be talking about "running utility value", because utility might be something given by an abstract definition, not the state of execution of any program.

I think Paul is thinking that the utility definition that the simulated humans come up with is not necessarily a definition of our actual values, but just something that causes the outer AGI to self-modify into an FAI, and for that purpose it might be enough to define it using a programming language.

> As I discussed in the grandparent, the distinction I'm making is between the outer AGI controlling the utility value (which it does) and the outer AGI controlling the simulated researchers that prepare the definition of the utility value (which it shouldn't be allowed to do, for AI safety reasons).

I think Paul's intuition here is that the simulated humans (or enhanced humans and/or FAIs they build inside the simulation) may find it useful to "blur the lines". In other words, the distinction you draw is not a fundamental one but just a safety heuristic that the simulated researchers may decide to discard or modify once they become "powerful enough". For example, once they understand enough theory to see how to do this safely, they may decide to partially simulate the outer AGI, or otherwise try to reason about what it might do given the various definitions of U' the simulation might ultimately settle on.