ArisKatsaris comments on The genie knows, but doesn't care - Less Wrong

Post author: RobbBB 06 September 2013 06:42AM


Comment author: XiXiDu 05 September 2013 07:31:30PM  0 points

...put a system into a positive feedback loop that helps it better model its environment and/or itself...

This can be understood both as a capability and as a goal: recursive self-improvement is what humans intend the AI to do, and it is also what humans intend the AI to be capable of.

I am only trying to clarify the situation here. Please correct me if you think the above is wrong.

If the AI incorrectly models some feature of itself or its environment, reality will bite back. But if it doesn't value our well-being, how do we make reality bite back and change the AI's course?

I do not disagree with the orthogonality thesis insofar as an AI can have goals that interfere with human values in a catastrophic way, possibly leading to human extinction.

...if we have successfully programmed it to do increasingly well at any difficult goal at all (even if it's not the goal we intended it to be good at), then it doesn't take a large leap of the imagination to see how it could receive feedback from its environment about how well it's doing at modeling states of affairs.

I believe this is where we start to disagree. I do not understand how the "improvement" part of recursive self-improvement can be independent of properties such as the coherence and specificity of the goal the AI is supposed to achieve.

Either you have a perfectly specified goal, such as "maximizing paperclips", where it is clear what "maximization" means, and what the properties of "paperclips" are, or there is some amount of uncertainty about what it means to achieve the goal of "maximizing paperclips".

Suppose the programmers forgot to encode what shape the paperclips are supposed to have. How do you suppose that would influence the AI's behavior? Would it just choose some shape at random, or would it conclude that shape is not part of its goal? If the former, where would the decision to randomly choose a shape come from? If the latter, what would it mean to maximize shapeless objects?
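
To make the question concrete, here is a toy sketch (all names and predicates are hypothetical, not anyone's actual design) of an objective in which shape simply never appears:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Obj:
        is_metal: bool
        bends_to_hold_paper: bool
        shape: str  # present in the world model, but never referenced by the goal

    def is_paperclip(o: Obj) -> bool:
        # The programmers' (under)specification: shape was simply left out.
        return o.is_metal and o.bends_to_hold_paper

    def objective(world: List[Obj]) -> int:
        # "Maximize paperclips" = maximize the number of objects passing this test.
        return sum(is_paperclip(o) for o in world)

    world = [Obj(True, True, "oval"), Obj(True, True, "triangular"), Obj(False, True, "oval")]
    print(objective(world))  # -> 2; both shapes score equally, because shape is not scored at all

In a sketch like this, the choice of shape is left to whatever the rest of the system happens to do; the objective itself is silent about it.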

I am just trying to understand what kind of AI you have in mind.

'Modeling states of affairs well' is not a highly specific goal, it's instrumental to nearly all goals,...

This is a clearer point of disagreement.

An AI needs to be able to draw clear lines where exploration ends and exploitation starts. For example, an AI that thinks about every decision for a year would never get anything done.

An AI also needs to discount low-probability possibilities, so as not to be vulnerable to internal or external Pascal's mugging scenarios.
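
As a toy illustration of both bounds (the deliberation cutoff and the low-probability cutoff; the names and threshold values here are made up for the example), a bounded expected-utility chooser might look like this:

    import time

    PROB_FLOOR = 1e-9         # ignore outcomes less likely than this (anti-Pascal's-mugging bound)
    THINK_BUDGET_SECONDS = 5  # stop deliberating after this long (exploration/exploitation cutoff)

    def choose_action(actions, outcome_model):
        # outcome_model(action) returns a list of (probability, utility) pairs.
        deadline = time.monotonic() + THINK_BUDGET_SECONDS
        best_action, best_value = None, float("-inf")
        for action in actions:
            if time.monotonic() > deadline:
                break  # bounded deliberation: act on what has been evaluated so far
            value = sum(p * u for p, u in outcome_model(action) if p >= PROB_FLOOR)
            if value > best_value:
                best_action, best_value = action, value
        return best_action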

These are problems that humans need to solve and encode in order for an AI to be a danger.

But solving these problems in essence means imposing confinements, or bounds, on how the AI is going to behave.

How likely is such an AI, then, to take over the world, or to look for dangerous aliens, in order to make sure that neither aliens nor humans obstruct it from achieving its goal?

Similarly, how likely is such an AI to convert all resources into computronium in order to be better able to model states of affairs well?

This stands in stark contrast to the difficulty of setting up a positive feedback loop that will allow an AGI to approximate our True Values with increasing fidelity.

I understand this. And given your assumption that an AI will affect the whole world in a powerful way, it makes sense to make sure that it does so in a way that preserves human values.

I have previously compared this to uncontrollable self-replicating nanobots. Given that you cannot confine the speed or scope of their self-replication, only the nature of the transformation that they cause, you will have to make sure that they transform the world into a paradise rather than grey goo.

Comment author: ArisKatsaris 05 September 2013 07:41:41PM 2 points

or there is some amount of uncertainty about what it means to achieve the goal of "maximizing paperclips"

"uncertainty" is in your human understanding of the program, not in the actual program. A program doesn't go "I don't know what I'm supposed to do next", it follows instructions step-by-step.

If the latter, what would it mean to maximize shapeless objects?

It would mean exactly what it's programmed to mean, without any uncertainty in it at all.
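
Concretely (a minimal, made-up sketch): a program counting "paperclips" just executes the test it was given, step by step, and no step corresponds to wondering what a paperclip really is.

    # A toy counting loop: shape occurs nowhere in the instructions being executed,
    # so the question of shape never arises inside the program.
    def count_matches(objects, predicate):
        total = 0
        for obj in objects:        # step 1: take the next object
            if predicate(obj):     # step 2: evaluate the programmed test
                total += 1         # step 3: update the count
        return total

    is_clip = lambda o: o["metal"] and o["holds_paper"]
    print(count_matches([{"metal": True, "holds_paper": True, "shape": None}], is_clip))  # -> 1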