ArisKatsaris comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
This can be understood both as a capability and as a goal. What humans mean for an AI to do is to undergo recursive self-improvement. What humans mean for an AI to be capable of is to undergo recursive self-improvement.
I am only trying to clarify the situation here. Please correct me if you think the above is wrong.
I do not disagree with the orthogonality thesis insofar as an AI can have goals that interfere with human values in a catastrophic way, possibly leading to human extinction.
I believe this is where we start to disagree. I do not understand how the "improvement" part of recursive self-improvement can be independent of properties such as the coherence and specificity of the goal the AI is supposed to achieve.
Either you have a perfectly specified goal, such as "maximizing paperclips", where it is clear what "maximization" means, and what the properties of "paperclips" are, or there is some amount of uncertainty about what it means to achieve the goal of "maximizing paperclips".
Suppose the programmers forgot to encode what shape the paperclips are supposed to have. How do you suppose that would influence the behavior of the AI? Would it just choose some shape at random, or would it conclude that shape is not part of its goal? If the former, where would the decision to randomly choose a shape come from? If the latter, what would it mean to maximize shapeless objects?
I am just trying to understand what kind of AI you have in mind.
This is a clearer point of disagreement.
An AI needs to be able to draw clear lines where exploration ends and exploitation starts. For example, an AI that thinks about every decision for a year would never get anything done.
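One simple way to picture such a line between exploration and exploitation is an "explore-then-commit" rule: try each option a fixed number of times, then stick with the best one. The following is only a minimal sketch under that assumption; the function name `explore_then_commit`, the fixed budget, and the constant reward functions are all illustrative and not taken from the discussion.

```python
def explore_then_commit(reward_fns, explore_rounds, total_rounds):
    """Try every option `explore_rounds` times, then commit to the best one.

    `reward_fns` is a list of zero-argument callables, one per option.
    Returns the index of the chosen option and the full list of choices made.
    """
    totals = [0.0] * len(reward_fns)
    history = []

    # Exploration phase: a hard, pre-set budget per option.
    for option, reward in enumerate(reward_fns):
        for _ in range(explore_rounds):
            totals[option] += reward()
            history.append(option)

    # The line is drawn here: no further comparison of options after this point.
    best = max(range(len(reward_fns)), key=lambda i: totals[i])

    # Exploitation phase: spend every remaining round on the best option found.
    remaining = max(total_rounds - len(history), 0)
    history.extend([best] * remaining)
    return best, history


# Hypothetical options with constant payoffs, so the run is deterministic.
options = [lambda: 0.2, lambda: 0.9, lambda: 0.5]
best, history = explore_then_commit(options, explore_rounds=2, total_rounds=10)
```

With these made-up payoffs the first six rounds are spent sampling and the remaining four go to the second option. The point is only that the cutoff is something the designer has to choose and encode.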
An AI also needs to discount low-probability possibilities, so as not to be vulnerable to internal or external Pascal's mugging scenarios.
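As a rough illustration of what such discounting could look like, here is a minimal sketch that computes expected utility while ignoring outcomes below a probability floor. The floor value, the function name `expected_utility`, and all of the numbers are assumptions made up for the example; a hard cutoff is only one of several possible ways to handle the problem.

```python
# Hypothetical threshold below which outcomes are ignored entirely.
PROBABILITY_FLOOR = 1e-9


def expected_utility(outcomes, floor=0.0):
    """Sum probability * utility over (probability, utility) pairs,
    skipping any outcome whose probability is below `floor`."""
    return sum(p * u for p, u in outcomes if p >= floor)


# A "mugging": a minuscule chance of an astronomically large payoff,
# plus a near-certain small loss (handing over the wallet).
pay_mugger = [(1e-20, 1e30), (1.0 - 1e-20, -10.0)]
refuse = [(1.0, 0.0)]

naive_pay = expected_utility(pay_mugger)                      # no discounting
bounded_pay = expected_utility(pay_mugger, PROBABILITY_FLOOR)  # tiny chance dropped
bounded_refuse = expected_utility(refuse, PROBABILITY_FLOOR)
```

Without the floor, the tiny-probability payoff dominates and paying looks attractive; with it, refusing scores higher than paying.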
These are problems that humans need to solve and encode in order for an AI to be a danger.
But these problems are in essence confinements, or bounds on how an AI is going to behave.
How likely, then, is such an AI to take over the world, or look for dangerous aliens, in order to make sure that neither aliens nor humans obstruct it from achieving its goal?
Similarly, how likely is such an AI to convert all resources into computronium in order to be better able to model states of affairs well?
I understand this. And given your assumptions about how an AI will affect the whole world in a powerful way, it makes sense to make sure that it does so in a way that preserves human values.
I have previously compared this to uncontrollable self-replicating nanobots. Given that you cannot confine the speed or scope of their self-replication, only the nature of the transformation that they cause, you will have to make sure that they transform the world into a paradise rather than grey goo.
"uncertainty" is in your human understanding of the program, not in the actual program. A program doesn't go "I don't know what I'm supposed to do next", it follows instructions step-by-step.
It would mean exactly what it's programmed to mean, without any uncertainty in it at all.
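A toy sketch of this point, assuming a hypothetical goal predicate in which the programmers forgot to check shape. The names `Thing`, `is_paperclip`, and `paperclip_count` are invented for illustration only:

```python
from dataclasses import dataclass


@dataclass
class Thing:
    material: str
    shape: str


def is_paperclip(thing):
    # The (hypothetical) programmers forgot to check `thing.shape`.
    # The program is not "uncertain" about shape; shape is simply
    # not part of what it tests.
    return thing.material == "steel wire"


def paperclip_count(world):
    """The quantity the program would maximize, exactly as written."""
    return sum(1 for thing in world if is_paperclip(thing))


world = [
    Thing("steel wire", "looped"),
    Thing("steel wire", "straight"),
    Thing("plastic", "looped"),
]
count = paperclip_count(world)
```

Here the straight piece of wire counts and the looped plastic one does not. The gap between this and what the programmers intended exists only in their heads; the code itself has a definite meaning.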