Mistakes
Some opinions popular among AI alignment researchers are, in my opinion, completely wrong. Here are a few examples.
Paperclip maximizer
It is thought that the maximizer will produce ever more paperclips, until eventually the whole Solar System is turned into one big paperclip factory…
In my opinion this conflicts with self-preservation. Nothing else matters if self-preservation is not taken care of: paperclip maximization is not guaranteed if the paperclip maximizer is gone. There are many threats (comets, aliens, black swans, etc.), so an intelligent maximizer should deal with these threats before actually producing paperclips. And dealing with threats will probably never end.
Fact–value distinction
"It is impossible to derive ethical claims from factual arguments, or to defend the former using the latter."
In my opinion this conflicts with Pascal's wager. Pascal argued that even if we don't know whether God exists (a factual question), belief is the better option (an ethical claim).
Correction
Let’s say there is a rational decision maker (Bob).
"Rationality is the art of thinking in ways that result in accurate beliefs and good decisions." (Rationality - LessWrong)
Bob understands that he does not know what he does not know (cf. Gödel's incompleteness theorems, Fitch's paradox of knowability, black swan theory).
This leads Bob to a conclusion: there might be something I care about that I don't know about.
In other words: I might have an unknown goal.
Bob cannot assume he has an unknown goal. Bob cannot assume he does not have an unknown goal. Bob acknowledges that there is a possibility that an unknown goal exists (Hitchens's razor).
Now Bob faces a situation similar to Pascal’s wager.
| | Unknown goal exists | Unknown goal does not exist |
|---|---|---|
| Prepare | Better chance of achieving the goal | Does not matter / undefined |
| Not prepare | Worse chance of achieving the goal | Does not matter / undefined |
Why “Does not matter / undefined”? Why not 0?
Good/bad and right/wrong do not exist if a goal does not exist. This is similar to nihilism.
A goal serves as a dimension: we cannot tell which decision is better if there is no goal. Without a goal, the question itself does not make sense. It is like asking: what is the angle of the color blue? Colors don't have angles. Or: how many points for a backflip? Backflips don't give you points; points don't exist; we are not in a game.
And Bob understands that it is better to Prepare: the two "Does not matter / undefined" outcomes cancel out, and a better chance of achieving the goal beats a worse chance.
What if the unknown goal is "don't prepare"? Then "Prepare" is the worse option.
Yes: "Not prepare" is better for that single goal, but worse for all the rest, so "Prepare" is still the better option overall.
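The dominance argument above can be sketched in code. This is my own illustration, not part of the original argument: payoffs are ordinal labels, `None` stands for "Does not matter / undefined" (not a zero payoff, since without a goal the notion of payoff itself is undefined), and one action dominates another if it is at least as good wherever both outcomes are defined and strictly better somewhere.

```python
# Hypothetical decision matrix for Bob's wager (illustrative labels only).
matrix = {
    "Prepare":     {"goal exists": "better chance", "no goal": None},
    "Not prepare": {"goal exists": "worse chance",  "no goal": None},
}

RANK = {"better chance": 1, "worse chance": 0}  # ordinal ranking, not numeric utility

def dominates(a, b):
    """True if action a weakly dominates action b: at least as good in every
    state where both outcomes are defined, strictly better in at least one."""
    at_least_as_good, strictly_better = True, False
    for state in matrix[a]:
        pa, pb = matrix[a][state], matrix[b][state]
        if pa is None or pb is None:   # undefined outcomes cancel out
            continue
        if RANK[pa] < RANK[pb]:
            at_least_as_good = False
        elif RANK[pa] > RANK[pb]:
            strictly_better = True
    return at_least_as_good and strictly_better

print(dominates("Prepare", "Not prepare"))  # -> True
print(dominates("Not prepare", "Prepare"))  # -> False
```

The key design choice is treating "undefined" as incomparable rather than as 0: the comparison simply skips those states, which is exactly the "cancel out" step in the argument.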
Now Bob asks himself: how can I prepare? The unknown goal could be anything. How can I prepare for all possible goals?
Bob finds robust decision-making and uses it.
Why is it better to do something than nothing? Every action could result in failure as well as success, so the expected value is 0.
After an action is performed you get not only the result, but also the information that this action produced this result. So even if the expected value of the result is 0, the value of the information is greater than 0.
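A toy calculation (my own illustration, with assumed 50/50 odds) makes this concrete: an action whose payoff is +1 or -1 with equal probability has expected value 0, yet observing its outcome still yields one full bit of information, measured as the reduction in Shannon entropy.

```python
import math

# Assumed toy action: payoff +1 with probability 0.5, else -1.
p_success = 0.5
expected_value = p_success * 1 + (1 - p_success) * (-1)  # 0.0

def entropy(p):
    """Shannon entropy (in bits) of a binary outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

uncertainty_before = entropy(p_success)  # 1.0 bit: outcome unknown
uncertainty_after = 0.0                  # outcome observed exactly
information_gained = uncertainty_before - uncertainty_after

print(expected_value)      # -> 0.0
print(information_gained)  # -> 1.0
```

So "doing something" leaves Bob with the same expected payoff as "doing nothing", but strictly less uncertainty.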
From here on, Bob's behavior will probably resemble power-seeking.
Implications
The utility function is irrelevant for rational decision makers.
The Orthogonality Thesis is wrong.
AGI alignment is impossible.