lmm comments on The Bedrock of Morality: Arbitrary? - Less Wrong

Post author: Eliezer_Yudkowsky 14 August 2008 10:00PM





Comment author: wedrifid 08 October 2013 11:52:06AM 3 points

> If you follow through on this view it seems to lead to the position that everyone has their own referent for "good", and there is no meaningful way for two different humans to argue about whether a given action is good. Which would suggest there is little point trying to persuade other people to be good, or hoping to collaboratively construct a friendly AI (since an l-friendly AI is unlikely to be e-friendly).

Cooperation does not require modification of others to have identical values. Even agents with actively opposed values can cooperate (and so create a mutually friendly AI) so long as the opposition is not perfect in all regards.

Comment author: lmm 08 October 2013 06:18:00PM 1 point

This site has been at pains to emphasise that an AI will be an optimization process of never-before-seen power, rewriting reality in ways that we couldn't possibly predict, and as such an AI whose values are even slightly misaligned with one's own would be catastrophic for one's actual values.

Comment author: wedrifid 08 October 2013 07:08:39PM 2 points

> This site has been at pains to emphasise that an AI will be an optimization process of never-before-seen power, rewriting reality in ways that we couldn't possibly predict, and as such an AI whose values are even slightly misaligned with one's own would be catastrophic for one's actual values.

What is relevant to the decision to create such an AI, or to prevent it from operating, is the comparison between what will occur in the absence of the AI and what the AI will do. For example, gwern's values are not identical to mine, but if I had the choice between pressing a button to release an FAI&lt;gwern&gt; or a button to destroy it, I would press the button to release it. FAI&lt;gwern&gt; isn't as good as FAI&lt;wedrifid&gt; (by subjective tautology), but FAI&lt;gwern&gt; is overwhelmingly better than nothing. I expect FAI&lt;gwern&gt; to allow me to live for millions of years, and the cosmic commons to be exploited to do things that I generally approve of. Without that AI I think it is most likely that my species and I will go to oblivion.

The above doesn't even take cooperation mechanisms into account. That's just flat acceptance of optimisation for another's values over a distinctly sub-optimal outcome for my own.

When agents with conflicting values cooperate, negotiation applies: if both agents are rational, and are in a situation where mutual FAI creation is possible but unilateral FAI creation can be prevented, then the result will be an FAI that optimises for a compromise of the two value systems. To whatever extent the values of the two agents are not perfectly opposed, this outcome will be superior to the non-cooperative outcome. For example, if gwern and I were in such a situation, the expected result would be the release of FAI&lt;CEV&lt;gwern + wedrifid&gt;&gt;. Neither of us would prefer that option over the FAI personalised to himself, but there is still a powerful incentive to cooperate: that outcome is better than what we would have without cooperation. The same applies if a paperclip maximiser and a staple maximiser are put in that situation. (It does not apply if a paperclip maximiser meets a paperclip minimiser.)
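The compromise condition in the comment above can be sketched as a toy calculation. This is only an illustration: the utility numbers, the outcome names, the `mutually_beneficial_compromises` helper, and the maximin selection rule are all assumptions for the sketch, not anything from the original discussion.

```python
def mutually_beneficial_compromises(u1, u2, fallback):
    """Outcomes that BOTH agents strictly prefer to the non-cooperative fallback."""
    return [o for o in u1
            if o != fallback and u1[o] > u1[fallback] and u2[o] > u2[fallback]]

# Case 1: imperfectly aligned agents (illustrative utilities only).
u_wedrifid = {"FAI_wedrifid": 100, "FAI_gwern": 80, "FAI_CEV": 95, "no_FAI": 0}
u_gwern    = {"FAI_wedrifid": 80, "FAI_gwern": 100, "FAI_CEV": 95, "no_FAI": 0}

options = mutually_beneficial_compromises(u_wedrifid, u_gwern, "no_FAI")
# Every FAI beats oblivion for both agents; one simple negotiated pick is the
# option that maximises the worse-off agent's utility (a maximin rule).
negotiated = max(options, key=lambda o: min(u_wedrifid[o], u_gwern[o]))
print(negotiated)  # FAI_CEV

# Case 2: perfectly opposed agents (paperclip maximiser vs paperclip minimiser).
# Every outcome one agent prefers, the other disprefers by exactly as much,
# so no compromise beats the fallback for both: negotiation has nothing to offer.
u_max = {"many_clips": 10, "status_quo": 0, "few_clips": -10}
u_min = {o: -v for o, v in u_max.items()}
print(mutually_beneficial_compromises(u_max, u_min, "status_quo"))  # []
```

The maximin pick lands on the compromise FAI rather than either personalised FAI, matching the intuition that each agent gives up its first choice but still does far better than the no-cooperation fallback.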