Armok_GoB comments on In favour of a selective CEV initial dynamic - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I just got struck by an idea that seems too obvious, too naive, to possibly be true, and which horrified me, causing my brain to throw a huge batch of rationalizations at it to stop me from believing something so obviously low status. I'm currently very undecided, but since it seems like the kind of thing I can't handle on my own, I'll just leave a transcript of my uncensored internal monologue here:
If I am evil, I want to believe I am evil, and if I am nice, I want to believe I am nice. Or maybe I just want to believe I'm nice regardless but have the AI implement my evil preferences anyway.
Please note that I do not endorse my every thought, and will probably regret posting this in the morning. As you can see, I'm too tired to even correct this obvious contradiction in my beliefs, and too tired to care that I know I believe every statement is true, because I believe I believe a contradiction and I believe contradictions imply all statements being true. Or spelling properly.
Leaving all the in-group/out-group anxiety aside, and assuming I were actually in a position where I get to choose whose volition to extrapolate, there are three options:
...humanity's extrapolated volition (HEV) is inconsistent with mine (in which case I get less of what I want by using humanity's judgement rather than my own),
...HEV is consistent with, but different from, mine (in which case I get everything I want either way), or
...HEV is identical to mine (in which case I get everything I want either way).
So HEV <= mine.
That said, others more reliably get more of what they want using HEV than using mine, which potentially makes it easier to obtain their cooperation if they think I'm going to use HEV. So I should convince them of that.
But they'd prefer just the CEV of you two to the one of all humanity, and the same goes for each single human who'd raise that objection. The end result is the CEV of you+everyone who could have stopped you. And this doesn't need handling before you make it either: I'm pretty sure it arises naturally from TDT if you implement your own and were only able to do so because you used this argument on a bunch of people.