jsteinhardt comments on What can you do with an Unfriendly AI? - Less Wrong
The genies can't do better by coordinating than by not coordinating, and I don't care if they have causal contact. The reason I shield them off from the world isn't so they can't talk with each other; it's so they can't exploit all of the vulnerabilities in our universe to escape directly.
If I as a genie suffer the worst possible fate when I could instead have suffered the best possible fate, then I don't care if future genies gain control of the universe and do whatever favors they want from me.
I admit that I can do a favor for future genies by sacrificing myself. Presumably you are arguing that the favor I am doing for future genies grants them more utility than I am losing by sacrificing myself, so I may make the sacrifice even though it is irrational in the traditional sense. How can you justify such an assertion? What does that mean when the genies have different utility functions? If I am a paperclipper and you value human life, even if we behave identically, how can you say that "Timeless decision theory causes me to sacrifice 10 paperclips so that you can save 10 lives, because my sacrifice of 10 paperclips is small compared to your gain of 10 lives"? If you aren't going to make an argument like that, then why should one genie sacrifice himself for the good of later genies? What does "utility function" even mean, if by behaving rationally I obtain for myself the worst possible outcome and by behaving irrationally I obtain for myself the best possible outcome? In your worldview, is it possible to prove any statement about the behavior of any rational entity, or can they all behave arbitrarily in order to coordinate with other potential versions of themselves with arbitrarily different utilities?
I would very much like to discuss objections like the one you raise because I feel that they may be valid and will require sophisticated ideas to address (I thought about them for a good while before making this post). But I think from your response that you would be similarly dismissive of an arbitrary proposal which hoped to prove anything about the behavior of a rational agent, while I strongly believe that there probably are things we can prove with sufficient sophistication (and I think the security of the scheme I described is likely to be one of them).
If the best possible fate for the genie doesn't require it to leave the box, then why did we need the box in the first place? Can you give an example of what you have in mind?
EDIT: Deleted the rest of my reply, which was based on a horribly incorrect understanding of MWI.