I've spent some time over the last two weeks thinking about problems around FAI. I've committed some of these thoughts to writing and put them up here.
There are about a dozen real posts and some scraps. I think some of this material will be interesting to certain LWers; there is a lot of discussion of how to write down concepts and instructions formally (which doesn't seem so valuable in itself, but it seems like someone should do it at some point), some review of and observations on decision theory, and some random remarks on complexity theory, entropy, and prediction markets.
I really like your post about hazards, which seems to be the most important one in the sequence. You examined an obvious-looking strategy of making an AI figure out what "humans" are, and found a crippling flaw in it. But I don't understand some of your assumptions:
1) Why do you think that humans have a short description, or that our branch of the multiverse has a short pointer to it? I asked that question here. In one of the posts you give a figure of 10000 bits; I have no idea why that would be enough, considering the amount of quantum randomness involved in human evolution. I don't even know of an argument for why the conditional K-complexity of one big human artifact, given a full description of another big human artifact, has to be low. (I try to restate this more formally below.)
2) Why do you think a speed prior is a good idea for protection against the attacks? It doesn't seem likely to me that an actual human is the most computationally efficient predictor of a human's decisions. If we allow a little bit of error, an enemy AI can probably create much more efficient predictors.
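To restate question 1 a bit more formally (this is just my own attempt at a restatement, not anything taken from your posts): the hope seems to be that something like

$$K(\text{humans} \mid \text{laws of physics}) \;\lesssim\; 10^4 \text{ bits}$$

holds, where $K(\cdot \mid \cdot)$ is conditional Kolmogorov complexity, and likewise that for two large human artifacts $A$ and $B$, $K(A \mid B)$ is small compared to $\ell(A)$. The quantum-randomness worry is that much of the information specifying which humans actually came to exist looks like the output of coin flips that no short program can reconstruct.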
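And for question 2, by "speed prior" I mean, roughly, a prior that penalizes a program's running time as well as its length, something in the spirit of Levin's Kt or Schmidhuber's speed prior (a simplified sketch, not necessarily the exact formulation you use):

$$P_{\text{speed}}(p) \;\propto\; 2^{-\left(\ell(p) + \log_2 t(p)\right)} \;=\; \frac{2^{-\ell(p)}}{t(p)},$$

where $\ell(p)$ is the program's length in bits and $t(p)$ is the time it takes to produce its output. Under such a weighting, the "simulate the actual human" hypothesis pays a large $t(p)$ penalty, while an approximate predictor cooked up by an enemy AI could plausibly be both shorter and faster.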
(10000 was an upper bound on the extra penalty imposed on the "globa...