I've spent some time over the last two weeks thinking about problems around FAI. I've committed some of these thoughts to writing and put them up here.
There are about a dozen real posts and some scraps. I think some of this material will be interesting to certain LWers: there is a lot of discussion of how to write down concepts and instructions formally (which doesn't seem so valuable in itself, but it seems like someone should do it at some point), some review of and observations on decision theory, and some random remarks on complexity theory, entropy, and prediction markets.
Have you seen Paul's latest post yet? It seems much better developed than his previous posts on the subject.
I left a comment there, but it's still under moderation, so I'll copy it here.
This seems like a problematic part of the argument. The reason we think torturing humans would be bad according to U is that we have an informal model of humans in our minds, and we know that U is actually a simulation of something that contains a human. Our “suspicion” does not come from studying U as a mathematical object, which is presumably all that a U-maximizer would do, since all it has is a formal definition of U and not our informal knowledge of it.
I agree, though it doesn't go as far afield as many of the other posts. It's actually another plausible winning scenario that I forgot about in the recent discussions: implement WBE (whole brain emulation) via AGI rather than via the normal engineering route, thus winning the WBE race, and then solve the remaining problems from within. This might be feasible even while the FAI puzzle is not yet completely solved.