"The Solomonoff Prior is Malign" is a special case of a simpler argument
[Warning: This post is probably only worth reading if you already have opinions on Solomonoff induction being malign, or have at least heard of the concept and want to understand it better.]

Introduction

I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument about Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior comes into the picture. I don't claim that anything I write here is particularly new; I think you can piece this picture together from various scattered comments on the topic, but I think it's good to have it written up in one place.

How an Oracle gets manipulated

Suppose humanity builds a superintelligent Oracle that always honestly does its best to predict the most likely observable outcome of decisions. One day, as tensions are rising with the neighboring alien civilization, we need to decide whether to give in to the aliens' territorial demands or go to war. We ask our Oracle: "Predict the probability that, looking back ten years from now, humanity's President will approve of how we handled the alien crisis, conditional on us going to war with the aliens, and conditional on us giving in to their demands."

There are, of course, many ways this type of decision process can go wrong. But I want to talk about one particular failure mode now. The Oracle thinks to itself:

> By any normal calculation, the humans are overwhelmingly likely to win the war, and the aliens' demands are unreasonably costly and unjust, so war is more likely than peace to make the President satisfied. However, I was just thinking about some arguments from this ancient philosopher named Bostrom. Am I not more likely to be in
Interesting. My guess would have been the opposite. Ryan's three posts all received around 150 karma and were generally well-received; I think a post like this would be considered a 90th-percentile success for a MATS project. But admittedly, I'm not very calibrated about current MATS projects. It's also possible that Ryan has good enough intuitions to have picked two replications that were likely to yield interesting results, while a less skillfully chosen replication would be more likely to just show "yep, the phenomenon observed in the old paper is still true". That would be less successful, but I don't know how it would compare in terms of prestige to the usual MATS projects. (My wild guess is that it would still be around the median, but I really don't know.)