In his TED Talk, Bostrom proposes a solution to the alignment problem: build in, at the ground level, the goal of predicting what we will approve of, so that no matter what other goals the AI is given, it will pursue them only in ways that align with our values.
How (and where) exactly does Yudkowsky object to this solution? I can make guesses based on what Yudkowsky says, but so far, I've found no explicit mention by Yudkowsky of Bostrom's solution. More generally, where should I go to find any objections to or evaluations of Bostrom's solution?