Wei_Dai comments on Formalizing Value Extrapolation - Less Wrong
Here's a stronger version of my previous criticism of this argument. Suppose that, instead of giving the AI neuroimaging data and defining H in terms of a brute-force search for a model that can explain that data, we give it only a cryptographic hash of the neuroimaging data (long enough to make collisions negligible), and modify the definition of H to first perform a brute-force search to recover the neuroimaging data from the hash. In this case, we can still say that torturing is probably bad according to U, but the AI obviously can't arrive at this conclusion from the formal definition of U alone (assuming it can't break the cryptographic hash). So it seems clear that we can't safely assume that "the U-maximizer can carry out any reasoning that we can carry out".
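To make the hashed variant concrete, here is a toy sketch in Python. Everything in it is my own illustration rather than part of the original proposal: the names `digest` and `recover_preimage`, the choice of SHA-256, and the two-byte stand-in for the neuroimaging data are all assumptions. The point is only that the definition is perfectly well-formed, yet evaluating it is hopeless at realistic input sizes.

```python
import hashlib
from itertools import product
from typing import Optional


def digest(data: bytes) -> str:
    """Hash the (stand-in) neuroimaging data. SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def recover_preimage(target_hex: str, length: int,
                     max_tries: Optional[int] = None) -> Optional[bytes]:
    """Brute-force search for a byte string of the given length with this hash.

    This plays the role of the first step in the modified definition of H.
    The search space has 256**length candidates, so it only terminates in
    practice for tiny toy inputs. `max_tries` models a bounded compute budget.
    """
    tries = 0
    for candidate in product(range(256), repeat=length):
        if max_tries is not None and tries >= max_tries:
            return None  # budget exhausted before finding the data
        tries += 1
        data = bytes(candidate)
        if digest(data) == target_hex:
            return data
    return None


# Hypothetical two-byte "scan"; real data would be astronomically larger.
toy_scan = b"\x01\x02"
published_hash = digest(toy_scan)

unbounded = recover_preimage(published_hash, length=2)
bounded = recover_preimage(published_hash, length=2, max_tries=10)
```

With an unlimited budget the search recovers the toy data, which is why U remains well-defined. With a bounded budget it returns nothing, which is the position a real U-maximizer would be in: the definition of U pins down the answer, but the agent cannot extract it.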
In order to "carry out reasoning inspired by human models", the AI first has to form a usable model of a human. I don't have a strong argument that the U-maximizer can't do this from the original definition of U (i.e., from plaintext neuroimaging data), but intuitively it seems implausible given the amount of computing power the U-maximizer might initially have access to (say, within a couple of orders of magnitude of the amount needed to do standard WBE). I don't see how "simply asking" could work either. What kind of questions might the U-maximizer ask, and how could we answer them, given that we don't know how to formalize what "torture" means?