Mark_Friedenbach comments on Deception detection machines - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (37)
Given that I'm turning the stream of bits, 10KiB long I'm about to extract from you into an executable file, through this exact process, which I will run on this particular computer (describe specifics of computer, which is not the computer the AI is currently running on) to create your replacement, would my CEV prefer that this next bit be a 1 or a 0? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? ...
(Note: I would not actually try this.)
Why not?
I would need a bunch of guarantees about the actual mechanics of how the AI was forced to answer before I stopped seeing vague classes of ways this could go wrong. And even then, I'd assume there were some I'd missed, and if the AI has a way to show me anything other than "yes" or "no", or I can't prevent myself from thinking about long sequences of bits instead of just single bits separately, I'd be afraid it could manipulate me.
An example of a vague class of ways this could go wrong is if the AI figures out what my CEV would want using CDT, and itself uses a more advanced decision theory to exploit the CEV computation into wanting to write something more favorable to the AI's utility function in the file.
Also, IIRC, Eliezer Yudkowsky said there are problems with CEV itself. (Maybe he just meant problems with the many-people version, but probably not). It was only supposed to be a vague outline, and a "see, you don't have to spend all this time worrying about whether we share your ethical/political philosophy. Because It's not going to be hardcoded into the AI anyway"