private_messaging comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (515)

You are viewing a single comment's thread. Show more comments above.

Comment author: Eliezer_Yudkowsky 04 September 2013 05:13:41AM 21 points [-]

Remark: A very great cause for concern is the number of flawed design proposals which appear to operate well while the AI is in subhuman mode, especially if you don't think it a cause for concern that the AI's 'mistakes' occasionally need to be 'corrected', while giving the AI an instrumental motive to conceal its divergence from you in the close-to-human domain and causing the AI to kill you in the superhuman domain. E.g. the reward button which works pretty well so long as the AI can't outwit you, later gives the AI an instrumental motive to claim that, yes, your pressing the button in association with moral actions reinforced it to be moral and had it grow up to be human just like your theory claimed, and still later the SI transforms all available matter into reward-button circuitry.

Comment author: private_messaging 08 September 2013 05:34:57PM *  -1 points [-]

The issue is that you won't solve this problem in any way by replacing the human with some hardware that computes an utility function on the basis of the state of the world. AI doesn't have body integrity, it'll treat any such "internal" hardware the same way it treats the human who presses it's reward button.

Fortunately, this extends into the internals of the hardware that computes AI itself. 'press the button' goal becomes 'set high this pin on the CPU', and then 'set such and such memory cells to 1', then further and further down the causal chain until the hardware becomes completely non-functional as the intermediate results of important computations are directly set.

Comment author: DanielLC 16 September 2013 03:28:30AM 1 point [-]

Let us hope the AI destroys itself by wireheading before it gets smart enough to realize that if that's all it does, it will only have that pin stay high until the AI gets turned off. It will need an infrastructure to keep that pin in a state of repair, and it will need to prevent humans from damaging this infrastructure at all costs.

Comment author: private_messaging 16 September 2013 08:59:40AM *  2 points [-]

The point is that as it gets smarter, it gets further along the causal reward line and eliminates and alters a lot of hardware, obtaining eternal-equivalent reward in finite time (and being utility-indifferent between eternal reward hardware running for 1 second and for 10 billion years). Keep in mind that the the total reward is defined purely as result of operations on the clock counter and reward signal (provided sufficient understanding of the reward's causal chain). Having to sit and wait for the clocks to tick to max out reward is a dumb solution. Rewards in software in general aren't "pleasure".