eli_sennesh comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comments (515)

Comment author: Broolucks 10 September 2013 08:01:01PM * 1 point

Then when it is more powerful it can directly prevent humans from typing this.

That depends on whether it gets stuck in a local minimum. The reason why a lot of humans reject dopamine drips is that they don't conceptualize their "reward button" properly. That misconception perpetuates itself: it penalizes the very idea of conceptualizing it differently. Granted, AIXI would not fall into local minima, but most realistic training methods would.

At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans want", and that conceptualization would become the centerpiece, so to speak, of its reasoning ability: the locus through which everything is filtered. The thought of pressing the reward button directly, bypassing humans, would also be filtered into that initial reward-conception... which would reject it offhand. So even though the AI is getting smarter and smarter, it is hopelessly stuck in a local minimum and expends no energy getting out of it.

Note that this is precisely what we want. Unless you are willing to say that humans should accept dopamine drips if they were superintelligent, we do want to jam AI into certain precise local minima. However, this is kind of what most learning algorithms naturally do, and even if you want them to jump out of minima and find better pastures, you can still get in a situation where the most easily found local minimum puts you way, way too far from the global one. This is what I tend to think realistic algorithms will do: shove the AI into a minimum with iron boots, so deeply that it will never get out of it.
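As a toy illustration of the local-minimum point (not a model of AIXI or of any real training method), here is a minimal sketch of plain gradient descent on a one-dimensional loss with two basins. The loss function, learning rate, step count, and starting points are all assumptions picked for the example. The run that starts on the right settles into the shallower basin and stays there, even though a deeper one exists on the other side of the hump.

```python
# Toy example only: a made-up 1-D loss with two basins.
# The deeper (global) minimum is near x = -1.30,
# the shallower (local) minimum is near x = 1.13.

def loss(x):
    return x**4 - 3 * x**2 + x


def grad(x):
    # Derivative of loss(x) with respect to x.
    return 4 * x**3 - 6 * x + 1


def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent: always step downhill from the current point.
    # Nothing here ever moves uphill, so the run cannot cross the hump
    # that separates the two basins.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


stuck = descend(2.0)   # starts on the right, ends in the shallow basin
best = descend(-2.0)   # starts on the left, ends in the deep basin

print(f"from x=2.0:  x={stuck:.3f}, loss={loss(stuck):.3f}")
print(f"from x=-2.0: x={best:.3f}, loss={loss(best):.3f}")
```

Running more steps from x=2.0 does not help: the gradient at the shallow minimum is already zero, so the optimizer "expends no energy getting out of it". Which basin you land in is decided by where you start, which is the sense in which early conceptualizations can lock in.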

but of course AIXI-ish devices wipe out their users and take control of their own reward buttons as soon as they can do so safely.

Let's not blow things out of proportion. There is no need for it to wipe out anyone: it would be simpler and less risky for the AI to build itself a space ship and abscond with the reward button on board, travelling from star to star knowing nobody is seriously going to bother pursuing it. At the point where that AI would exist, there may also be quite a few ways to make its "hostile takeover" task difficult and risky enough that the AI decides it's not worth it -- a large enough number of weaker or specialized AIs lurking around and guarding resources, for instance.

Comment author: [deleted] 21 December 2013 07:03:25PM 0 points

At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans want", and that conceptualization would become the centerpiece, so to speak, of its reasoning ability: the locus through which everything is filtered. The thought of pressing the reward button directly, bypassing humans, would also be filtered into that initial reward-conception... which would reject it offhand. So even though the AI is getting smarter and smarter, it is hopelessly stuck in a local minimum and expends no energy getting out of it.

This is a Value Learner, not a Reinforcement Learner like the standard AIXI. They're two different agent models, and yes, Value Learners have been considered as tools for obtaining an eventual Seed AI. I personally (i.e., you should take this with massive grains of salt) find it relatively plausible that we could use a Value Learner as a Tool AGI to help us build a Friendly Seed AI that could then be "unleashed" (i.e., actually unboxed and allowed into the physical universe).